Publishing Wikipedia Project Usage Data with Strong Privacy Protections and without Tracking

Thursday, June 23, 2022 - 9:35 am–10:00 am

Hal Triedman, Privacy Engineer, Wikimedia Foundation


The Wikimedia Foundation places a premium on protecting reader and editor privacy while being maximally transparent and releasing data to support users, developers, and researchers. Datasets containing sensitive information have historically been kept private. Differential privacy offers a promising approach to safely releasing such datasets, which would be valuable to, for example, editors looking to identify articles that are important to their language and region. This talk will discuss differentially private approaches to releasing data, the technical and social challenges that we have faced so far (Wikimedia doesn’t track users, so enforcing a maximum number of data points per person is at best approximate), and proposed solutions that avoid tracking users while maintaining privacy guarantees. We want to share our findings with the rest of the field and enable differential privacy without tracking cookies.
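As a rough illustration of why the per-person bound matters, the following is a minimal sketch of a textbook Laplace-mechanism release of per-page view counts, not the approach presented in the talk. It assumes a hypothetical `person_key` is available to cap each person's contributions, which is exactly what the abstract says Wikimedia cannot do precisely without tracking; the function names, the `pages` list (assumed to be public), and the parameters are all illustrative.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_pageview_counts(views, pages, max_per_person, epsilon, rng):
    """Return noisy view counts for every page in `pages`.

    views: iterable of (person_key, page) pairs. `person_key` is a
        hypothetical stand-in for whatever approximates "one person".
    pages: public list of pages to report on (noise is added to all of
        them, including pages with zero views).
    max_per_person: cap on the number of views counted per person.
    epsilon: privacy budget for this single release.
    """
    allowed = set(pages)
    counted_per_person = Counter()
    counts = Counter()
    for person_key, page in views:
        if page not in allowed:
            continue
        # Contribution bounding: drop views beyond the per-person cap.
        if counted_per_person[person_key] < max_per_person:
            counted_per_person[person_key] += 1
            counts[page] += 1
    # Adding or removing one person changes the count vector by at most
    # max_per_person in L1 norm, so that is the sensitivity.
    scale = max_per_person / epsilon
    return {page: counts[page] + laplace_noise(scale, rng) for page in pages}


if __name__ == "__main__":
    rng = random.Random(0)
    views = [("a", "X"), ("a", "X"), ("a", "X"), ("b", "Y")]
    print(dp_pageview_counts(views, ["X", "Y", "Z"], 2, 1.0, rng))
```

If the cap can only be enforced approximately, the true sensitivity may exceed `max_per_person`, and the stated epsilon no longer holds exactly; that gap is the challenge the abstract describes.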

Hal Triedman, Wikimedia Foundation

Hal is a privacy engineer with the Wikimedia Foundation, implementing and researching issues related to privacy, transparency, and algorithmic fairness. He is interested in what institutional accountability should look like in a world of open, differentially private datasets, and hopes to put tools for differential privacy in the hands of more developers and analysts.

@conference {280306,
author = {Hal Triedman},
title = {Publishing Wikipedia Project Usage Data with Strong Privacy Protections and without Tracking},
year = {2022},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}
