When Machine Learning Isn’t Private

Note: Presentation times are in Pacific Standard Time (PST).

Wednesday, February 02, 2022 - 3:20 pm–3:50 pm

Nicholas Carlini, Google

Abstract: 

Current machine learning models are not private: they reveal particular details about the individual examples contained in datasets used for training. This talk studies various aspects of this privacy problem. For example, we have found that adversaries can query GPT-2 (a pretrained language model) to extract personally-identifiable information from its training set.
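The abstract gives no code, but the extraction idea can be sketched concretely. Below is a minimal, illustrative sketch (not the attack presented in the talk): it samples text from the public gpt2 checkpoint via the Hugging Face transformers library and ranks generations by GPT-2 perplexity relative to their zlib-compressed length, a rough signal that a sample may be verbatim training data. Function names and sampling parameters here are assumptions for illustration only.

# Illustrative sketch: flag GPT-2 generations that look memorized.
# Assumes the `transformers` library and the public "gpt2" checkpoint.
import math
import zlib

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    # Perplexity of `text` under GPT-2: lower means the model finds it more likely.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def zlib_bits(text):
    # Compressed length in bits: a cheap baseline for how "generic" the text is.
    return 8 * len(zlib.compress(text.encode("utf-8")))

# Sample unconditioned generations, then surface the ones with unusually low
# perplexity relative to their compressed length: candidates for memorized
# (possibly personal) training data that a human would then inspect.
prompt = tokenizer("<|endoftext|>", return_tensors="pt").input_ids
samples = model.generate(prompt, do_sample=True, top_k=40, max_length=64,
                         num_return_sequences=10,
                         pad_token_id=tokenizer.eos_token_id)
texts = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
for text in sorted(texts, key=lambda t: perplexity(t) / zlib_bits(t))[:3]:
    print(round(perplexity(text), 1), repr(text[:60]))

In practice the ranking only produces candidates; confirming that a flagged string is real training data (and personally identifiable) requires a separate verification step.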

Preventing this leakage is difficult, and recent ad-hoc proposals are not effective. And while there exist provably secure schemes (e.g., differentially private gradient descent), they come at a high utility cost. We conclude with potential next steps for researchers (with problems that should be solved) and practitioners (with practical techniques to test for memorization).
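As a rough illustration of the differentially private gradient descent approach mentioned above (and of why it costs utility), here is a minimal PyTorch sketch of one DP-SGD step: compute each example's gradient, clip it to a fixed norm, add Gaussian noise, and average before updating. The helper name dp_sgd_step and all hyperparameters are illustrative assumptions, not a tuned or production implementation.

# Sketch of a DP-SGD update: per-example gradient clipping + Gaussian noise.
import torch
from torch import nn

def dp_sgd_step(model, batch_x, batch_y, loss_fn,
                clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example gradients, each clipped so its total norm is at most clip_norm.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale

    # Add noise calibrated to the clipping norm, then average and take the step.
    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.normal(0.0, noise_multiplier * clip_norm, size=p.shape)
            p -= lr * (s + noise) / n

# Example: one DP step on a toy linear classifier.
toy_model = nn.Linear(10, 2)
x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))
dp_sgd_step(toy_model, x, y, nn.CrossEntropyLoss())

The clipping bounds any single example's influence on the update and the noise masks what remains, which is exactly where the utility cost comes from: the noise scales with the clipping norm rather than with the batch's true gradient.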

Nicholas Carlini, Google

Nicholas Carlini is a research scientist at Google Brain. He studies the security and privacy of machine learning, for which he has received best paper awards at ICML, USENIX Security and IEEE S&P. He obtained his PhD from the University of California, Berkeley in 2018.
BibTeX
@conference {277341,
author = {Nicholas Carlini},
title = {When Machine Learning {Isn{\textquoteright}t} Private},
year = {2022},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = feb
}
