Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Ahmed Salem; Apratim Bhattacharya; Michael Backes; Mario Fritz; Yang Zhang

Authors:

Ahmed Salem, CISPA Helmholtz Center for Information Security; Apratim Bhattacharya, Max Planck Institute for Informatics; Michael Backes, Mario Fritz, and Yang Zhang, CISPA Helmholtz Center for Information Security

Abstract:

Machine learning (ML) has progressed rapidly during the past decade, and the major factor driving this progress is the unprecedented availability of large-scale data. Because data generation is a continuous process, ML model owners frequently update their models with newly collected data in an online learning scenario. Consequently, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results.

In this paper, we investigate whether the change in the output of a black-box ML model before and after an update can leak information about the dataset used to perform the update, namely the updating set. This constitutes a new attack surface against black-box ML models, and such information leakage may compromise the intellectual property and data privacy of the ML model owner. We propose four attacks following an encoder-decoder formulation, which allows inferring diverse information about the updating set. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (CBM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows reconstructing accurate samples. Our experiments show that the proposed attacks achieve strong performance.
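
The signal the abstract refers to, the change in a black-box model's output on the same queries before and after an update, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the linear softmax classifier standing in for the target model, the helper names `query` and `posterior_difference`, the random probing set, and the choice to summarize the change as a flattened difference of class posteriors. It is not the authors' implementation, and the encoder-decoder attack models themselves are not shown.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def query(weights, probes):
    """Hypothetical black-box access: return class posteriors per probe.

    A linear softmax classifier stands in for the target model here.
    """
    return softmax(probes @ weights)


def posterior_difference(weights_before, weights_after, probes):
    """Query both model versions with the same probes and return the
    flattened change in posteriors (assumed input to an attack encoder)."""
    before = query(weights_before, probes)
    after = query(weights_after, probes)
    return (after - before).ravel()


rng = np.random.default_rng(0)
probes = rng.normal(size=(5, 4))                # 5 probe samples, 4 features
w_old = rng.normal(size=(4, 3))                 # model before the update
w_new = w_old + 0.1 * rng.normal(size=(4, 3))   # simulated effect of an update

delta = posterior_difference(w_old, w_new, probes)
print(delta.shape)  # one entry per (probe, class) pair
```

Because each posterior row sums to one, the per-probe differences sum to zero; an attacker in this setting would pass a vector like `delta` to a learned encoder rather than inspect it directly.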

BibTeX

@inproceedings {247690,
author = {Ahmed Salem and Apratim Bhattacharya and Michael Backes and Mario Fritz and Yang Zhang},
title = {{Updates-Leak}: Data Set Inference and Reconstruction Attacks in Online Learning},
booktitle = {29th USENIX Security Symposium (USENIX Security 20)},
year = {2020},
isbn = {978-1-939133-17-5},
pages = {1291--1308},
url = {https://www.usenix.org/conference/usenixsecurity20/presentation/salem},
publisher = {USENIX Association},
month = aug
}
