PrivSyn: Differentially Private Data Synthesis


Zhikun Zhang, Zhejiang University and CISPA Helmholtz Center for Information Security; Tianhao Wang, Ninghui Li, and Jean Honorio, Purdue University; Michael Backes, CISPA Helmholtz Center for Information Security; Shibo He and Jiming Chen, Zhejiang University and Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies; Yang Zhang, CISPA Helmholtz Center for Information Security


In differential privacy (DP), a challenging problem is to generate synthetic datasets that efficiently capture the useful information in the private data. The synthetic dataset enables any task to be done without privacy concern and modification to existing algorithms. In this paper, we present PrivSyn, the first automatic synthetic data generation method that can handle general tabular datasets (with 100 attributes and domain size > 2500). PrivSyn is composed of a new method to automatically and privately identify correlations in the data, and a novel method to generate sample data from a dense graphic model. We extensively evaluate different methods on multiple datasets to demonstrate the performance of our method.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {272124,
author = {Zhikun Zhang and Tianhao Wang and Ninghui Li and Jean Honorio and Michael Backes and Shibo He and Jiming Chen and Yang Zhang},
title = {{PrivSyn}: Differentially Private Data Synthesis},
booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {929--946},
url = {},
publisher = {USENIX Association},
month = aug

Presentation Video