Shripad Gade, Meta Platforms, Inc.
Differentially Private Synthetic Data Generation (DP-SDG) enables privacy-compliant access to sensitive tabular data by creating artificial datasets that preserve statistical properties while introducing calibrated noise. While research often focuses on straightforward scenarios, deploying DP-SDG at scale introduces significant real-world challenges.
A major challenge is Scalability and Quality, as state-of-the-art algorithms struggle with the high-dimensional data common in industry. We introduce the GEM+ algorithm, which scales SDG to industry-sized datasets with hundreds of columns within tractable runtimes, achieving a 10% improvement in accuracy over the current state-of-the-art AIM algorithm, which is known to scale poorly.
A second challenge is managing Public-Private Input Data Splits, where only a subset of columns is considered sensitive, as is common in industry datasets that mix public and private columns. We propose a framework to adapt DP-SDG methods to this vertical data split, allowing for judicious use of the differential privacy budget. Furthermore, we introduce conditional generation for both PGM-based and Generator Neural Network-based SDG, where synthetic private data is conditioned on public data, substantially improving synthetic data quality.
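To make the idea of conditioning private columns on public ones concrete, here is a minimal toy sketch. It is not GEM+, AIM, or the framework from the talk. It assumes one categorical public column and one categorical private column, a hypothetical helper named `conditional_dp_synth`, and neighbouring datasets that differ in the private value of a single record (so the joint count table has L1 sensitivity 2). Under those assumptions, the public column is reused unchanged and the privacy budget is spent only on the noisy joint counts.

```python
import numpy as np

def conditional_dp_synth(public, private, n_pub, n_priv, epsilon, rng):
    """Toy sketch (hypothetical helper): sample a synthetic private column
    conditioned on an unchanged public column."""
    # Contingency table of (public value, private value) counts.
    counts = np.zeros((n_pub, n_priv))
    np.add.at(counts, (public, private), 1)

    # Laplace mechanism. Assumption: changing one record's private value
    # moves one unit between two cells, so the L1 sensitivity is 2.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=counts.shape)

    # Post-processing (no extra privacy cost): clip negatives and normalise
    # each row into an estimate of P(private | public).
    noisy = np.clip(noisy, 0.0, None)
    row_sums = noisy.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    cond = np.where(row_sums > 0, noisy / safe_sums, 1.0 / n_priv)

    # Keep public values as they are; draw each private value from the
    # noisy conditional distribution for that record's public value.
    synth_private = np.array([rng.choice(n_priv, p=cond[v]) for v in public])
    return public.copy(), synth_private

rng = np.random.default_rng(0)
public = np.array([0, 0, 1, 1, 2, 2, 0, 1])
private = np.array([0, 1, 1, 1, 0, 0, 0, 1])
synth_public, synth_private = conditional_dp_synth(
    public, private, n_pub=3, n_priv=2, epsilon=1.0, rng=rng
)
```

Methods designed for many columns avoid materialising the full joint table, which grows exponentially with the number of columns; this sketch only illustrates the conditioning step on a single pair of columns.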
Authors: Samuel Maddock, Shripad Gade, Graham Cormode, Will Bullock

Shripad Gade is a Research Scientist at Meta Platforms. His work centers on building Privacy Enhancing Technologies and their applications, with a specific focus on synthetic data. He received his PhD from the University of Illinois Urbana–Champaign, where he developed Privacy-aware Distributed Algorithms.

author = {Shripad Gade},
title = {Generating {High-Quality} Tabular Synthetic Data at Scale},
year = {2026},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}