Mikhail Pravilov, Google
Differentially private (DP) synthetic data is a promising way to enable data-driven innovation while protecting user privacy. However, turning cutting-edge DP research into robust, scalable, and usable production systems presents significant engineering challenges. Our library, DPSynth, is based on state-of-the-art marginal-based mechanisms (McKenna et al., 2022) and builds on the PipelineDP and mbi libraries.
This talk will share our experience building and applying DPSynth in production settings, with a focus on what it took to productionize these research concepts. We'll discuss how DPSynth is built to scale to massive datasets using technologies like Apache Beam and Apache Spark. We will also cover key engineering aspects, such as handling real-world data constraints to ensure the utility and validity of the synthetic data, and designing for usability with reasonable defaults for non-DP experts. The library is slated for open-source release prior to the conference, with the aim of fostering wider adoption of practical DP synthetic data techniques.
Authors: Ryan McKenna, Peter Kairouz, Alexander Knop, Vadym Doroshenko, Eva Bertels

Mikhail Pravilov is a Software Engineer on Google's Anonymization team, developing practical Differential Privacy solutions at scale. A main contributor to the open-source JAX Privacy and PipelineDP4j libraries, he also works on numerous internal anonymization projects. Mikhail holds a bachelor's degree in Machine Learning and is dedicated to advancing real-world data privacy.

author = {Mikhail Pravilov},
title = {{DPSynth}: From Research to {Production{\textemdash}Engineering} Differentially Private Synthetic Tabular Data at Scale},
year = {2026},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}