PEPR '26 Conference Program

All sessions will be held in the Santa Clara Ballroom unless otherwise noted.

Monday, June 1

7:30 am–8:55 am

Continental Breakfast

Mezzanine East/West

8:55 am–9:10 am

Opening Remarks

Program Co-Chairs: Hana Habib, Carnegie Mellon University, and Sha Sundaram, Apple

9:10 am–10:35 am

Privacy & AI

Room Captain: Nuria Ruiz, Netflix

Envisioning and Mitigating Privacy Risks for Consumer-Facing AI Product Concepts through Human-AI Teaming

Monday, 9:10 am–9:30 am

Hao-Ping (Hank) Lee, Carnegie Mellon University

AI creates and exacerbates privacy risks, yet product teams often lack the expertise to spot and mitigate issues early—leaving privacy experts to translate principles and correct late-stage choices. What if teams could draft a solid privacy "first draft" before involving experts? We present Privy, a human-AI teaming tool powered by generative AI (GenAI) that enhances non-privacy-expert practitioners' privacy awareness during AI product ideation. Privy helps teams surface likely privacy risks and propose concrete mitigations, producing high-quality intake artifacts so experts can focus on product-specific, high-impact decisions. We grounded Privy's design in a formative study with 11 practitioners and evaluated it with 24 additional practitioners; 13 independent privacy experts rated the resulting privacy assessments high quality, with relevant risks and appropriate mitigations. Practitioners found Privy useful and usable, reporting improved awareness, motivation, and ability in doing privacy work. We conclude with design roles for integrating GenAI into privacy workflows.

Hao-Ping (Hank) Lee is a fourth-year PhD student at the Human-Computer Interaction Institute at Carnegie Mellon University, advised by Professors Sauvik Das and Jodi Forlizzi. His research sits at the intersection of usable privacy and security, human-computer interaction, and human-centered AI. He studies and builds tools that enable practitioners to identify, reason about, and mitigate AI-entailed privacy risks during the development of consumer AI products. His research has received Best Paper and Distinguished Paper Awards at top HCI and privacy and security conferences, including CHI and USENIX. He has also been recognized with the CMU CyLab Presidential Fellowship.

From Legalese to Logic: Translating Privacy Requirements into Verifiable Controls via Agentic Workflows

Monday, 9:30 am–9:50 am

Rituraj Kirti and Inchara Shivalingaiah, Meta

Translating textual privacy requirements, especially purpose limitation, into enforceable controls across large-scale data ecosystems is slow, error-prone, and difficult to validate. We present an experience report on deploying an LLM-powered guided privacy engineering workflow that helps engineers: (1) decompose requirement text into implementable "privacy jobs to be done," (2) traverse data lineage to identify impacted datasets and flows, (3) suggest candidate enforcement and monitoring points, and (4) track state from interpretation → implementation → verification.

We will walk through a real requirement end-to-end, highlight where automation helped vs. where human judgment remained essential, and share practical lessons on system architecture, governance, evaluation, and failure modes (including misinterpretations and lineage gaps). Attendees will leave with a reusable workflow template and design guidelines for building similar systems in their organizations.

Rituraj Kirti is a Software Engineer at Meta who builds reusable patterns, a.k.a. 'recipes', for implementing end-to-end privacy controls at scale. He previously worked on applied ML products at Meta and holds a B.E. (Hons.) in Instrumentation Engineering from BITS Pilani.

Inchara Shivalingaiah is a Software Engineer on Meta's Privacy Infrastructure experiences team, where she builds developer workflows and platform capabilities to translate privacy requirements into verifiable technical controls. She holds an M.S. in Computer Science and Engineering from UCLA.

A DP Framework for Gaining Insights into AI Chatbot Use

Monday, 9:50 am–10:10 am

Mikhail Pravilov, Google

We introduce Urania, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and DP keyword extraction methods. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, Urania provides end-to-end privacy protection. Our evaluation assesses the quality of the output, benchmarked against a non-private Clio-inspired pipeline (Tamkin et al., 2024). The results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation. Finally, we discuss the practical applications, limitations, and operational challenges of applying Urania in real-world settings.

Authors: Edith Cohen, Vadym Doroshenko, Badih Ghazi, Charlie Harrison, Peter Kairouz, Pritish Kamath, Alexander Knop, Ravi Kumar, Ethan Leeman, Daogao Liu, Pasin Manurangsi, Adam Sealfon, Da Yu, and Chiyuan Zhang

Mikhail Pravilov is a Software Engineer on Google's Anonymization team, developing practical Differential Privacy solutions at scale. A main contributor to the open-source Jax Privacy and PipelineDP4j libraries, he also works on numerous internal anonymization projects. Holding a bachelor's degree in Machine Learning, Mikhail is dedicated to advancing real-world data privacy.

CA-CI: A Normative Framework for Evaluating Privacy and Dignity in AI Governance

Monday, 10:10 am–10:30 am

Florian Schaub, University of Michigan

This talk introduces Capabilities Approach-Contextual Integrity (CA-CI), a normative framework that extends Contextual Integrity's theory of privacy to meet contemporary AI governance challenges by (1) making purpose an explicit governance parameter and (2) specifying minimum thresholds for dignity as a second normative standard, drawn from philosopher Martha Nussbaum's Capabilities Approach (CA). Drawing on the speaker's research on AI privacy perceptions and privacy theory development, the talk will apply CA-CI to AI scenarios and show how privacy practitioners can incorporate its criteria into impact assessments, risk registers, and approval workflows, even as AI capabilities, regulations, and social norms evolve.

Dr. Florian Schaub is associate professor of Information and of Electrical Engineering and Computer Science at the University of Michigan. His interdisciplinary research combines privacy, human-computer interaction, emerging technologies, and public policy. He studies people's privacy decision making and behavior, investigates technology-related privacy implications, and develops human-centric privacy solutions. Dr. Schaub holds a PhD in Computer Science from the University of Ulm, and was a postdoctoral fellow in Carnegie Mellon University's School of Computer Science. His research has been honored with the 2019 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, and with best paper awards at leading cybersecurity and human-computer interaction conferences, including at the ACM SIGCHI Conference on Human Factors in Computing (CHI), the USENIX Security Symposium, the ACM Conference on Computer and Communications Security (CCS), and the Symposium on Usable Privacy and Security (SOUPS). Dr. Schaub is an ACM Distinguished Member and a DARPA Young Faculty Award recipient. His research has directly impacted industry practice and public policy, including the rulemaking process for the California Consumer Privacy Act (CCPA).

Vision: Human-as-the-Unit Privacy Management with AI Agents

Monday, 10:30 am–10:35 am

Eryue Xu, Northeastern University and University of Illinois Urbana–Champaign

Managing one's digital footprint is overwhelming, as it spans multiple platforms and involves countless context-dependent decisions. This talk presents research from the CHI 2026 paper by Eryue Xu and Tianshi Li, which explores how emerging agentic AI systems might support more comprehensive privacy management. We adopted a "human-as-the-unit" perspective and investigated users' cross-context privacy challenges through semi-structured interviews. Results reveal that people rely on ad hoc manual strategies while lacking comprehensive privacy controls, highlighting nine privacy-management challenges across applications, temporal contexts, and relationships. To explore solutions, we generated nine AI agent concepts and evaluated them via a speed-dating survey with 116 US participants. The highest-ranked designs focused on post-sharing privacy management, where AI agents help users detect, review, and remediate previously shared information. Participants expressed strong interest in automated assistance and often reported greater confidence in AI-supported privacy management than in their own manual efforts. Our findings highlight a promising design space where users see AI agents bridging the fragments in privacy management.

Eryue's research explores how people delegate agency and negotiate privacy boundaries with intelligent systems. Her work, which includes publications at CHI and USENIX Security, bridges human cognition, UX research, and computational methods to design AI systems that respect human intent and trust. Eryue is now working on her PhD at UIUC; she previously earned her BS in cognitive science from UCSD and her MS in human-computer interaction from Georgia Tech.

10:35 am–11:05 am

Coffee and Tea Break

Mezzanine East/West

11:05 am–12:25 pm

Policy & Compliance

Room Captain: Florian Schaub, University of Michigan

Enforcement of Data Protection Laws in Africa: Implications for Privacy Engineers

Monday, 11:05 am–11:25 am

Mubarak Raji, University of Illinois Urbana–Champaign

As Africa experiences a surge in digital advancement, with internet users projected to reach about 1.1 billion in 2029, it has also seen an increase in data protection laws (DPLs): about 44 of 55 African Union countries had DPLs as of December 2025. Enforcement of these laws has also begun (e.g., Nigeria fined Meta USD 220 million). However, African enforcement mechanisms are rarely considered by privacy engineers. In this presentation, I will share results from our thematic content analysis of the enforcement mechanisms of 20 African DPLs, which reveals unique enforcement patterns, such as extraterritorial reach, mandatory compliance audits, administrative and criminal sanctions, and financial and civil liabilities. Our findings are essential for all privacy professionals, particularly privacy engineers whose technical work operationalizes privacy, as they traverse the intricacies of compliance in Africa's evolving landscape. Understanding African enforcement approaches provides the PEPR audience with a vital perspective on global data protection efforts and strengthens their capability to address compliance challenges across diverse regulatory environments.

Mubarak is a Nigerian-licensed attorney and a PhD student at the University of Illinois Urbana–Champaign School of Information Sciences, specializing in data privacy, AI governance, and ethics in the Global Majority, with a focus on Africa's unique challenges and opportunities. His research also covers the governance of Agentic AI and AI agents, particularly as it relates to privacy. Earlier in 2026, Mubarak received the Westin Scholar Award from the International Association of Privacy Professionals for excellence in privacy. He earned his Master of Laws from Illinois College of Law, a BL from Nigerian Law School, and a Bachelor of Laws from Bayero University. Before graduate studies, he was an associate at Paul Usoro & Co. in Lagos, advising on data protection issues and defending privacy lawsuits involving millions, as well as representing multinational companies like ExxonMobil and PZ Cussons in high-stakes commercial disputes.

Dismantling the Barriers to Personal Data Portability

Monday, 11:25 am–11:45 am

Arthur Borem, University of Chicago; Lisa Dusseault, Data Transfer Initiative

Data portability is an underrated right, but it is needed for people to have agency over their data privacy – a fact recognized by emerging regulations (EU, UK). Companies hosting personal data such as photos, playlists, conversations, search history, favorites, and forum messages must satisfy these regulations.

Despite the increasing regulatory pressure, significant barriers to data portability remain, including usability barriers, interoperability problems, and lack of trust between the services that need to work together. We explain the biggest barriers and provide some recommendations by presenting relevant findings from academic research, as well as prototypes that show that best UX practice is feasible for any participant. We point to solutions for the problem of trust coordination failures, which has contributed to the low number of effective options for personal data transfer. This talk is especially relevant for privacy-conscious companies holding personal data.

Arthur Borem is a final-year PhD candidate in Computer Science at the University of Chicago. His research focuses on building usable privacy systems to empower people online with the agency and knowledge to control their data and how it is used. He has led development of privacy tools targeting end-users (e.g., Data Subject Access Request interactive interfaces) and companies/organizations (e.g., Python data portability library). As a software engineer, Arthur has built and shipped features and tools at the Data Transfer Initiative, Asana, and Lyft.

Lisa Dusseault is the CTO of the non-profit Data Transfer Initiative, supporting consumer data portability across tech platforms. With a dual career in standards and startups, she brings both idealism and pragmatism. On the startup side, Lisa was CTO of Compaas and ShareTheVisit and VPEng of Klutch. On the standards side, she co-authored CalDAV, updated WebDAV, was chair of the XMPP and IMAPExt working groups, and spent four years as Area Director shepherding new Applications area work at the IETF.

Why Consent Fails in Practice: Lessons from Web Measurement Studies

Monday, 11:45 am–12:05 pm

Muhammad Abu Bakar Aziz and Christo Wilson, Northeastern University

Modern websites rely on a complex technical pipeline to obtain user consent: users interact with consent banners; consent management platforms (CMPs) encode their choices; scripts expose those consent signals via cookies, APIs, and IAB standards; and advertising systems are expected to read those consent signals and then honor them. In theory, this consent pipeline provides users with meaningful control over their data. In practice, it frequently fails.

In this talk, we present findings from multiple empirical privacy measurement studies—including our own large-scale analysis of opt-out enforcement—to identify common technical and design failure modes that lead to breakdowns in consent transmission. We show how user consent can be lost due to low adoption or misconfiguration of privacy standards and gaps between standardized frameworks and their deployments. We conclude with high-level suggestions to improve the reliability of consent mechanisms in practice.

Muhammad Abu Bakar Aziz is a final-year PhD candidate at the Khoury College of Computer Sciences at Northeastern University. His research focuses on web and data privacy. He builds large-scale web measurement systems to study privacy practices across online platforms and advertising ecosystems. His research has appeared at top-tier conferences such as PETS and WWW. He aims to apply his privacy research and engineering experience to build and improve privacy systems in industry.

Christo Wilson is a Professor in the Khoury College of Computer Sciences at Northeastern University. Professor Wilson's research covers a wide range of digital consumer protection topics using measurement-driven approaches. This includes areas like online tracking and privacy, deceptive dark pattern user interfaces, fairness and bias in machine learning, and empirical antitrust investigations of tech platforms.

User (Non-)Compliance with Age Verification: Evidence from a Deceptive Web Experiment

Monday, 12:05 pm–12:25 pm

Lorrie Faith Cranor, Carnegie Mellon University

Twenty-five U.S. states have laws requiring some websites to perform "strong" age verification to ensure that visitors to sites containing "material harmful to minors" are over-age – and more states are considering similar laws. Under these laws, self-attesting one's age by checking a box is insufficient. Users must verify their age by using IDs, AI facial analysis, or other "commercially reasonable" options. However, users may find these approaches to age verification privacy-invasive, insecure, or inconvenient, and some users may even turn away from a website entirely if prompted with one of these methods. Our team at Carnegie Mellon University ran a 1,635-participant experiment to find out what users do when they encounter various age verification options and followed up with a survey to probe their reasoning. We'll talk about our study methods, our findings, and what policy makers and organizations that are required to age verify can learn from our results.

Lorrie Faith Cranor is Director and Bosch Distinguished Professor of the CyLab Security and Privacy Institute and FORE Systems University Professor of Computer Science and of Engineering and Public Policy at Carnegie Mellon University. She directs the CyLab Usable Privacy and Security Laboratory (CUPS) and co-directs the Privacy Engineering program. In 2016 she served as Chief Technologist at the US Federal Trade Commission. She co-founded Wombat Security Technologies. She is a fellow of the ACM, IEEE, and AAAS; a member of the ACM CHI Academy; and the author of a children's book about privacy.

12:25 pm–1:55 pm

Lunch

Terra Courtyard

1:55 pm–3:20 pm

Privacy Risk

Room Captains: Lawrence You, and Emily Choi-Greene, Clearly AI

Provenance Without Surveillance: Privacy Engineering for AI Content Transparency

Monday, 1:55 pm–2:15 pm

Sai Prashanth Chandramouli, Sankalp Jain, and Gayathri Ravi, Meta Platforms, Inc.

As regulators increasingly require disclosure of AI-generated and AI-altered content (e.g., the EU AI Act and California SB 942), the industry is converging on provenance standards such as C2PA. But provenance is inherently dual-use: the same metadata that improves transparency and accountability can also expose identity, device, workflow, or other linkable signals, creating new privacy and security risks for users, advertisers, and creators. Critically, not all AI labeling carries equal privacy risk: fully synthetic content with no user input poses different challenges than AI-assisted edits to user-uploaded or camera-captured media. This talk reframes "AI labeling" as a privacy engineering problem: how do we design end-to-end provenance pipelines that satisfy transparency obligations while minimizing personally identifiable information, preventing cross-context linkage, and preserving product usability? We present a practical framework for risk assessment and controls, discussing data minimization, selective disclosure, threat modeling, retention and access policies, and UI/UX choices, and walk through a realistic deployment scenario illustrating trade-offs across the spectrum of AI assistance. Attendees will leave with actionable guidance for building compliant, privacy-preserving transparency systems.

Sai Chandramouli is a Staff Engineer at Meta Platforms, Incorporated, working on AI Privacy and Transparency. Previously, he led the launch of WAIST 3.0 at Meta, an ML-powered approach to Ads Transparency. Before joining Meta, he worked at Amazon.com as a Software Engineer at Alexa and Regulatory Compliance.

Sankalp Jain leads ML explainability products at Meta Platforms, Incorporated, and is focused on increased Ads transparency for users. Before joining Meta, he was a Product Manager at Microsoft working on Dynamics 365. His experience spans enterprise communications, customer service applications, and developer productivity tools.

Gayathri Ravi is a Software Engineer at Meta, contributing to scalable platform infrastructure that ensures reliable and performant services. Before joining Meta, she spent over six years at Cisco as a key contributor to major routing platforms, including the Cisco 8000 series and Cisco Silicon One Q100 ASIC.

Turning Privacy Risk Assessment Into 20 Questions for Developers

Monday, 2:15 pm–2:35 pm

Qiyu Li, University of California, San Diego

Privacy design is often treated as inherently open-ended and deeply contextual. Small shifts in context—who requests data, for what purpose, under what stakes, and with what downstream sharing—can flip whether a data practice feels appropriate. While this captures a real phenomenon, it has also left us with privacy frameworks that are too abstract and generic to operationalize for developers without specialized privacy expertise.

In this talk, we explore methods to significantly lower the barrier of privacy risk assessment for developers. We argue that privacy design is more structured than it appears: many recurring privacy-relevant decisions can be captured using a structured, closed-ended representation. By constructing such a representation, we can transform a significant portion of open-ended privacy design into a closed-ended task.

We operationalize this through PrivacyAkinator, an interactive tool that guides developers through key privacy design decisions via LLM-generated multiple-choice questions. By grounding privacy assessment in a structured representation of the privacy design space, PrivacyAkinator transforms privacy design from ad-hoc decision-making into systematic, auditable design support.

Authors: Qiyu Li, Haojian Jin

Qiyu Li is a Ph.D. student at UC San Diego, Halıcıoğlu Data Science Institute, advised by Prof. Haojian Jin. His research lies at the intersection of Privacy & Security and Human-Computer Interaction (HCI), with a focus on harnessing AI to build usable tools that help developers create effective privacy designs.

Panel: The AI Architecture Debt—Refactoring Infrastructure for Sustainable Privacy

Monday, 2:35 pm–3:20 pm

Moderator: Dylan Gilbert, IAPP; Panelists: Sri Pravallika Maddipati, Google; Nathalie Baracaldo, IBM Research; Gary Young, Google

As AI systems mature from experimental prototypes into long-lived production infrastructure, many organizations are discovering a new form of technical debt: AI architecture debt. This debt accumulates when privacy, governance, and data-minimization principles are "bolted onto" AI systems after deployment rather than "baked into" their foundations. The result is fragile compliance, opaque data provenance, and costly retrofits whenever regulations, models, or data flows change. This talk proposes a refactoring mindset for AI infrastructure and examines the AI architecture debt through a privacy engineering lens, focusing on how design decisions around data pipelines, model lifecycles, observability, and deployment patterns can either compound or reduce long-term privacy risk. We will explore practical strategies for re-architecting AI platforms to support sustainable privacy outcomes, including modular data boundaries, privacy-aware model interfaces, and infrastructure patterns that make privacy guarantees resilient to future change.

As Senior Fellow for Privacy Engineering, Dylan Gilbert expands and elevates the IAPP’s engagement with the technical design and implementation of governance across digital systems, spanning privacy, Artificial Intelligence, and digital responsibility. Dylan joined the IAPP from the U.S. National Institute of Standards and Technology, where he led the Privacy Engineering Program. At NIST, he advanced the development of privacy risk management processes, led the program’s engagement in international standards bodies, and oversaw the Privacy-Enhancing Technologies Testbed. Prior to NIST, Dylan led privacy advocacy at Public Knowledge as Policy Counsel, where he also supported work on copyright, cybersecurity, and other tech policy issues. He holds a J.D. from The George Washington University Law School and a B.A. from The College of William and Mary.

Sri Pravallika is a Privacy Engineer within Google's Search Privacy team, where she focuses on the intersection of large-scale AI development and data protection. Dedicated to keeping billions of users safe, Sri specializes in operationalizing "privacy-by-design" and building robust privacy threat models for complex AI user journeys. Before joining Google, Sri was the first Privacy Engineer at Autodesk, where she built the company's privacy assessments and metrics program from the ground up. Her career began at PwC, helping Fortune 500 companies navigate Security GRC programs and maturity assessments. Sri holds a Master's degree in Cybersecurity from Northeastern University and a Bachelor's in Computer Science.

Nathalie Baracaldo is a Senior Research Scientist and Master Inventor at IBM Research in San Jose, California. Her research currently focuses on safeguarding generative AI models through a variety of techniques, including unlearning and alignment. She has extensive experience delivering impactful machine learning solutions that are highly accurate, withstand adversarial attacks, and protect data privacy. She served as the primary investigator for the DARPA GARD program, where her focus was to ensure her team extended and maintained the Adversarial Robustness Toolbox (ART) to support red teaming evaluations. She also led the IBM federated learning effort and co-edited the book "Federated Learning: A Comprehensive Overview of Methods and Applications" (Springer, 2022). In 2020 and 2021, she received the IBM Master Inventor distinction and the Corporate Technical Recognition, respectively. Her research has been published in top conferences in the fields of AI and Security, and has received multiple best paper awards and numerous citations. She received her doctorate degree from the University of Pittsburgh.

Gary Young is a Distributed Systems Engineer who has focused on Privacy, Security, and Compliance at Google since 2007. Focus areas have included access control, purpose limitations, identity, consent, sharing, data deletion, law enforcement access, onward transfer, software bill of materials, and underpinning corporate governance. Most recently, he has been applying AI to go beyond detection of problems to transparent coding automation for fixing issues in product implementation.

3:20 pm–3:50 pm

Coffee and Tea Break

Mezzanine East/West

3:50 pm–5:10 pm

Privacy Infrastructure

Room Captain: Amit Datta, Amazon

How Canva Built Simple, Auditable, and Maintainable Data Retention

Monday, 3:50 pm–4:10 pm

Dr. Phillip Ward, Canva

When data retention fails, companies face regulatory scrutiny and customer distrust. At Canva's scale, manual compliance audits are not viable, and the engineering effort required for building and ongoing maintenance is unsustainable. When we refreshed our internal data policies, we saw an opportunity for change. We re-architected retention from the ground up to coordinate and evidence deleting data.

We delivered a new retention framework that required only a few lines of code per service and scales across the thousands of storage instances at Canva. The new design prioritises direct measurement of compliance and minimal engineering effort. The approach decoupled policy from implementation and isolated domain-specific data knowledge to individual services. Despite its simplicity, it supports daily audits, logging, chaining, and zero-code policy changes.

In this talk, I will discuss the design choices that underpin this novel approach to retention.

This work includes fantastic contributions from Xushen Ma, Jessica Hu, Saeed Attar, and many other colleagues at Canva.

Dr. Phillip Ward is the Lead of Privacy Engineering at Canva, specializing in privacy technology with over a decade of experience in software engineering, data science, and computer science. He leads a team focused on creating privacy-enabling infrastructure for the rapidly growing tech company. His team develops innovative solutions for automating risk and audit functions, simplifying compliance for employees, and empowering customers to control their data and experiences. Dr. Ward has diverse experience across hydrological modeling, neuroscience, financial services, physiology, data science, and software engineering, and he holds university degrees in maths, physics, computer science, and biomedical imaging.

Architecting Scalable Data Lineage Graph for Privacy Compliance and Agentic Analysis

Monday, 4:10 pm–4:30 pm

Maharshi Jha and Aygun Aydin, Meta

Privacy compliance demands granular, real-time tracking of data flows at scale. This talk presents the architecture behind Meta's in-memory lineage graph, processing billions of edges across web, warehouse, and AI systems. We cover compressed graph storage, efficient traversal algorithms, and cross-platform data flow mapping. Beyond compliance, we explore how the same architecture enables agentic analysis through interactive graph traversals. The presentation shares practical solutions and architectural lessons from operating at billion-edge scale daily.

Maharshi Jha is a Software Engineer on Meta's Graph Observability team, specializing in privacy-aware data lineage. He architects in-memory lineage graphs processing billions of edges across AI, warehouse, and web systems, enabling privacy compliance and agentic analysis at scale.

Aygun Aydin is an Engineering Manager in Privacy Infrastructure at Meta, specializing in asset understanding and data lineage. He has helped build Meta's data understanding and data flow evaluation frameworks. He holds a master's degree in Software Engineering and brings deep experience in large-scale systems and high-performing teams.

Responding to Regulations Confidently and Efficiently

Monday, 4:30 pm–4:50 pm

Marek Matejka, Meta

Regulatory requirements, particularly those mandating user setting changes, frequently pose a significant challenge due to their volume and their potential impact if implemented incorrectly. This talk presents a tried-and-tested process and engineering framework developed at Meta to respond to these regulations both confidently and efficiently. We detail the multi-step approach, beginning with critical cross-functional (XFN) alignment to define scope and setting values. We then provide an overview of the core framework, which drastically reduces implementation time, achieving code-completeness for setting-related requirements within a single working day. Finally, we discuss the complete lifecycle, emphasizing mechanisms for a confident launch and, crucially, ongoing compliance. We explore the use of the Swiss Cheese Model for high assurance, robust monitoring, and solutions for complex edge cases such as conflicting regulations or regulatory overlap. This framework is a blueprint for scaling effective and trustworthy regulatory responses that others can adopt and build on.

Marek is a staff product software engineer at Meta. Having started at Meta almost 10 years ago, first on Ads and then on News, he went on to work on Facebook Privacy for over 3 years to help establish the Privacy Experiences team and to respond to numerous regulations across the world. During this time he developed deep context and understanding of the Privacy regulation space, which enabled him to come up with holistic solutions to common problems.

Designing for Civic Trust: An Infrastructure to Help Long Beach Residents Manage Their CCPA Rights

Monday, 4:50 pm–5:10 pm

Omar Moncayo, City of Long Beach; Norman Sadeh, Carnegie Mellon University; Gwen Shaffer, California State University, Long Beach

Cities increasingly rely on digital technologies to manage transportation, utilities, public services, and other urban functions. These technologies, operated by both municipal agencies and private vendors, collect and process data about people in many everyday contexts. Despite existing privacy regulations, residents often lack practical ways to understand these data practices or exercise applicable rights, and cities face challenges translating policy into scalable, operational solutions.

This presentation describes ongoing work to deploy a Smart City Privacy Infrastructure in partnership with the City of Long Beach. The effort builds on California’s consumer privacy framework and leverages CMU’s IoT Privacy Infrastructure to support greater transparency and accountability across a broad ecosystem of smart city technologies, including mechanisms that allow people to rely on authorized agents, as envisioned under CCPA, to help manage privacy interactions at scale. We discuss the motivation for this work, the architectural approach used to support heterogeneous systems, and the practical challenges encountered when working with individual city departments and external vendors. The talk highlights progress to date, including onboarding technologies spanning transportation, mobility services, utilities, and other urban deployments, as well as the launch of a city-facing assistant application to help people discover and understand data collection in their environment. We conclude by discussing lessons learned, remaining challenges, and opportunities for broader adoption.

Omar Moncayo is a Data Privacy Analyst with the City of Long Beach, where he works at the intersection of privacy, technology policy, and public sector innovation. His work focuses on advancing responsible data practices within municipal systems, supporting compliance with evolving privacy regulations, and embedding privacy-by-design principles into city operations and digital services. Moncayo has played a key role in community-facing initiatives that promote data literacy, transparency, and public engagement around emerging technologies, including artificial intelligence. He collaborates closely with academic and civic partners to translate privacy and governance concepts into practical tools and processes that serve residents and city stakeholders.

Norman Sadeh is a Professor in the School of Computer Science at Carnegie Mellon University, where he co-founded and co-directs CMU’s Privacy Engineering Program and previously co-founded and co-directed the PhD Program in Societal Computing for a decade. He has served as lead principal investigator on two of the largest U.S. research initiatives in privacy: the Usable Privacy Policy Project and the Personalized Privacy Assistant Project. He was also founding CEO and, until its acquisition by Proofpoint, chairman and chief scientist of Wombat Security Technologies, a company that defined the multi-billion dollar user-oriented cybersecurity market. Technologies Norman developed with colleagues at CMU and Wombat are used to protect tens of millions of users around the world against cybersecurity attacks such as phishing. Earlier in his career, he also served at the European Commission as Chief Scientist of the 550M Euro eWork and eCommerce initiative, which included all pan-European research in cybersecurity and privacy and related policy activities.

Gwen Shaffer is a Professor in the Department of Journalism and Public Relations at California State University, Long Beach, and serves as Director of Research for the College of Liberal Arts. Her research focuses on telecommunications policy, data privacy, and the social and regulatory implications of smart city technologies. Shaffer has led and collaborated on interdisciplinary, externally funded projects examining privacy, digital governance, and community-centered technology design, including work with municipal partners to develop tools that help residents better understand and manage data collection practices. She previously served on the City of Long Beach Technology and Innovation Commission, including as chair, contributing to local policy development around digital inclusion, surveillance technologies, and responsible data use.

5:30 pm–7:00 pm

Conference Reception

Terra Courtyard

Tuesday, June 2

8:00 am–9:00 am

Continental Breakfast

Mezzanine East/West

9:00 am–10:20 am

DP & Synthetic Data

Room Captain: Tariq Yusuf

Toward Provably Private Insights into AI Use

Tuesday, 9:00 am–9:20 am

Rakshita Tandon, Google

Understanding real-world usage is critical for improving Generative AI, yet traditional analytics often risk exposing sensitive input data. This talk outlines Provably Private Insights (PPI), a novel framework that enables developers to gain deep analytical utility without compromising user privacy. PPI bridges the gap between raw data and actionable insights by integrating Trusted Execution Environments (TEEs) for external transparency and verifiability, "Data Expert" LLMs for interpreting unstructured data within secure enclaves, and Differential Privacy (DP) for mathematically-guaranteed anonymity in aggregation. The talk describes the open-sourced system architecture, and its real-world application in the Recorder app. This framework illustrates the shift beyond classic data analytics toward a "provably private" standard where server-side processing is transparent, verifiable, and restricted to privacy-preserving computations.

Authors: Albert Cheu, Artem Lagzdin, Brett McLarnon, Daniel Ramage, Katharine Daly, Marco Gruteser, Peter Kairouz, Rakshita Tandon, Stanislav Chiknavaryan, Timon Van Overveldt, Zoe Gong

I'm a Software Engineer at Google Research, specializing in the development of privacy-preserving technologies. My current work focuses on building large scale systems to enable provably private machine learning and data science on decentralized data by leveraging externally verifiable Trusted Execution Environments (TEEs). Previously, I graduated from the University of Pennsylvania (UPenn) with a Master's in Computer and Information Sciences and a B.Tech from the University of Delhi.

DPSynth: From Research to Production—Engineering Differentially Private Synthetic Tabular Data at Scale

Tuesday, 9:20 am–9:40 am

Mikhail Pravilov, Google

Differentially Private (DP) synthetic data is a promising solution for enabling data-driven innovation while protecting user privacy. However, transforming cutting-edge research in DP into robust, scalable, and usable production systems presents significant engineering challenges. Our library, DPSynth, is based on state-of-the-art marginal-based mechanisms (McKenna et al., 2022), and builds upon the foundations of PipelineDP and mbi libraries.

This talk will share our experience in building and applying DPSynth in production settings, highlighting the journey of productionalizing these research concepts. We'll discuss how DPSynth is built to scale for massive datasets using technologies like Apache Beam and Apache Spark. We will also cover key engineering aspects such as handling real-world data constraints to ensure synthetic data utility and validity, and designing for usability with reasonable defaults for non-DP experts. The library is slated for open-source release prior to the conference, aiming to foster wider adoption of practical DP synthetic data techniques.

Authors: Ryan McKenna, Peter Kairouz, Alexander Knop, Vadym Doroshenko, Eva Bertels

Mikhail Pravilov is a Software Engineer on Google's Anonymization team, developing practical Differential Privacy solutions at scale. A main contributor to the open-source Jax Privacy and PipelineDP4j libraries, he also works on numerous internal anonymization projects. Holding a bachelor's degree in Machine Learning, Mikhail is dedicated to advancing real-world data privacy.

Generating High-Quality Tabular Synthetic Data at Scale

Tuesday, 9:40 am–10:00 am

Shripad Gade, Meta Platforms, Inc.

Differentially Private Synthetic Data Generation (DP-SDG) enables privacy-compliant access to sensitive tabular data by creating artificial datasets that preserve statistical properties while introducing calibrated noise. While research often focuses on straightforward scenarios, deploying DP-SDG at scale introduces significant real-world challenges.

A major challenge is Scalability and Quality, as state-of-the-art algorithms struggle with the high-dimensional data common in industry. We introduce the GEM+ algorithm, which scales SDG to industry-sized datasets with hundreds of columns within tractable runtimes, achieving a 10% improvement in accuracy over the current state-of-the-art AIM algorithm, which is known to scale poorly.

A second challenge is managing Public-Private Input Data Splits, where only a subset of columns is considered sensitive. Industry applications often involve datasets with a mix of public and private columns. We propose a framework to adapt DP-SDG methods to this vertical data split, allowing for judicious use of the differential privacy budget. Furthermore, we introduce conditional generation for both PGM-based and Generator Neural Network-based SDG, where synthetic private data is conditioned on public data, substantially improving synthetic data quality.

Authors: Samuel Maddock, Shripad Gade, Graham Cormode, Will Bullock

Shripad Gade is a Research Scientist at Meta Platforms. His work centers on building Privacy Enhancing Technologies and their applications, with a specific focus on synthetic data. He received his PhD at the University of Illinois Urbana–Champaign, where he developed Privacy-aware Distributed Algorithms.

Profile-Then-Simulate: Can LLMs Faithfully Generate Differentially Private Synthetic Data?

Tuesday, 10:00 am–10:20 am

Nassima Bouzid, Capital One

LLM-based simulators are promising new tools for generating synthetic data, especially under conditions that are challenging for traditional Differential Privacy (DP) methods (i.e., high-dimensional tabular data). By condensing customer attributes and behaviors into compact user profiles under DP, we can seed an LLM with data to generate realistic customer transactions. We tested this "Profile-then-Simulate" approach on financial transaction data using PersonaLedger, an LLM-based generator, and compared it to direct DP synthesis on the same dataset.

We found that the LLM-based approach produces usable synthetic data, but direct synthesis still significantly outperforms it on both fraud detection utility and distributional fidelity. We identified systematic LLM biases, not DP noise, as the dominant source of error. The model's learned priors about "typical" financial behavior consistently overrode the statistical distributions we provided as input, particularly for demographic and categorical features, resulting in divergent output data.

This talk shares practical lessons for privacy engineers considering generative AI for synthetic data: (1) LLM biases may dominate DP noise as the primary source of distributional error; (2) direct DP synthesis remains competitive for tractable datasets; and (3) rigorous fidelity evaluation is essential before deploying LLM-generated synthetic data in production pipelines.

Coauthors: Dehao Yuan, Nam H. Nguyen, Mayana Pereira

Nassima Bouzid is a Senior Machine Learning Engineer at Capital One, where she focuses on differential privacy and privacy-enhancing technologies. She holds a PhD in evolutionary biology from the University of Washington, where she studied diversification and environmental adaptation of lizards in Yosemite National Park. From genetic testing to insurance operations to fintech, she's consistently drawn to problems without established playbooks. Her work sits at the intersection of applied research and engineering, translating ambiguous, cross-domain problems into concrete, measurable solutions.

10:20 am–10:50 am

Coffee and Tea Break

Mezzanine East/West

10:50 am–12:10 pm

Secure ML & MPC

Room Captain: Masooda Bashir, University of Illinois Urbana–Champaign

Private Tuning of LLMs in Practice: From VaultGemma to Custom Fine-Tuning

Tuesday, 10:50 am–11:10 am

Amer Sinha, Google

As enterprises adopt Large Language Models (LLMs), fine-tuning on proprietary data is essential for performance but introduces significant privacy risks. We present JAX Privacy, the actively maintained open-source library recently used to train Google's VaultGemma, proving that Differential Privacy (DP) works at production scale.

Moving beyond theoretical definitions, this talk focuses on the privacy engineering required to deploy secure LLMs. We define a precise threat model—protecting specific fine-tuning datasets from extraction attacks—and demonstrate a complete workflow using the Gemma model family. We provide a data-driven "before vs. after" analysis, contrasting the memorization risks in standard fine-tuning against the guarantees provided by JAX Privacy. Attendees will gain practical insights into managing computational overhead, optimizing privacy-utility trade-offs, and using our latest tooling to audit and deploy privately tuned models in cloud environments.

Authors: Ryan McKenna, Galen Andrew, Borja Balle, Vadym Doroshenko, Arun Ganesh, Weiwei Kong, Alex Kurakin, Brendan McMahan

Amer Sinha is a Software Engineer at Google Research specializing in privacy-preserving machine learning. His expertise spans several areas of applied ML, including computer vision for autonomous vehicles and video analysis, fairness and robustness in large-scale systems, ad prediction models and large language models.

Private AI: Building Trust Through Verifiable Computation

Tuesday, 11:10 am–11:30 am

Mingshen Sun and Mateus Guzzo, TikTok

AI has transformed how people learn, work, and live, automating complex tasks and extracting insight from massive datasets. But most powerful AI today (especially large language models) runs on server-class hardware, which typically means user prompts and context must be visible to the service provider to be processed. While acceptable in some cases, this remains a problem for highly sensitive data, where users expect protections similar to end-to-end encryption. Private Verifiable Compute (PVC) is a technical solution that lets users send requests to a private and verifiable environment for context-aware AI processing of sensitive data, where no one, including the service provider, can access that data. In the cloud, PVC unlocks the full potential of data-center AI hardware for complex AI tasks, such as large language models (LLMs), generative AI, and beyond, while guaranteeing user privacy and verifiable transparency.

Mingshen Sun is a research scientist at TikTok, leading innovation in and adoption of Privacy Enhancing Technologies and Confidential Computing. Previously, he worked on multiple open source projects focused on building safe, secure, and trustworthy systems. Mingshen has also published academic papers and presented industry innovations on topics at the intersection of privacy and security, operating systems, and programming languages. He also serves on the Technical Advisory Council of the Confidential Computing Consortium.

Mateus Guzzo is a community architect, researcher, and designer at the intersection of privacy, open technology, and platform governance. He is a community & developer advocate for privacy enhancing technologies (PETs) at TikTok, a member of the Confidential Computing Consortium Outreach Committee, and an advisory board member for OpenUK's SooCon26. His work is focused on co-designing platforms for responsible technology cooperation.

Production Multi-Party Computation via the Distributed Aggregation Protocol

Tuesday, 11:30 am–11:50 am

Tim Geoghegan and J.C. Jones, Internet Security Research Group

Multi-party computation (MPC) has long promised privacy-preserving data aggregation, but practical deployments remain rare. The Distributed Aggregation Protocol (DAP), currently progressing through IETF standardization, changes this narrative. We present Divvi Up, a production deployment of DAP processing billions of contributions from widely-deployed applications including Mozilla Firefox. Unlike research prototypes demonstrating MPC feasibility on small datasets, Divvi Up operates at internet scale with multiple independent aggregation servers performing secure multi-party computation on real user data.

This talk covers the architecture of DAP and demonstrates how modern MPC can move beyond academic proofs-of-concept to provide practical, scalable privacy infrastructure. Attendees will learn about deploying privacy-preserving telemetry in production environments and understand the benefits and challenges of deploying MPC to improve privacy, such as:

  • Composing private aggregation with differential privacy
  • Computational overhead in clients and browsers
  • Tradeoffs between flexibility and privacy
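As background for the aggregation described above, the following is a minimal sketch of additive secret sharing, the general idea of splitting each measurement across non-colluding servers so that no single server sees an individual value. It is not the DAP protocol or the Divvi Up implementation: the modulus and the function names `share`, `aggregate`, and `combine` are hypothetical, and the sketch omits the input validation, encryption, and differential privacy steps a real deployment needs.

```python
import secrets

# Illustrative modulus only (an assumption, not a DAP parameter).
MODULUS = 2**61 - 1


def share(value, n_servers=2):
    """Split one measurement into n_servers random-looking shares."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    # The last share is chosen so that all shares sum to the value.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(client_shares):
    """Each server sums only the shares it received, one per client."""
    return [sum(column) % MODULUS for column in zip(*client_shares)]


def combine(server_totals):
    """Adding the per-server totals reveals only the overall sum."""
    return sum(server_totals) % MODULUS


measurements = [3, 5, 10]
totals = aggregate([share(m) for m in measurements])
result = combine(totals)
```

Each individual share is uniformly random on its own, so a server learns nothing about a single client unless the servers collude.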

Tim (he/him) is an engineer at the Internet Security Research Group and a co-author of the DAP specification at the IETF. He has worked stints at Apple and Square before leading ISRG's role in COVID-19-era exposure notifications as an early application of secure multi-party computation. At ISRG, Tim builds privacy-preserving infrastructure that operates at internet scale. Beyond the zero-knowledge proofs used in DAP's aggregation functions, his current work includes research into zero-knowledge proof systems for digital identity applications.

J.C. (they/them) is an engineer at the Internet Security Research Group, where they work on Divvi Up, a production deployment of the Distributed Aggregation Protocol for privacy-preserving telemetry. J.C. was part of the original Let's Encrypt launch team and previously led the Cryptography Engineering team at Mozilla, where they co-authored the W3C Web Authentication specification (Passkeys) and built Firefox's CRLite whole-web PKI revocation system. They bring extensive experience scaling cryptographic infrastructure from research concepts to internet-scale production deployments.

Privacy in Theory, Bugs in Practice: Grey-Box Testing for Differential Privacy Libraries

Tuesday, 11:50 am–12:10 pm

Jack Fitzsimons, Oblivious

Differential privacy (DP) implementations are notoriously prone to bugs. Small mistakes can compromise or completely remove the protection provided to user data. And developers building them have few tools to help them get it right. While there are some ways to test DP tools, their limitations mean that, in practice, developers are reduced to trying very hard not to make mistakes.

To help developers build better DP tools, we introduce a new open-source test framework that can detect many common implementation bugs. We explain how it works and what types of problems it can identify. And we show that it works in practice by using it to identify 13 bugs in 11 open-source differential privacy libraries, and discussing how it helped us build the latest version of our DP SQL engine.

This talk is adapted from a PoPETS paper by my colleagues Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, and Jack Fitzsimons.

Jack is the CTO and co-founder of Oblivious, an Irish-based technology company focused on privacy-enhancing technologies. He holds a D.Phil (PhD) from the University of Oxford, and has worked on a wide range of data-centric challenges in industry; from topics in computer vision at NASA's Jet Propulsion Laboratory to quantitative data analysis at ElectroRoute, the European energy trading subsidiary of Mitsubishi. Jack has been an active member of the UN's Privacy-Preserving Technologies Task Team since 2020, the UN PET Lab since its inception and an author of the NIST Differential Privacy Deployment Registry.

12:10 pm–1:40 pm

Lunch

Terra Courtyard

1:40 pm–3:10 pm

Emerging Threats

Room Captain: Shivan Kaul Sahib, Brave Software, Inc.

The Emperor's New Embeddings: Obfuscating ML Inputs Doesn't Provide Privacy

Tuesday, 1:40 pm–2:00 pm

Jack Fitzsimons, Oblivious

When you think about PETs in machine learning, you likely think about protecting the training data: there are well-developed tools and approaches to ensuring that your model doesn't leak user data it was trained on. But that's only half of the story: what about protecting the data used for inference?

In the last few years, there's been a growing thread of research and some commercial offerings that promise just that: to protect model inputs while still allowing inference. These tools transform data so that it is hard to recover the original input, but in a way that still allows a model to make accurate predictions.

Does that sound too good to be true? That's because it is! This talk will look at the (bad) ways that these tools are measuring "privacy", the fundamental limits of how much we can protect, and whether there are any alternative approaches.

This work is based on a collaboration by Jack Fitzsimons, Daniel Simmons-Marengo, Tudor Cebere and Damien Desfontaines.

Jack is the CTO and co-founder of Oblivious, an Irish-based technology company focused on privacy-enhancing technologies. He holds a D.Phil (PhD) from the University of Oxford, and has worked on a wide range of data-centric challenges in industry; from topics in computer vision at NASA's Jet Propulsion Laboratory to quantitative data analysis at ElectroRoute, the European energy trading subsidiary of Mitsubishi. Jack has been an active member of the UN's Privacy-Preserving Technologies Task Team since 2020, the UN PET Lab since its inception and an author of the NIST Differential Privacy Deployment Registry.

Surfacing Hidden Privacy Risks in Code: Lessons from LLM and Retrieval Assisted Detection

Tuesday, 2:00 pm–2:20 pm

Bushra Aloraini and Jimmy Haslam, Microsoft

Many privacy failures are caused less by code that obviously handles privacy, such as encryption, retention, and deletion, and more by everyday design and coding choices that unintentionally expose sensitive data. A common example is Default Grant Access (DGA), whereby access is permitted unless a developer explicitly blocks it. DGA is difficult for traditional static analysis because the risk often emerges from context: defaults, conditional logic, and framework behavior.

We report lessons learned from deploying language model-based detection for DGA in pull requests at scale, and from replacing a static prompt approach with retrieval-augmented generation (RAG) to address high false-positive rates. In an evaluation spanning nine production repositories, four languages, and 183,000+ methods, the RAG system identified more than twice as many confirmed privacy-relevant issues as the standalone LLM approach. However, false-positive rates did not improve significantly, and we observed language-specific noise, especially in TypeScript and C++.

We cover practical engineering insights: curating high-quality examples from known incidents, managing corpus retrieval trade-offs, and integrating detections into pull request review workflows without overwhelming reviewers.
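To make the Default Grant Access pattern concrete, here is a minimal sketch contrasting it with a default-deny check. The `Resource` type and both function names are hypothetical and are not taken from the talk or from any Microsoft tooling; real instances of the pattern are usually spread across defaults, conditionals, and framework behavior rather than sitting in one function.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical resource carrying explicit block and allow lists.
    blocked: set = field(default_factory=set)
    allowed: set = field(default_factory=set)


def can_read_default_grant(user: str, res: Resource) -> bool:
    # DGA pattern: anyone who is not explicitly blocked gets access.
    if user in res.blocked:
        return False
    return True


def can_read_default_deny(user: str, res: Resource) -> bool:
    # Safer pattern: only explicitly allowed, non-blocked users get access.
    return user in res.allowed and user not in res.blocked


doc = Resource(blocked={"mallory"}, allowed={"alice"})
```

With this setup, a user who appears in neither list is granted access by the first check and refused by the second, which is the gap the abstract describes.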

Bushra Aloraini is a Senior Security Researcher at Microsoft working at the intersection of privacy and security. She focuses on large-scale code analysis, developing detection methodologies and partnering with engineering teams to translate anti‑pattern insights into practical tooling. She holds a Ph.D. in Computer Science from the University of Waterloo and is an inventor of privacy-focused analysis techniques.

Jimmy Haslam is a Senior Software Engineer at Microsoft with 20+ years of experience. Before Microsoft, he worked in biotech developing analytical methods for cancer detection and treatment. He now applies the same pattern‑recognition and anomaly‑detection toolkit to software systems, uncovering privacy‑ and security‑relevant coding anti‑patterns at scale.

V.O.I.C.E.: A Data-Driven Risk Taxonomy for Synthetic Voice Generation

Tuesday, 2:20 pm–2:40 pm

Tanusree Sharma and Lili Dudas, The Pennsylvania State University

This talk presents V.O.I.C.E., a taxonomy of voice generation risk grounded in a multi-source threat modeling effort: 569 incidents from major AI incident databases, the FTC, and the Internet Crime Complaint Center (IC3); direct incident reports from U.S.-based participants across diverse groups (including voice actors, internet personalities, political personnel, and the general public); and 2,221 Reddit discussions. This taxonomy explicitly models how risk varies with contextual factors such as degree of voice exposure, social visibility, and the availability of legal protections for various affected groups. The goal of this talk is to translate this complex risk landscape into clear, actionable insights for organizations and policymakers.

Tanusree Sharma is an assistant professor at Pennsylvania State University. Her work explores the synergy between security and privacy, design engineering, and AI safety. Her research examines the question: how can we design secure systems that improve the quality of human interaction and human data authenticity in digital ecosystems? Tanusree's current work is supported by Meta, Google, and OpenAI.

Lili Dudas is a first-year PhD student at Penn State University, studying Informatics with an interest in privacy, governance, and the intersection of social systems and data. She graduated with an MI and a B.A. in Information Technology and Informatics from Rutgers University.

The Disposable Identity: Eliminating Non-Human Identity Risk in Federal Healthcare Pipelines

Tuesday, 2:40 pm–3:00 pm

Tristan McKinnon, Deterministic Systems Lab and Axle Informatics

Non-Human Identity is the fastest-growing attack surface in cloud data environments — and the least discussed. While zero-trust implementations focus on user authentication and network segmentation, the persistent service accounts powering automated pipelines remain long-lived, over-privileged, and largely unexamined. In federal healthcare environments processing genomic and clinical data under FedRAMP High constraints, a single compromised ingestion role means bucket-wide access for up to 90 days. That is not least privilege. That is a liability.

This talk is a production case study, not a proposal. We deployed an Identity-Per-Transaction (IPT) pipeline for a federal life sciences agency, generating a unique cryptographically scoped ephemeral credential for every file ingestion event and destroying it milliseconds later. We present the operational reality of running this in production: STS AssumeRole latency averaging 180ms, peak issuance rates during batch windows, the race condition we hit under high concurrency, and what happens when you need to debug a system whose credentials no longer exist.

We also address two hard problems directly. First, the secret-zero problem: the broker that issues ephemeral credentials is itself a root of trust. If it's compromised, the blast radius is bounded to 900 seconds rather than 90 days, but it is not zero. Second, the genomic privacy boundary: tokenizing 18 HIPAA identifiers does not anonymize a VCF file. This architecture eliminates credential-mediated access risk. It is not a genomic privacy framework. Those are different problems.

Attendees leave with a concrete reference pattern for eliminating Non-Human Identity risk in high-compliance storage systems, an honest accounting of where the architecture holds and where it defers, and open questions about trust bootstrapping that the field has not yet resolved.

Tristan McKinnon is Principal Investigator at Deterministic Systems Lab and a Senior Healthcare Data Engineer at Axle Informatics, where he designs zero-trust data pipelines processing genomic and clinical data at NIH scale under FedRAMP High constraints. He formalized the Identity-Per-Transaction (IPT) protocol, a framework applying ephemeral, transaction-scoped credentials to Non-Human Identity risk in high-compliance cloud environments, piloted at FedRAMP High scale on NIH data infrastructure. His research on AI integration in data science education is forthcoming in an Elsevier volume in Summer 2026.

Shadow Data in Tool Calls: The Privacy Leak Hiding in Plain Sight

Tuesday, 3:00 pm–3:05 pm

Shabista Shabista and Ravi Gupta, Independent; Mayank Kumar Raunak, Intel Corporation

"Run it locally and your data stays private." This assumption is dangerously wrong.

When an AI agent calls external tools—weather APIs, calendars, maps, search—the request itself leaks user data. A local agent asking "What's the weather at my doctor's office?" sends exact coordinates to a third party. The agent may be local, but your medical visit pattern isn't.

We are analyzing tool calls from a prototype smart home agent. Preliminary testing indicates that the majority leak personally identifiable or sensitive contextual information in the request parameters alone—before any response is processed. Location, health indicators, financial patterns, and relationship data routinely escape through tool call payloads.

This talk presents a working Tool Call Sanitizer deployed on Raspberry Pi that intercepts, analyzes, and generalizes outbound requests in real-time. We target significant reduction in data leakage with minimal degradation in task utility. Privacy engineering must extend beyond the agent to the entire tool ecosystem.
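As an illustration of what generalizing an outbound request could look like, the sketch below coarsens coordinates and drops free-text context before a tool call leaves the device. This is an assumption-laden example, not the speakers' Tool Call Sanitizer: the parameter names (`lat`, `lon`, `label`, `note`), the one-decimal precision, and the function `sanitize_tool_call` are all hypothetical.

```python
# Hypothetical parameter names; a real sanitizer would need a richer policy.
COORDINATE_KEYS = {"lat", "lon"}
DROP_KEYS = {"label", "note"}


def sanitize_tool_call(params: dict, decimals: int = 1) -> dict:
    """Return a copy of params with coarsened location and no free-text context."""
    sanitized = {}
    for key, value in params.items():
        if key in DROP_KEYS:
            # Free-text context such as a place label is not sent at all.
            continue
        if key in COORDINATE_KEYS and isinstance(value, float):
            # Rounding to one decimal place coarsens the position to roughly
            # city scale (about 11 km of latitude per 0.1 degree).
            sanitized[key] = round(value, decimals)
        else:
            sanitized[key] = value
    return sanitized


request = {
    "lat": 37.35411,
    "lon": -121.95524,
    "label": "doctor's office",
    "units": "metric",
}
outbound = sanitize_tool_call(request)
```

A weather lookup still works with the coarsened location, while the third party no longer receives the exact address or the label that hints at a medical visit.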

Shabista is a Senior Software Engineer with over a decade of experience designing and delivering enterprise-grade software across healthcare, fintech, and HR technology. Her expertise spans end-to-end product development, from greenfield architecture and secure deployment systems to configurable integration frameworks serving thousands of customers. She is also actively pursuing research in areas such as LLM optimization, edge AI, and neural architecture search. She holds a Master's in Computer Science and brings a distinctive combination of production engineering rigor and applied research insight to her work.

Ravi Gupta is a Deep Learning Technical Lead specializing in accelerating AI model computations on large-scale GPU systems, with expertise spanning PyTorch, vLLM, and distributed inference frameworks. He holds a Master's in Computer Engineering from Purdue University, where his research focused on automated performance tuning of CUDA kernels using machine learning at Lawrence Livermore National Laboratory. With over a decade of industry experience, he has led teams in deploying scalable LLM inference and fine-tuning solutions, profiling multi-GPU and multi-node systems, and optimizing collective operations on supercomputing clusters. His work bridges high-performance computing and deep learning, driving efficient deployment of large language models across diverse hardware accelerators.

Mayank Kumar Raunak is an AI Framework Engineer at Intel Corporation, where he works on advancing AI frameworks and compiler technologies for high-performance machine learning systems. He leads efforts to enable JAX and XLA on Intel GPUs in collaboration with Google and has played a key role in delivering TensorFlow CPU Windows releases to the open-source community via PyPI. His work focuses on deep learning optimization, compiler improvements, and performance acceleration across CPU and GPU platforms using technologies such as oneDNN, XLA, and JAX.

Previously, he worked at Intel Labs on collaborative research projects with Princeton University, University College London, and Yale University, contributing to model interoperability standards and transformer-based language model optimization. His research background also includes computer vision, generative models, and speech verification systems, developed during his time as a Data Scientist at Indiana University.

Adopting AI in Local Government with Privacy and Equity in Mind: A Case Study of the City of Oakland

Tuesday, 3:05 pm–3:10 pm

Vinal Dalcy Dsouza and Rebecca Williams Earle, Northeastern University

We are partnering with the City of Oakland to understand how AI can safely support city workflows and processes. Federal cuts are increasing the strain on local governments, which must meet community needs with fewer resources. Artificial intelligence is often proposed as a solution, but local governments can be wary due to privacy and equity concerns. Oakland is piloting Microsoft Copilot, and we are measuring participant experiences and perceptions throughout the study via interviews and surveys with participants sampled from a variety of city departments. We will share snapshots of participant perceptions of AI and privacy from multiple touchpoints during this 6-month study. We will also discuss how privacy and equity concerns, including public record requests, disparities in AI access, and fears of skill atrophy, shaped, and in some cases limited, participant engagement with AI.

Authors: Shufan Chai, Vinal Dalcy Dsouza, Rebecca Williams Earle, Rasika Bhalerao and Jessica Staddon

Vinal Dalcy Dsouza is a research assistant and Master's student in Computer Science at Northeastern University's Oakland campus. With around five years of hands‑on data engineering experience, she enjoys digging into messy real‑world data, making it usable, and then building systems that keep it flowing. She's especially interested in where data engineering and AI meet and what can go wrong there, from security risks to responsible use. Before grad school, she earned her bachelor's degree in Information Science with a focus on business intelligence, which still shapes how she thinks about data, value, and impact in her work.

Rebecca Williams Earle is a research assistant and Master's student in Computer Science at Northeastern University's Oakland campus. Her interests include security, privacy, and accessibility. She holds a BA from Williams College and has over six years of professional software engineering experience focusing on front-end development and web accessibility. She's particularly interested in the intersection of human-centered design and technical security, and how to build systems that protect users without creating barriers to access.

3:10 pm–3:40 pm

Coffee and Tea Break

Mezzanine East/West

3:40 pm–5:00 pm

Privacy Engineering

Room Captain: Sai Chandramouli, Meta Platforms, Inc.

Mapping the Privacy Workforce in the AI Era

Tuesday, 3:40 pm–4:00 pm

Ramazan Yener, University of Illinois Urbana–Champaign

Organizations across sectors hire many privacy professionals. Yet we have limited empirical evidence about how privacy roles are defined in hiring practices. This talk presents insights from an ongoing analysis of privacy-related job postings across major hiring platforms. The research uses a multi-stage approach that includes Natural Language Processing (NLP) to examine how employers describe privacy responsibilities, what kinds of skills and backgrounds they seek, and how often privacy work is connected to AI and emerging technologies. At a high level, privacy roles are increasingly hybrid, combining legal, technical, and organizational responsibilities. Results show growing signals that privacy teams are being positioned as part of broader AI governance and risk management efforts. These findings provide a practical snapshot of how privacy work is evolving and raise important questions about workforce development, training, and team design.

Researchers: Ramazan Yener, Muhammad Hassan, Masooda Bashir

Ramazan Yener is an academic researcher at the iSPICE Lab and a PhD student in Information Science at the University of Illinois Urbana–Champaign, focusing on privacy engineering, AI governance, and cybersecurity. His research examines how privacy is operationalized in practice, including studies of privacy engineering roles, national and international standards (ISO/IEC 27001, ISO/IEC 27701, NIST AI RMF, NIST CSF, and the NIST Privacy Framework), and the privacy risks of AI-driven systems such as healthcare chatbots, agentic AI, and IoT-enabled platforms. He approaches these systems from a user-centered and socio-technical perspective to identify where design and governance choices introduce privacy and security risks.

Privacy Review for Non-Maniacs

Tuesday, 4:00 pm–4:20 pm

Katriel Cohn-Gordon, Meta

Imagine you want your test suite to pass for all commits. Would you create a Test Review process where all teams answer a Tests Questionnaire and submit screenshots as proof, which are then reviewed by Tests Engineers? Obviously, only a maniac would do this: it works, but is much more annoying than a pre-commit hook.

Many companies have some form of Privacy Review. You start with "we'll figure out the requirements", and then ask "did you integrate with deletion tooling?", and before you know it you have screenshots of tests. We can improve privacy and remove manual processes by defining requirements upfront and checking them automatically: you just need to translate engineering language ("write a script that checks all relevant data is delete-able") into compliance language ("mitigate Deletion risks by verifying Key Control Indicators for the Deletion Framework Controls").

We have spent recent years at Meta making this change to our Privacy Program: we reduced ad-hoc human decisions in favour of upfront requirements and systematic verification, making our Privacy Reviews both faster and more effective. Out of ~500 requirements, roughly half now rely on automated verifiers. In this talk we'll explain how you too can, hopefully, do the same!
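As a rough illustration of "a script that checks all relevant data is delete-able", here is a minimal Python sketch of an automated verifier. It is not Meta's tooling: the `Table` record, its `deletion_handler` field, and the example table names are all hypothetical, standing in for whatever schema inventory and deletion framework an organization actually has.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Table:
    """Hypothetical inventory entry for one data store."""
    name: str
    holds_user_data: bool
    # Hypothetical: name of the deletion job registered for this table, if any.
    deletion_handler: Optional[str] = None


def find_undeletable(tables: Iterable[Table]) -> List[str]:
    """Return names of tables that hold user data but have no deletion handler."""
    return sorted(
        t.name for t in tables if t.holds_user_data and not t.deletion_handler
    )


def check(tables: Iterable[Table]) -> int:
    """Print one line per violation and return a pre-commit style exit code."""
    missing = find_undeletable(tables)
    for name in missing:
        print(f"FAIL: {name} holds user data but has no deletion handler")
    return 1 if missing else 0


# Made-up inventory used only to exercise the check.
EXAMPLE_TABLES = [
    Table("user_profiles", holds_user_data=True, deletion_handler="profile_wipe"),
    Table("user_uploads", holds_user_data=True),
    Table("build_metrics", holds_user_data=False),
]
```

Run against `EXAMPLE_TABLES`, `check` reports `user_uploads` and returns a non-zero code, so the requirement is verified the way a failing test would be rather than through a questionnaire.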

Additional author: Ben Fonarov

Katriel Cohn-Gordon is a software engineer on Meta's Privacy Infrastructure team, where he has worked on data access rights, data transfers and portability, cookies infrastructure, deletion, and other Privacy topics. Before moving to Meta he wrote pen-and-paper proofs for secure messaging protocols, and still dabbles in end-to-end encryption topics such as accountability for JavaScript cryptography. He lives in London with his partner, a pea-brained black cat, and a large collection of houseplants.

Training Developers' Privacy Awareness with Enforcement Cases

Tuesday, 4:20 pm–4:40 pm

Shao-Yu Chu, University of California, San Diego

Developers' decisions about data collection, sharing, and defaults directly shape users' privacy. In practice, however, developers frequently make problematic design decisions. Even when privacy experts are available, low privacy awareness can prevent developers from recognizing issues in the first place, leaving concerns unescalated.

In this talk, we present Ace Privacy Detective, an Ace Attorney-inspired investigation game that turns enforcement cases into hands-on training materials for developers. We'll show how we translate legal documents into game levels where players examine real-world data practices, gather evidence, and report violations with feedback grounded in the alleged violations. We'll demo the game and walk through how levels are built from public complaints. We will also share results from a between-subjects study comparing the game to a reading-based baseline that mirrors how many developers learn about privacy today.

Authors: Shao-Yu Chu, Xu Wang, Haojian Jin

Shao-Yu Chu is a PhD student at the University of California, San Diego. She develops tools and learning interventions to help developers build software that respects users' privacy.

Scaling Privacy Threat Modeling: From Architects to Developers

Tuesday, 4:40 pm–5:00 pm

Nitish Uplavikar, Comcast Cable

Privacy threat modeling is essential for assessing an application's privacy posture at an architecture level. Yet automated tooling that allows this analysis to scale remains limited compared to security threat modeling. We present an open-source contribution of twelve privacy threat modeling rules implemented in Threagile, a machine-assisted threat modeling toolkit. These rules analyze system architecture, data flows, and technical assets to automatically identify privacy violations including data minimization failures, unauthorized disclosure, insecure storage, and re-identification risks.

Our rules align with the LINDDUN privacy threat modeling framework, covering threat categories such as Linking, Identifying, Data Disclosure, Unawareness, and Non-compliance, while also addressing some OWASP-based Privacy Risks. Each rule provides high-fidelity detection logic with actionable mitigations, enabling privacy-by-design practices early in development.

The rules can be used by privacy architects for assessments, and by designers and developers as a self-service tool. Our contribution bridges the gap between privacy threat modeling theory and practical implementation, offering a systematic privacy risk assessment.
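To give a sense of what a rule over an architecture model might look like, here is a minimal Python sketch. It is not one of the twelve rules and does not use Threagile's API or model format: the `DataFlow` and `Finding` types, the `PERSONAL_DATA` set, and the example flows are all hypothetical. It flags data flows that carry personal data without encryption and labels them with the Data Disclosure category.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class DataFlow:
    """Hypothetical data flow between two technical assets."""
    source: str
    target: str
    data_assets: Tuple[str, ...]
    encrypted: bool


@dataclass(frozen=True)
class Finding:
    category: str
    flow: str
    mitigation: str


# Hypothetical classification of which data assets count as personal data.
PERSONAL_DATA: FrozenSet[str] = frozenset({"email", "home_address"})


def unencrypted_personal_data_rule(
    flows: Iterable[DataFlow],
    personal: FrozenSet[str] = PERSONAL_DATA,
) -> List[Finding]:
    """Flag every unencrypted flow that carries at least one personal data asset."""
    findings = []
    for f in flows:
        exposed = sorted(set(f.data_assets) & personal)
        if exposed and not f.encrypted:
            findings.append(
                Finding(
                    category="Data Disclosure",
                    flow=f"{f.source}->{f.target}",
                    mitigation=f"Encrypt in transit: {', '.join(exposed)}",
                )
            )
    return findings


# Made-up architecture used only to exercise the rule.
EXAMPLE_FLOWS = [
    DataFlow("web", "api", ("email", "session_id"), encrypted=True),
    DataFlow("api", "analytics", ("email", "page_views"), encrypted=False),
    DataFlow("api", "cache", ("page_views",), encrypted=False),
]
```

On `EXAMPLE_FLOWS`, only the `api->analytics` flow is reported, because it is the only one that is both unencrypted and carries a personal data asset. Pairing each finding with a mitigation mirrors the "detection logic with actionable mitigations" shape described above.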

Nitish Uplavikar is a senior researcher on the SPIDER at Comcast Cable. Nitish has a PhD in Computer Science from the University of Missouri-Columbia, where his research focused on Privacy Preserving Data Analytics.