Data Classification at Scale: Taming the Hydra

Monday, June 09, 2025 - 4:20 pm–4:40 pm

Daniel Gagne, Meta

This talk details the data classification process at Meta, in which we assign metadata describing the semantics, actor, and other attributes of data. We start by defining a taxonomy that supports categorization based on the nature of the data and on regulatory requirements, and that is used to ensure appropriate data usage. The taxonomy supports a wide variety of privacy policies, such as access control, deletion, and purpose limitation. We then take a bytes-up approach: we scan data, extract features, and infer labels from the taxonomy. We also detail challenges arising from different data storage patterns, classification approaches, and quality measurement.
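As a rough illustration of the scan, extract, and infer flow described in the abstract, the following minimal sketch classifies a column of sampled values. It is not Meta's implementation: the taxonomy entries, the regular expressions, the 0.8 threshold, and the function names (`extract_features`, `infer_label`, `classify_column`) are all assumptions made up for this example.

```python
import re

# Hypothetical taxonomy: each label maps to the privacy policies it triggers.
TAXONOMY = {
    "EMAIL": {"access_control", "deletion"},
    "PHONE": {"access_control", "deletion"},
    "NONE": set(),
}

# Hypothetical feature extractors: one regex per candidate label.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE": re.compile(r"^\+?\d[\d\-\s]{6,14}\d$"),
}


def extract_features(samples):
    """Return, per label, the fraction of sampled values matching its pattern."""
    total = len(samples) or 1  # avoid division by zero on empty samples
    return {
        label: sum(bool(pattern.match(value)) for value in samples) / total
        for label, pattern in PATTERNS.items()
    }


def infer_label(features, threshold=0.8):
    """Pick the best-scoring label, or "NONE" if no score reaches the threshold."""
    label, score = max(features.items(), key=lambda item: item[1])
    return label if score >= threshold else "NONE"


def classify_column(samples, threshold=0.8):
    """Scan sampled values, extract features, and infer a taxonomy label."""
    label = infer_label(extract_features(samples), threshold)
    return label, TAXONOMY[label]


if __name__ == "__main__":
    print(classify_column(["a@example.com", "b@example.org", "c@example.net"]))
```

A production system would likely replace the regexes with learned classifiers and would need to measure label quality against ground truth, which is the kind of challenge the talk says it covers.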

Additional Author: Giuseppe M. Mazzeo

Danny Gagne is a Software Engineer on the Privacy Infrastructure team at Meta. He holds a B.S. in Computer Science from Northeastern University. He has worked on large-scale data classification at the MITRE Corporation and at the International Atomic Energy Agency.

BibTeX
@conference {306703,
author = {Daniel Gagne},
title = {Data Classification at Scale: Taming the Hydra},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}
