A First-Principles Algebraic Approach to Data Transformations in Data Cleaning: Understanding Provenance from the Ground Up


Santiago Núñez-Corrales, iSchool and NCSA, UIUC; Lan Li, iSchool, UIUC; Bertram Ludäscher, iSchool and NCSA, UIUC


We provide a model describing data transformation workflows on tables constructed from first principles, namely by defining datasets as structures with functions and sets for which certain morphisms correspond to data transformations. We define rigid and deep data transformations depending on whether the geometry of the dataset is preserved or not. Finally, we add a model of concurrency using meet and join operations. Our work suggests that algebraic structures and homotopy type theory provide a more general context than other formalisms to reason about data cleaning, data transformations and their provenance.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.