Datalog as a Lingua Franca for Provenance Querying and Reasoning
Saumen Dey and Sven Köhler, UC Davis; Shawn Bowers, Gonzaga University; Bertram Ludäscher, UC Davis
Provenance, i.e., the lineage and processing history of data, has become increasingly important within scientific workflow systems. Provenance information can be used, e.g., to explain, debug, and reproduce the results of computational experiments as well as to determine the validity and quality of data products. Standard models for representing provenance information (such as OPM) largely focus on providing a minimal, common set of observables and constraints (in terms of causal and temporal relationships). For scientific workflow applications, however, the workflow itself and the corresponding (implicit) constraints on provenance relationships are often essential for interpreting and querying provenance information. In this paper, we propose Datalog as a “lingua franca” for representing, querying, and specifying integrity constraints over provenance information, and introduce a unifying provenance model for specifying workflows, traces, and temporal constraints. We also demonstrate advantages of using Datalog together with the unified model through a number of examples.