Towards Automated Collection of Application-Level Data Provenance

Dawood Tariq, Maisem Ali, and Ashish Gehani, SRI International

Gathering data provenance at the operating system level is useful for capturing system-wide activity. However, many modern programs are complex and can perform numerous tasks concurrently. Capturing their provenance at this level, where processes are treated as single entities, may lead to the loss of useful intra-process detail. This can, in turn, produce false dependencies in the provenance graph. Using the LLVM compiler framework and SPADE provenance infrastructure, we investigate adding provenance instrumentation to allow intra-process provenance to be captured automatically. This results in a more accurate representation of the provenance relationships and eliminates some false dependencies. Since the capture of fine-grained provenance incurs increased overhead for storage and querying, we minimize the records retained by allowing users to declare aspects of interest and then automatically infer which provenance records are unnecessary and can be discarded.
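
To make the false-dependency problem concrete, the following is a minimal sketch in Python. It is not the authors' LLVM/SPADE implementation. It assumes a hypothetical process `proc` with two independent functions, `f` (reads `in_a`, writes `out_x`) and `g` (reads `in_b`, writes `out_y`). All node names, the edge-list representation, and the `prune` helper are illustrative assumptions. The abstract does not say how users declare aspects of interest, so the sketch simply takes a set of artifact names.

```python
from collections import defaultdict


def ancestors(edges, node):
    """Return every node reachable from `node` by following dependency edges."""
    graph = defaultdict(set)
    for dependent, dependency in edges:
        graph[dependent].add(dependency)
    seen, stack = set(), [node]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def file_inputs(edges, artifact):
    """Input files (hypothetical 'in_' prefix) that `artifact` depends on."""
    return {n for n in ancestors(edges, artifact) if n.startswith("in_")}


def prune(edges, interesting):
    """Keep only the edges that lie on the lineage of the declared artifacts."""
    keep = set(interesting)
    for artifact in interesting:
        keep |= ancestors(edges, artifact)
    return [(s, d) for s, d in edges if s in keep and d in keep]


# Each edge points from a dependent node to the node it depends on.
# Process-level view: the whole process is a single vertex.
process_level = [
    ("out_x", "proc"), ("out_y", "proc"),
    ("proc", "in_a"), ("proc", "in_b"),
]

# Intra-process view: each function is its own vertex.
function_level = [
    ("out_x", "proc.f"), ("proc.f", "in_a"),
    ("out_y", "proc.g"), ("proc.g", "in_b"),
]

coarse = file_inputs(process_level, "out_x")
fine = file_inputs(function_level, "out_x")

# Dependencies reported by the coarse view but absent from the fine view.
false_dependencies = coarse - fine

# Records retained when the user declares interest in out_x only.
retained = prune(function_level, {"out_x"})
```

In this toy graph the process-level view reports that `out_x` depends on both inputs, while the function-level view attributes it to `in_a` alone. Pruning against the declared artifact then discards the records for `g`, which is one simple way to limit the storage cost of the finer-grained graph.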

BibTeX

@inproceedings {179555,
author = {Dawood Tariq and Maisem Ali and Ashish Gehani},
title = {Towards Automated Collection of {Application-Level} Data Provenance},
booktitle = {4th USENIX Workshop on the Theory and Practice of Provenance (TaPP 12)},
year = {2012},
address = {Boston, MA},
url = {https://www.usenix.org/conference/tapp12/workshop-program/presentation/Tariq},
publisher = {USENIX Association},
month = jun
}
