Discrepancy Detection in Whole Network Provenance


Raza Ahmad, SRI; Eunjin Jung, University of San Francisco; Carolina de Senne Garcia, Ecole Polytechnique; Hassaan Irshad and Ashish Gehani, SRI


Data provenance describes the origins of a digital object. Such information is particularly useful when analyzing distributed workflows because extant tools, such as debuggers and application profilers, do not support tracing through heterogeneous runtimes that span multiple hosts. In decentralized systems, each host maintains the authoritative record of its own activity, represented as a dependency graph. Reconstructing the provenance of an object may involve the assembly of subgraphs from multiple, independently administered hosts. We term the collection of host-specific dependencies coupled with cross-host flows whole-network provenance. Such information can grow to terabytes for a small network. Aspects of distributed querying, caching, and response discrepancy detection that are specific to provenance are described and analyzed.

@inproceedings {255018,
author = {Raza Ahmad and Eunjin Jung and Carolina de Senne Garcia and Hassaan Irshad and Ashish Gehani},
title = {Discrepancy Detection in Whole Network Provenance},
booktitle = {12th International Workshop on Theory and Practice of Provenance (TaPP 2020)},
year = {2020},
url = {https://www.usenix.org/conference/tapp2020/presentation/ahmad},
publisher = {{USENIX} Association},
month = jun,