Automated Provenance Analytics: A Regular Grammar Based Approach with Applications in Security


Mark Lemay, Boston University; Wajih Ul Hassan, University of Illinois at Urbana-Champaign; Thomas Moyer, Nabil Schear, and Warren Smith, MIT Lincoln Laboratory


Provenance collection techniques have been carefully studied in the literature, and there are now several systems to automatically capture provenance data. However, the analysis of provenance data is often left “as an exercise for the reader”. The provenance community needs tools that allow users to quickly sort through large volumes of provenance data and identify records that require further investigation. By detecting anomalies in provenance data that deviate from established patterns, we hope to actively thwart security threats. In this paper, we discuss issues with current graph analysis techniques as applied to data provenance, particularly Frequent Subgraph Mining (FSM). Then we introduce Directed Acyclic Graph regular grammars (DAGr) as a model for provenance data and show how they can detect anomalies. These DAGr provide an expressive characterization of DAGs, and by using regular grammars as a formalism, we can apply results from formal language theory to learn the difference between “good” and “bad” provenance. We propose a restricted subclass of DAGr called deterministic Directed Acyclic Graph automata (dDAGa) that guarantees parsing in linear time. Finally, we propose a learning algorithm for dDAGa, inspired by Minimum Description Length for Grammar Induction.

@inproceedings {204255,
author = {Mark Lemay and Wajih Ul Hassan and Thomas Moyer and Nabil Schear and Warren Smith},
title = {Automated Provenance Analytics: A Regular Grammar Based Approach with Applications in Security},
booktitle = {9th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2017)},
year = {2017},
address = {Seattle, WA},
url = {},
publisher = {USENIX Association},
month = jun
}