Dependence-Preserving Data Compaction for Scalable Forensic Analysis


Md Nahid Hossain, Junao Wang, R. Sekar, and Scott D. Stoller, Stony Brook University


Large organizations are increasingly targeted in long-running attack campaigns lasting months or years. When a break-in is eventually discovered, forensic analysis begins. System audit logs provide crucial information that underpins such analysis. Unfortunately, audit data collected over months or years can grow to enormous sizes. Large data size is not only a storage concern: forensic analysis tasks can become very slow when they must sift through billions of records. In this paper, we first present two powerful event reduction techniques that reduce the number of records by a factor of 4.6 to 19 in our experiments. An important benefit of our techniques is that they provably preserve the accuracy of forensic analysis tasks such as backtracking and impact analysis. While providing this guarantee, our techniques reduce on-disk file sizes by an average of 35× across our data sets. On average, our in-memory dependence graph uses just 5 bytes per event in the original data. Our system can consume and analyze nearly a million events per second.
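The abstract's central claim is that events can be dropped without changing what backtracking reports. The Python sketch below illustrates that property under stated assumptions. Events are encoded as hypothetical `(src, dst, t)` triples with distinct integer timestamps. `reduce_events` applies a deliberately simple redundancy rule chosen for illustration; it is not either of the paper's two techniques, which the abstract does not describe. `backtrack` is a plain time-respecting backward reachability search. Both function names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (src, dst, t) means information flowed from src to dst
# at time t, e.g. a file read is (file, process, t), a write is (process, file, t).
# Timestamps are assumed to be distinct integers >= 0.
Event = Tuple[str, str, int]


def reduce_events(events: List[Event]) -> List[Event]:
    """Illustrative rule only: drop (u, v, t) if an earlier kept event (u, v, t0)
    exists and u received no inflow after t0, so the later event adds no new
    ancestry to v."""
    last_kept: Dict[Tuple[str, str], int] = {}
    last_inflow: Dict[str, int] = {}
    kept: List[Event] = []
    for src, dst, t in sorted(events, key=lambda e: e[2]):
        t0 = last_kept.get((src, dst))
        if t0 is None or last_inflow.get(src, -1) > t0:
            kept.append((src, dst, t))
            last_kept[(src, dst)] = t
        # Inflows are tracked on the original stream, whether kept or dropped.
        last_inflow[dst] = t
    return kept


def backtrack(events: List[Event], target: str, at_time: int) -> Set[str]:
    """Nodes that can reach `target` by `at_time` along time-respecting paths."""
    incoming: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for src, dst, t in events:
        incoming[dst].append((src, t))
    # For each visited node, the latest time up to which its past still matters.
    bound: Dict[str, int] = {target: at_time}
    worklist = [target]
    while worklist:
        node = worklist.pop()
        for src, t in incoming[node]:
            # Follow an edge only if it happened in time and raises src's bound.
            if t <= bound[node] and t > bound.get(src, -1):
                bound[src] = t
                worklist.append(src)
    return set(bound) - {target}


events = [
    ("file_a", "proc_p", 1),  # p reads a
    ("file_a", "proc_p", 2),  # repeat read, a unchanged since t=1: dropped
    ("proc_q", "file_a", 3),  # q writes a
    ("file_a", "proc_p", 4),  # read after a new write to a: kept
    ("proc_p", "file_b", 5),  # p writes b
    ("file_a", "proc_p", 6),  # a unchanged since t=4: dropped
]
reduced = reduce_events(events)
print(len(reduced))  # 4
print(sorted(backtrack(reduced, "file_b", 10)))  # ['file_a', 'proc_p', 'proc_q']
```

Comparing `backtrack(events, n, T)` with `backtrack(reduced, n, T)` for every node `n` and time bound `T` is a cheap way to check the preservation property on small traces. The rule is conservative because it consults inflows from the original stream, including events it has itself dropped.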

@inproceedings {217579,
author = {Md Nahid Hossain and Junao Wang and R. Sekar and Scott D. Stoller},
title = {{Dependence-Preserving} Data Compaction for Scalable Forensic Analysis},
booktitle = {27th USENIX Security Symposium (USENIX Security 18)},
year = {2018},
isbn = {978-1-939133-04-5},
address = {Baltimore, MD},
pages = {1723--1740},
url = {},
publisher = {USENIX Association},
month = aug
}