Rui Wang, YScope; Devin Gibson, YScope and University of Toronto; Kirk Rodrigues, YScope; Yu Luo, YScope, Uber, and University of Toronto; Yun Zhang, Kaibo Wang, Yupeng Fu, and Ting Chen, Uber; Ding Yuan, YScope and University of Toronto
Internet-scale services can produce a large amount of logs. Such logs are increasingly appearing in semi-structured formats such as JSON. At Uber, the amount of semi-structured log data can exceed 10PB/day. It is prohibitively expensive to store and analyze them. As a result, logs are only kept searchable for a few days.
This paper proposes μSlope, a system that losslessly compresses semi-structured log data, and allows search without full decompression. It concisely represents the schema structures, and only keeps this representation stored once per dataset instead of interspersing it with each record. It further "structurizes" the semi-structured data by grouping the records with the same schema structure into the same table, so that each table is well structured. Our evaluation shows that μSlope achieves 21.9:1 to 186.8:1 compression ratio, which is at least a few times higher than any existing semi-structured data management systems (SSDMS); The compression ratio is even 2.34x as much as Zstandard and the search speed is on 5.77x of other SSDMSes.
OSDI '24 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Rui Wang and Devin Gibson and Kirk Rodrigues and Yu Luo and Yun Zhang and Kaibo Wang and Yupeng Fu and Ting Chen and Ding Yuan},
title = {{μSlope}: High Compression and Fast Search on {Semi-Structured} Logs},
booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
year = {2024},
isbn = {978-1-939133-40-3},
address = {Santa Clara, CA},
pages = {529--544},
url = {https://www.usenix.org/conference/osdi24/presentation/wang-rui},
publisher = {USENIX Association},
month = jul
}