On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems

Authors: 

Junyu Wei and Guangyan Zhang, Tsinghua University; Yang Wang, The Ohio State University; Zhiwei Liu, China University of Geosciences; Zhanyang Zhu and Junchao Chen, Tsinghua University; Tingtao Sun and Qi Zhou, Alibaba Cloud

Abstract: 

Given the tremendous scale of today’s system logs, compression is widely used to save space. While parser-based log compressor reported promising results, we observe less intriguing performance when applying it to our production logs.

Our detailed analysis shows that, first, some problems are caused by a combination of sub-optimal implementation and assumptions that do not hold on our large-scale logs. We address these issues with a more efficient implementation. Furthermore, our analysis reveals new opportunities for further improvement. In particular, numerical values account for a significant percentage of space and classic compression algorithms, which try to identify duplicate bytes, do not work well on numerical values. We propose three techniques, namely delta timestamps, correlation identification, and elastic encoding, to further compress numerical values.

Based on these techniques, we have built LogReducer. Our evaluation on 18 types of production logs and 16 types of public logs shows that LogReducer achieves the highest compression ratio in almost all cases and on large logs, its speed is comparable to the general-purpose compression algorithm that targets a high compression ratio.

FAST '21 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {264848,
author = {Junyu Wei and Guangyan Zhang and Yang Wang and Zhiwei Liu and Zhanyang Zhu and Junchao Chen and Tingtao Sun and Qi Zhou},
title = {On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems},
booktitle = {19th {USENIX} Conference on File and Storage Technologies ({FAST} 21)},
year = {2021},
isbn = {978-1-939133-20-5},
pages = {249--262},
url = {https://www.usenix.org/conference/fast21/presentation/wei},
publisher = {{USENIX} Association},
month = feb,
}