Zeno: Diagnosing Performance Problems with Temporal Provenance

Authors: 

Yang Wu, Facebook; Ang Chen, Rice University; Linh Thi Xuan Phan, University of Pennsylvania

Abstract: 

When diagnosing a problem in a distributed system, it is sometimes necessary to explain the timing of an event, such as why a response has been delayed or why the network latency is high. Existing tools offer some support for this, typically by tracing the problem to a bottleneck or to an overloaded server. However, locating the bottleneck is merely the first step: the real problem may be some other service that is sending traffic over the bottleneck link, or a misbehaving machine that is overloading the server with requests. These off-path causes do not appear in a conventional trace and will thus be missed by most existing diagnostic tools.

In this paper, we introduce a new concept we call temporal provenance that can help with diagnosing timing-related problems. Temporal provenance is inspired by earlier work on provenance-based network debugging; however, in addition to the functional problems that can already be handled with classical provenance, it can also diagnose problems that are related to timing. We present an algorithm for generating temporal provenance and an experimental debugger called Zeno; our experimental evaluation shows that Zeno can successfully diagnose several realistic performance bugs.

BibTeX
@inproceedings {226008,
author = {Yang Wu and Ang Chen and Linh Thi Xuan Phan},
title = {Zeno: Diagnosing Performance Problems with Temporal Provenance},
booktitle = {16th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 19)},
year = {2019},
isbn = {978-1-931971-49-2},
address = {Boston, MA},
pages = {395--420},
url = {https://www.usenix.org/conference/nsdi19/presentation/wu},
publisher = {{USENIX} Association},
}