Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging

Authors: 

Yu Gan, Meghna Pancholi, Dailun Cheng, Siyuan Hu, Yuan He, and Christina Delimitrou, Cornell University

Abstract: 

Performance unpredictability in cloud services leads to poor user experience, degraded availability, and has revenue ramifications. Detecting performance degradation a posteriori helps the system take corrective action, but does not avoid the QoS violations. Detecting QoS violations after the fact is even more detrimental when a service consists of hundreds of thousands of loosely-coupled microservices, since performance hiccups can quickly propagate across the dependency graph of microservices. In this work we focus on anticipating QoS violations in cloud settings to mitigate performance unpredictability to begin with.

We propose Seer, a cloud runtime that leverages the massive amount of tracing data cloud systems collect over time and a set of practical learning techniques to signal upcoming QoS violations, as well as identify the microservice(s) causing them. Once an imminent QoS violation is detected Seer uses machine-level hardware events to determine the cause of the QoS violation, and adjusts the resource allocations to prevent it. In local clusters with 10 40-core servers and 200-instance clusters on GCE running diverse cloud microservices, we show that Seer correctly anticipates QoS violations 91% of the time, and attributes the violation to the correct microservice in 89% of cases. Finally, Seer detects QoS violations early enough for a corrective action to almost always be applied successfully.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {216817,
author = {Yu Gan and Meghna Pancholi and Siyuan Hu and Dailun Cheng and Yuan He and Christina Delimitrou},
title = {Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging},
booktitle = {10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18)},
year = {2018},
address = {Boston, MA},
url = {https://www.usenix.org/conference/hotcloud18/presentation/gan},
publisher = {USENIX Association},
month = jul
}