Anomaly Detection on Golden Signals

Wednesday, June 12, 2019 - 11:00 am–12:00 pm

Yu Chen, Baidu

Abstract: 

Anomaly detection on golden signals, including latency, traffic, errors, and saturation, can detect system failures and provide important clues for failure diagnosis. In this talk, we will introduce our algorithm toolbox for anomaly detection on the golden signals.

The toolbox uses historical data from each signal to build an appropriate probability model. Alerts are then generated from the probability of the current observation under that model. This probability relates directly to the false positive rate of the classification and can capture SRE engineers' intuition about how abnormal an observation is. Furthermore, because the probability values are comparable across different signals, they make a good feature for failure diagnosis. In our production system, alerting precision ranges from 70% to 90%, and recall is around 90%.
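
The abstract does not say which probability models the toolbox uses, so the following is only a minimal sketch of the general idea, assuming a simple empirical model: the probability of an observation is estimated as the fraction of historical values at least as extreme, and an alert fires when that probability falls below a target false positive rate. The names `tail_probability`, `should_alert`, and `max_false_positive_rate` are hypothetical and do not come from the talk.

```python
from bisect import bisect_left


def tail_probability(history, observation):
    """Empirical estimate of P(X >= observation) from historical values.

    Assumption: higher values are more anomalous (e.g. latency, error count).
    Add-one smoothing keeps the estimate above zero for unseen extremes.
    """
    ordered = sorted(history)
    # Number of historical values that are >= the observation.
    at_or_above = len(ordered) - bisect_left(ordered, observation)
    return (at_or_above + 1) / (len(ordered) + 1)


def should_alert(history, observation, max_false_positive_rate=0.01):
    """Alert when the observation is less likely than the tolerated rate."""
    return tail_probability(history, observation) <= max_false_positive_rate


if __name__ == "__main__":
    # Hypothetical latency history in milliseconds: 1, 2, ..., 199.
    latency_history = list(range(1, 200))
    print(tail_probability(latency_history, 100), should_alert(latency_history, 100))
    print(tail_probability(latency_history, 500), should_alert(latency_history, 500))
```

Because the output is a probability rather than a raw value, the same threshold can be applied to signals with different units, which is one way to read the claim that the values are comparable across signals.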

Yu Chen, Baidu

Yu Chen is a Data Architect in the IOP group of Baidu’s SRE department. His work focuses on developing algorithms for alerting and diagnosis to improve the stability of production systems. Previously, he worked at Microsoft Research Asia. His research interests are distributed systems, consensus protocols, search ranking, and query recommendation.

BibTeX
@conference {233217,
author = {Yu Chen},
title = {Anomaly Detection on Golden Signals},
year = {2019},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
