Narayan Desai and Brent Bryan, Google
This talk presents an exciting analytical method that delivers high-fidelity insights for analyzing and diagnosing distributed systems. It has been used in production, with good results, across a variety of complex services at scale (up to 1.4T events/day) where traditional methods have failed. We will describe the problem domain in detail, then present the statistical methods used and the intuition behind the approach.
Attendees will gain an alternative lens through which they can analyze performance, as well as an understanding of pitfalls.
Narayan Desai, Google
Narayan is an SRE at Google Cloud, where he is responsible for the reliability of GCP Data Analytics products. He has a checkered past, having worked on scheduling, configuration management, supercomputers, and metagenomics—always in the context of production systems.
Brent Bryan, Google
Brent is an SRE at Google Cloud focused on developing statistical and ML approaches to monitor service reliability. Before joining GCP SRE, Brent worked on ads optimization, serving, and measurement, and founded Google Domains.
SREcon22 Americas Open Access Sponsored by Blameless
author = {Narayan Desai and Brent Bryan},
title = {Principled Performance Analytics},
year = {2022},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}