Narayan Desai, Google
SLOs are a wonderfully intuitive concept: a quantitative contract that describes expected service behavior. They are often used to build feedback loops that prioritize reliability, to communicate expected behavior when a team takes on a new dependency, and to synchronize priorities across teams with specialized responsibilities when problems occur, among other use cases. However, SLOs are built on an implicit model of service behavior, with a raft of simplifying assumptions that don't universally hold.
These simplifying assumptions cause SLO rules of thumb to break down for complex modern services, which can lead to bad decisions. In this talk, I will catalog a range of these issues with SLOs and demonstrate how they cause systematic failures of SLO-based processes. Armed with knowledge of these failure modes, I'll present a set of best practices for recognizing when SLOs produce incorrect or unexpected results, along with a set of techniques for constructing robust SLOs.
Narayan Desai is an SRE at Google, where he focuses on the reliability of Google Cloud Platform Data Analytics products. He has a checkered past, having worked on scheduling, configuration management, supercomputers, and metagenomics—always in the context of production systems.