SLIs, SLOs, and Error Budgets at Scale

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Thursday, March 26, 2020 - 12:40 pm–1:00 pm

Fred Moyer, Zendesk, Inc.

Abstract: 

How can one democratize the implementation of SLIs, SLOs, and Error Budgets to put them in the hands of a thousand engineers at once?

At Zendesk, we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches we developed and how we were able to quickly manage observability instrumentation across dozens of teams in a complex ecosystem (CDN, UI, middleware, backend, queues, dbs, etc.).

This talk is for engineers and operations folks who are putting SLIs, SLOs, and Error Budgets into practice. Attendees will come away with concrete examples of how to communicate and implement Error Budgets across multiple teams and diverse service architectures.

Fred Moyer, Zendesk, Inc.

Fred is an SRE and the resident SLOgician at Zendesk, where he works to use Error Budgets to deliver best-in-class reliability for Zendesk's services. Previously, he wrangled large-scale web systems at Turnitin.com and earned his Monitoring and Observability wings at Circonus, dealing with large-scale time series telemetry implementations. Fred is a Perl White Camel Award winner (2018), and received an award from Google for the first Istio community adapter (2018). He has two young kids, so he needs more sleep when he's not on the stationary bike in his garage. He still likes to hack C, Go, Perl, and Ruby.

BibTeX
@conference {247251,
author = {Fred Moyer},
title = {SLIs, SLOs, and Error Budgets at Scale},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}