Latency and Availability Error Budgets Done Right at Scale

Tuesday, December 08, 2020 - 4:05 pm–4:25 pm

Learn how Zendesk developed formulas for implementing SLIs, SLOs, and Error Budgets at scale across a team of 1,000 engineers.

Error Budgets tell us when we should stop working on features and instead work on reliability. Because we use them to prioritize expensive resources (not to mention protect our revenue streams), we want them to be as accurate as possible. How do you empower 1,000+ engineers to solve these problems correctly in systems at scale?

Fred Moyer, Zendesk

Fred is an SRE and resident SLOgician (like statistician, not magician) at Zendesk. He previously worked with high-scale telemetry at Circonus and scaled large web systems at Turnitin. Fred developed the first Istio community adapter in 2018 and was a White Camel Award winner in 2013. He likes to daydream about SLOs and Error Budgets while riding his mountain bike.

@inproceedings {262241,
author = {Fred Moyer},
title = {Latency and Availability Error Budgets Done Right at Scale},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {},
publisher = {{USENIX} Association},
month = dec,
}

