Interrupt Reduction Projects

Betsy Beyer, John Tobin, and Liz Fong-Jones
Interrupts are a fact of life for any team that’s responsible for maintaining a service or software. However, this type of work doesn’t have to be a constant drain on your team’s bandwidth or resources. This article begins by describing the landscape of work faced by Site Reliability Engineering (SRE) teams at Google: the types of work we undertake, the logistics of how SRE teams are organized across sites, and the inevitable toil we incur. Within this discussion, we focus on interrupts: how teams initially approached tickets, and why and how we implemented a better strategy. After presenting the ticket funnel as a case study of one successful initiative, we offer practical advice about mapping what we learned to other organizations.