Invent More, Toil Less
Betsy Beyer, Brendan Gleason, Dave O’Connor, and Vivek Rau
This article builds upon Vivek Rau’s chapter “Eliminating Toil” in Site Reliability Engineering: How Google Runs Production Systems. We begin by recapping Vivek’s definition of toil and Google’s approach to balancing operational work with engineering project work. The Bigtable SRE case study then presents a concrete example of how one team at Google reduced toil. Finally, we leave readers with a series of best practices that should help reduce toil regardless of an organization’s size or makeup.