The Curse of SRE Autonomy and How to Manage It

Wednesday, March 27, 2019 - 9:35 am–10:05 am

Richard Bondi, Google

Abstract: 

Within an SRE organization, teams usually develop very different automation tools and processes for accomplishing similar tasks. Some of this can be explained by the software they support: different systems require different reliability solutions. But many SRE tasks are essentially the same across all software: compiling, building, deploying, canarying, load testing, managing traffic, monitoring, and so on.

There are two puzzles here: why does this diversity exist, and how can it be overcome so that SRE teams stop duplicating their development efforts?

This talk presents a solution to both puzzles using the ten-year history of a single SRE tool. The tool is used only internally at a large company, and it is one of the rare tools there to have been widely adopted across the company's very large SRE organization.

Richard Bondi, Google

Richard Bondi has been an engineer at Google since 2011, specializing in the entire web stack and working on travel applications. In 2016 he converted to SRE, and then joined the SRE tech writer team. Before Google, and after leaving his political philosophy PhD program to join the first of many Internet startups, he published a book on cryptography with Wiley, of which Bruce Schneier wrote: "This is essential reading for anyone who wants to understand the Microsoft CryptoAPI..."

BibTeX
@conference {229583,
author = {Richard Bondi},
title = {The Curse of {SRE} Autonomy and How to Manage It},
year = {2019},
address = {Brooklyn, NY},
publisher = {{USENIX} Association},
}