The Curse of SRE Autonomy and How to Manage It

Wednesday, March 27, 2019 - 9:35 am–10:05 am

Richard Bondi, Google


Within an SRE organization, teams usually develop very different automation tools and processes for accomplishing similar tasks. Some of this can be explained by the software they support: different systems require different reliability solutions. But many SRE tasks are essentially the same across all software: compiling, building, deploying, canarying, load testing, managing traffic, monitoring, and so on.

There are two puzzles here: why does this diversity exist, and how can it be overcome so that SRE teams stop duplicating their development efforts?

This talk presents a solution to both puzzles using the ten-year history of a single SRE tool. The tool is used only internally at a large company. It is one of the rare tools there that has been adopted widely by our very large SRE organization.

Richard Bondi, Google

Richard Bondi has been an engineer at Google since 2011, specializing in the entire web stack and working on travel applications. In 2016 he converted to SRE, and then joined the SRE tech writer team. Before Google, and after leaving his political philosophy PhD program to join the first of many Internet startups, he published a book on cryptography with Wiley, of which Bruce Schneier wrote: "This is essential reading for anyone who wants to understand the Microsoft CryptoAPI..."

@conference {229583,
author = {Richard Bondi},
title = {The Curse of {SRE} Autonomy and How to Manage It},
year = {2019},
address = {Brooklyn, NY},
publisher = {{USENIX} Association},