Building Service Ownership Using Documentation, Telemetry, and a Chance to Make Things Better

Monday, December 07, 2020 - 1:50 pm–2:30 pm

Daniel "Spoons" Spoonhower, Lightstep


Adopting Kubernetes, deploying a service mesh, or breaking up a monolith are all ways of building distributed software systems, but if we are going to build and operate software at scale, we need to think about how to build scalable and distributed people systems too.

In this talk, I'll cover a journey from a monolithic team (and a small set of collectively owned services) to a set of teams and many more services. I'll talk about how to use documentation, divide oncall responsibilities, and set clear objectives, as well as when to ask humans to drive and maintain the process (be it system documentation or alert runbooks) and when to depend on automated processes that use telemetry from the application itself.

Successfully building distributed ownership requires not just defining how we are going to hold teams accountable, but also giving those teams agency to make things better. That agency is often overlooked but is critical to success.

Daniel "Spoons" Spoonhower is CTO and a co-founder at Lightstep. He is an author of Distributed Tracing in Practice (O'Reilly Media, 2020). Previously, Spoons spent almost six years at Google as part of Google's infrastructure and Cloud Platform teams. He has published papers on the performance of parallel programs, garbage collection, and real-time programming. He has a Ph.D. in programming languages from Carnegie Mellon University but still hasn't found one he loves.

@inproceedings {262235,
author = {Daniel "Spoons" Spoonhower},
title = {Building Service Ownership Using Documentation, Telemetry, and a Chance to Make Things Better},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {},
publisher = {USENIX Association},
month = dec
}
