Reliability When Everything Is a Platform: Why You Need to SRE Your Customers

Tuesday, March 14, 2017 - 5:15pm to 6:10pm

Dave Rensin, Google

Abstract: 

The general trend in software over the last several years has been to give every system an API and turn every product into a platform. When these systems served only end users, their reliability depended solely on how well we did our jobs as SREs. Increasingly, however, our customers' perceptions of our reliability are driven by the quality of the software they bring to our platforms. The usual boundary between our platforms and our customers' systems is blurring, and it is getting harder to deliver a consistent reliability experience to end users.

In this talk we'll discuss a provocative idea: that as SREs we should take joint operational responsibility for, and go on call for, the systems our customers build on our platforms. We'll discuss the specific technical and operational challenges of this approach and the results of an experiment we're running at Google to address this need.

Finally, we'll try to glimpse what these changes mean for the future of SRE as a discipline.

BibTeX
@conference {202297,
author = {Dave Rensin},
title = {Reliability When Everything Is a Platform: Why You Need to {SRE} Your Customers},
year = {2017},
address = {San Francisco, CA},
publisher = {{USENIX} Association},
}