SRE Your gRPC—Building Reliable Distributed Systems (Illustrated with GRPC)

Wednesday, 30 August 2017 - 11:00am–12:00pm

Grainne Sheerin and Gabe Krabbe, Google

Abstract: 

Distributed systems have sharp edges, and we have a wealth of experience cutting ourselves on them. We want to share our experience with SREs elsewhere, so they can skip making the same mistakes and join us in making exciting new ones instead!

We will share practical suggestions from 14 years of failing gracefully:

  • In a distributed service, every component is a frontend to another one down the stack. How can it deal with backend failures so that the service as a whole does not go down?  
  • In a distributed service, every component is a backend for another one up the stack. How can it be scaled and managed, avoiding overload and under-use?  
  • In a distributed service, latency is often the biggest uncertainty. How can it be kept predictable?  
  • In a distributed service, each component's contribution to availability, processing cost, and latency is hard to assign. When things (inevitably) go wrong, which components are to blame? When they work, where are the biggest opportunities for improvement?

We will cover best and worst practices, using specific gRPC examples for illustration.

BibTeX
@conference {205458,
author = {Grainne Sheerin and Gabe Krabbe},
title = {{SRE} Your gRPC{\textemdash}Building Reliable Distributed Systems (Illustrated with {GRPC})},
year = {2017},
address = {Dublin},
publisher = {{USENIX} Association},
}