SRE Your gRPC—Building Reliable Distributed Systems (Illustrated with gRPC)

Wednesday, May 24, 2017 - 2:00pm–2:55pm

Grainne Sheerin and Gabe Krabbe, Google

Abstract: 

Distributed systems have sharp edges, and we have a wealth of experience cutting ourselves on them. We want to share our experience with SREs elsewhere, so they can skip making the same mistakes and join us making exciting new ones instead!

We will share practical suggestions from 14 years of failing gracefully:

  • In a distributed service, every component is a frontend to another one down the stack. How can it deal with backend failures so that the service as a whole does not go down?
  • In a distributed service, every component is a backend for another one up the stack. How can it be scaled and managed, avoiding overload and under-use?
  • In a distributed service, latency is often the biggest uncertainty. How can it be kept predictable?
  • In a distributed service, each component's contribution to availability, processing, and latency costs is hard to assign. When things (inevitably) go wrong, which components are to blame? When they work, where are the biggest opportunities for improvement?

We will cover best and worst practices, using specific gRPC examples for illustration.

Grainne Sheerin, Google

Grainne is a Site Reliability Engineer for Google Ireland. She's a tech lead responsible for Ad Serving infrastructure and has 5 years of experience in production engineering. She is a physicist, earning a doctorate in Nanoscience from Dublin City University. Prior to Google, she masqueraded as a strategic relationship manager for Reuters and a network engineer for HEAnet.

Gabe Krabbe, Google

Gabe Krabbe has been a Site Reliability Engineer at Google for over 12 years. He has worked on, and sometimes against, multiple generations of the Ads management and serving infrastructure. Before joining Google, he worked for various companies as a system administrator. He frequently tells his servers and his children that he doesn't care who started it, because it takes two to fight.


BibTeX
@conference {202789,
author = {Grainne Sheerin and Gabe Krabbe},
title = {{SRE} Your {gRPC{\textemdash}Building} Reliable Distributed Systems (Illustrated with {gRPC})},
year = {2017},
publisher = {USENIX Association},
month = may
}
