SRE Classroom, Or, How to Design a Reliable Distributed System in 3 Hours

Wednesday, 2 October, 2019 - 14:00-17:30

Alex Perry, Google LLC, and Andrew Suffield, Goldman Sachs

Abstract: 

This workshop ties together academic and practical aspects of systems engineering, with an emphasis on applying principles of systems design to a production service. We will analyze the service to quantify its performance, and iteratively improve the design.

Participants will work together in small groups to sketch out the design, identify components and their relationships, and assess whether the design can meet the system’s Service Level Objective (SLO). Participants will have a system design and bill of materials at the conclusion of this workshop.
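As a rough illustration of what assessing a design against an SLO and producing a bill of materials can look like, here is a minimal back-of-envelope sketch. All component names, availability figures, and load numbers are hypothetical and are not taken from the workshop. It assumes a request needs every component on its path to be up, and that components fail independently.

```python
import math


def serial_availability(component_availabilities):
    """Availability of a request path that needs every component to be up.

    Assumes independent failures, so the availabilities multiply.
    """
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


def machines_needed(peak_qps, qps_per_machine, spares=1):
    """Machines required to serve peak load, plus spares for failures."""
    return math.ceil(peak_qps / qps_per_machine) + spares


# Hypothetical figures for illustration only.
slo = 0.999
path = {"load_balancer": 0.9999, "frontend": 0.9995, "storage": 0.9995}

estimated = serial_availability(path.values())
meets_slo = estimated >= slo

bill_of_materials = {
    "frontend": machines_needed(peak_qps=10_000, qps_per_machine=800),
    "storage": machines_needed(peak_qps=10_000, qps_per_machine=2_500),
}

print(f"estimated availability: {estimated:.6f}, meets SLO: {meets_slo}")
print(f"bill of materials: {bill_of_materials}")
```

With these made-up numbers, each component is individually better than the SLO, yet the combined path falls just short of it. This is the kind of finding that prompts another iteration on the design.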

Participants will not need laptops or specific coding experience; participants will need enthusiasm for collaborating in small groups, and for discussion-based problem-solving. Participants will come away with an understanding of the principles of iterative systems engineering, popularly known as “Non-abstract large systems design”.

This workshop covers material critical for SRE, an increasingly broad field that combines software engineering and systems design.

Alex Perry, Google LLC

Alex Perry is a Staff SRE in Los Angeles and has been at Google for the last 13 years. He has worked on many layers of network infrastructure, from fabrics to beyond corp services, as well as social and other applications. Recently, he has been working on migrating internal enterprise systems from existing virtualization infrastructure onto Google Cloud Platform. His interests are reliability, relevant monitoring, and disaster preparedness.

Andrew Suffield, Goldman Sachs

Andrew Suffield is an SRE at Goldman Sachs in London. They tend to focus on production automation, distributed systems design, and teaching.

BibTeX
@conference {239440,
author = {Alex Perry and Andrew Suffield},
title = {{SRE} Classroom, Or, How to Design a Reliable Distributed System in 3 Hours},
year = {2019},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}