How We Went from Being Astronauts to Being Mission Control: Managing Systems in an Age of Dynamic Complexity

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Wednesday, March 25, 2020 - 4:30 pm–5:20 pm

Laura Nolan

Abstract: 

Why is it that a single server can often have better uptime than a public cloud service?

We used to manage systems directly. Now many of us instead write and run dynamic control planes: the systems that run our user-facing systems. We find the dynamic control plane pattern in software-defined networking, in service meshes, in some load balancers, and in job orchestration systems.

This talk looks at the common architectural shapes of dynamic control planes, and some examples of how they fail spectacularly—many major cloud outages are caused by dynamic control plane issues. Why are dynamic control planes so hard to run, and what can we do about it?

Laura Nolan

Laura Nolan is a software engineer whose fascination with failure and fragility in systems drew her into the field of Site Reliability Engineering. She is a contributor to "Site Reliability Engineering: How Google Runs Production Systems" and "Seeking SRE", and writes a quarterly column on SRE for ;login: magazine. Laura works for Slack in Dublin, Ireland.

BibTeX
@conference {247299,
author = {Laura Nolan},
title = {How We Went from Being Astronauts to Being Mission Control: Managing Systems in an Age of Dynamic Complexity},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}