Principles of Chaos Engineering

Tuesday, March 14, 2017 - 11:55am–12:50pm

Casey Rosenthal, Netflix

Abstract: 

Distributed systems create threats to resilience that are not addressed by classical approaches to development and testing. We’ve passed the point where individual humans can reasonably navigate these systems at scale. As we embrace a world that emphasizes automation and engineering over architecting, we have left gaps open in our understanding of complex systems.

Chaos Engineering is a new discipline within Software Engineering, building confidence in the behavior of distributed systems at scale. SREs and dedicated practitioners adopt Chaos Engineering as a practical tool for improving resiliency. An explicit, empirical approach provides a formal framework for adopting, implementing, and measuring the success of a Chaos Engineering program. Additional best practices define an ideal implementation, establishing the gold standard for this nascent discipline. 

Chaos Engineering isn’t the process of creating chaos, but rather of surfacing the chaos that is inherent in the behavior of these systems at scale. By focusing on high-level business metrics, we sidestep understanding *how* a particular model works in order to identify *whether* it works under realistic, turbulent conditions in production. This fills a gap and arms SREs with a better, holistic understanding of the system’s behavior.
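
As a rough illustration of checking *whether* a system works by watching a business metric, the sketch below compares a metric sampled from a control group with the same metric sampled from a group exposed to turbulent conditions. It is not taken from the talk: the function name `steady_state_holds`, the 5% tolerance, and the sample values are all hypothetical, and a real program would choose its own metric and statistical test.

```python
from statistics import mean


def steady_state_holds(control, experiment, tolerance=0.05):
    """Return True if the experiment group's mean metric stays within a
    relative `tolerance` of the control group's mean.

    Hypothetical helper for illustration; the tolerance is an assumption.
    """
    baseline = mean(control)
    if baseline == 0:
        raise ValueError("control baseline must be non-zero")
    deviation = abs(mean(experiment) - baseline) / baseline
    return deviation <= tolerance


# Made-up samples of a business metric, e.g. successful requests per second.
control = [1000, 1010, 990, 1005]
experiment_ok = [995, 1002, 985, 1001]    # metric barely moves under failure
experiment_bad = [700, 720, 690, 710]     # metric drops sharply under failure

print(steady_state_holds(control, experiment_ok))
print(steady_state_holds(control, experiment_bad))
```

The check says nothing about *how* the system absorbed the failure; it only reports whether the chosen metric stayed near its baseline, which is the framing the abstract describes.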

BibTeX
@conference {201834,
author = {Casey Rosenthal},
title = {Principles of Chaos Engineering},
year = {2017},
address = {San Francisco, CA},
publisher = {{USENIX} Association},
}