Why SREs can't afford to NOT do Chaos Engineering

Tuesday, December 08, 2020 - 10:45 am to 11:25 am

Mikolaj Pawlikowski, Bloomberg

Abstract: 

Chaos Engineering is steadily transforming from a gimmick into a serious, scientific discipline focused on observing and measuring the effects of failure in systems of all shapes and sizes, in order to verify their behavior experimentally.

Unfortunately, the Internet is still full of slogans like "breaking things in production," which, while well-intentioned, can distort people's understanding of what Chaos Engineering is really about. In this talk, I'd like to argue that adopting Chaos Engineering can prove to be a very good investment, regardless of the nature of the system in question.

To do that, I'm going to cover three case studies: a single process, a JVM application, and a set of microservices running on Kubernetes.


Mikolaj Pawlikowski is a Software Engineering Team Leader at Bloomberg and author of Chaos Engineering: Crash test your applications. You might also know him from his Kubernetes tools PowerfulSeal and Goldpinger.


BibTeX
@inproceedings {262245,
author = {Mikolaj Pawlikowski},
title = {Why {SREs} can{\textquoteright}t afford to {NOT} do Chaos Engineering},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {https://www.usenix.org/conference/srecon20americas/presentation/pawlikowski},
publisher = {USENIX Association},
month = dec
}
