Fault Tree Analysis Applied to Apache Kafka

Wednesday, March 27, 2019 - 11:10 am to 11:40 am

Andrey Falko, Lyft

Abstract: 

At last year's SREcon, we were inspired by talks that introduced fault tree analysis, and we decided to apply the technique to bulletproof our Apache Kafka deployments. In this talk, you will learn about fault tree analysis and what to focus on to make your Apache Kafka clusters resilient.


Andrey Falko is one of the first Reliability Software Engineers hired at Lyft, where he has been for seven months. He is currently focused on building and scaling reliable PubSub systems for Lyft's Data Platform. Before Lyft, Andrey worked at Salesforce for nine years, where he researched Kafka and Pulsar performance and reliability. While there, he also built an IaaS system, many CI/CD systems, a Zipkin service, and features for the Salesforce platform.

SREcon19 Americas Open Access Videos sponsored by Salesforce

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX
@conference {229523,
author = {Andrey Falko},
title = {Fault Tree Analysis Applied to Apache Kafka},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar
}
