Fault Tree Analysis Applied to Apache Kafka

Friday, October 4, 2019 - 16:00-16:45

Andrey Falko, Lyft

Abstract: 

This talk should provide a framework for answering the following common questions a Kafka operator or user might have: What should your replication factor be for your Kafka topics? How many partitions should you have? How many consumers should you provision? What should your ISR setting be? Should you use RAID or not?

Andrey Falko, Lyft

Andrey Falko is one of the first Reliability Software Engineers hired at Lyft, where he has been for more than a year. He is currently focused on building and scaling reliable PubSub systems for Lyft's Data Platform. Prior to Lyft, Andrey worked at Salesforce for nine years, where he researched Kafka and Pulsar performance and reliability. While there, he also built an IaaS system, many CI/CD systems, a Zipkin service, and features for the Salesforce platform.

BibTeX
@conference {239553,
author = {Andrey Falko},
title = {Fault Tree Analysis Applied to Apache Kafka},
year = {2019},
address = {Dublin},
publisher = {{USENIX} Association},
month = oct,
}