A Case Study in Chaos Testing: Uncovering Kernel Scaling Issues

Wednesday, 26 October, 2022 - 16:00-16:40 CEST

Chaos testing has become a popular approach to conducting reliability experiments on distributed systems, but can it be used to discover issues as low-level as the kernel? In this talk, we present a case study that uses chaos testing to uncover scaling issues from an unexpected source. You will learn about wide-ranging investigative techniques, from the cluster level down to the node level.

Gary Liku started his career in 2017 as a software engineer in the hedge fund space. After building experience with both applications and systems, he joined the Bloomberg team as an SRE working on the large-scale distributed systems that make up a trading system. He also currently leads the chaos testing initiative for the Trading Systems SRE team.

