Debugging Linux Issues with eBPF

Tuesday, October 30, 2018 - 2:00 pm–2:30 pm

Ivan Babrou, Cloudflare


This is a technical dive into how we used eBPF to solve real-world issues uncovered during an innocent OS upgrade. We'll see how we debugged a 10x CPU increase in Kafka after a Debian upgrade and what lessons we learned. We'll go from high-level effects like increased CPU usage, to flamegraphs showing where the problem lies, to tracing timers and function calls in the Linux kernel.

The focus is on tools that operational engineers can use to debug performance issues in production. This particular issue happened at Cloudflare on a Kafka cluster doing 100Gbps of ingress and many multiples of that in egress.

This is also an introductory talk to a training on ebpf_exporter by Alexander Huynh.

Ivan is a Performance Engineer at Cloudflare. He spends his days finding performance bottlenecks, fixing them, and making sure a large chunk of the internet runs as fast and as efficiently as possible.

