Aravindh Sampathkumar, Booking.com
Ever grapple with systems exhibiting perplexing slowdowns or hitting unseen capacity ceilings under load? This talk empowers SREs to move beyond reactive troubleshooting with a practical, analytical framework for understanding and predicting system performance.
We'll dive into the fundamentals of Queueing Theory, learning how concepts like arrival rates, service times, and the impactful "hockey stick" utilisation curve can help you precisely diagnose delays and comprehend system behaviour under stress. Then, harness the Universal Scalability Law (USL) to quantitatively predict scalability limits, uncovering critical contention and coherency bottlenecks.
Through a guided example using a sample service, you'll see these powerful theoretical models applied in practice, demonstrating how to collect relevant metrics, interpret performance insights, and drive informed architectural and capacity planning decisions. Move beyond common analysis pitfalls and equip yourself with the toolkit to shift from reactive firefighting to proactive, data-driven performance engineering.
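The two models named in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the speaker's material: the queueing part assumes the simplest single-server M/M/1 model, where mean response time is R = S / (1 - rho), and the scalability part uses the standard USL form X(N) = gamma * N / (1 + sigma * (N - 1) + kappa * N * (N - 1)). The function names and all parameter values are hypothetical.

```python
import math


def mm1_response_time(service_time: float, utilisation: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    Assumes Poisson arrivals and exponential service times (M/M/1),
    which is a modelling choice made here for illustration.
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)


def usl_throughput(n: float, gamma: float, sigma: float, kappa: float) -> float:
    """Universal Scalability Law throughput at concurrency n.

    gamma: throughput of a single worker (n = 1)
    sigma: contention (serialisation) coefficient
    kappa: coherency (crosstalk) coefficient
    """
    return gamma * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which USL throughput peaks: sqrt((1 - sigma) / kappa)."""
    if kappa <= 0.0:
        raise ValueError("kappa must be positive for a finite peak")
    return math.sqrt((1.0 - sigma) / kappa)


if __name__ == "__main__":
    # Hypothetical 10 ms service time: response time grows sharply as
    # utilisation approaches 1 (the "hockey stick").
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        r_ms = mm1_response_time(0.010, rho) * 1000
        print(f"utilisation {rho:.2f} -> mean response time {r_ms:.0f} ms")

    # Hypothetical USL coefficients; in practice they are fitted to
    # measured (concurrency, throughput) pairs.
    gamma, sigma, kappa = 100.0, 0.05, 0.001
    n_peak = usl_peak_concurrency(sigma, kappa)
    print(f"throughput peaks near N = {n_peak:.1f}")
    for n in (1, 8, 16, 32, 64, 128):
        print(f"N = {n:3d} -> {usl_throughput(n, gamma, sigma, kappa):.0f} req/s")
```

With these made-up coefficients, throughput rises, flattens near the peak concurrency, and then declines as the coherency term dominates; real coefficients would come from fitting the model to load-test measurements.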

I am currently a Site Reliability Engineer at Booking.com. My 17-year career has spanned diverse environments: operating mainframes, building HPC clusters, optimising enterprise storage for peak throughput and latency, and of course skilfully engineering YAML files for Kubernetes. This deep understanding of how systems behave under load, from bare metal to Kubernetes, underpins my passion for applying analytical models to predict and proactively manage performance.

author = {Aravindh Sampathkumar},
title = {The {SRE{\textquoteright}s} Crystal Ball: Predicting System Performance with Queues and {USL}},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
