Danny Chen, Bloomberg LP
For many systems, a fully distributed microservices architecture that scales perfectly horizontally remains an unrealized goal. Many systems still co-locate processes on hosts because of performance considerations and/or because local on-host state must be shared. In our department at Bloomberg LP, architectural and performance constraints force us to run on bare metal hardware with many cores and terabytes of main memory.
Operating systems for bare metal hardware have evolved to enable greater scale (e.g., large numbers of processes/threads, large numbers of open files, etc.). But the OS doesn't always scale correspondingly at runtime (i.e., under high levels of concurrency/contention). Furthermore, the systems we manage don't always scale with newer, larger, and faster hardware.
In this talk, we will present case studies across a variety of operating systems that illustrate how we ran into scale limits in the OS and how we used micro-benchmarks to gain insight into the nature of these limits in order to develop fixes and workarounds. These micro-benchmarks also complement the wonderful new tracing facilities in modern operating systems by eliminating "noise" and focusing data collection on kernel "hot spots."
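To make the idea of a latency-distribution micro-benchmark concrete, here is a minimal sketch in Python. It is not taken from the talk: the choice of operation (opening and closing the null device), the thread counts, and the names `time_calls`, `percentile`, `open_close`, and `run` are all illustrative assumptions. It times one small kernel operation repeatedly from a varying number of threads and reports percentiles rather than an average, since contention tends to show up in the tail of the distribution.

```python
import math
import os
import time
from concurrent.futures import ThreadPoolExecutor


def time_calls(op, iterations):
    """Return the per-call latency of `op` in nanoseconds, one sample per call."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        op()
        latencies.append(time.perf_counter_ns() - start)
    return latencies


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[min(rank, len(sorted_values)) - 1]


def open_close():
    """Hypothetical operation under test: one open/close pair on the null device."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)


def run(threads, iterations):
    """Run `open_close` from `threads` workers and summarize the merged samples."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(time_calls, open_close, iterations)
            for _ in range(threads)
        ]
        samples = sorted(s for f in futures for s in f.result())
    return {p: percentile(samples, p) for p in (50, 99, 100)}


if __name__ == "__main__":
    # Assumed thread counts; the interesting signal is how the tail
    # (p99, max) moves as concurrency increases.
    for threads in (1, 4, 16):
        print(threads, run(threads, 10_000))
```

Each sample here includes Python interpreter overhead, and the global interpreter lock limits how much true parallelism the threads achieve, so this sketch will understate kernel-level contention. A micro-benchmark meant to isolate a kernel hot spot would more plausibly be written in C and run as separate processes or native threads.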
Danny Chen, Bloomberg LP
Danny Chen has been involved in UNIX performance engineering for over 40 years. He's worked on the UNIX SVR3 and SVR4 kernels, market data, messaging and transactional systems, and enterprise systems monitoring. Most recently, he has been applying performance ENGINEERING (not art) principles to his SRE responsibilities as a member of the Trading Solutions SRE team at Bloomberg LP.
SREcon21 Open Access Sponsored by Indeed
author = {Danny Chen},
title = {Latency Distributions and {Micro-Benchmarking} to Identify and Characterize Kernel Hotspots},
year = {2021},
publisher = {USENIX Association},
month = oct
}