Heinrich Hartmann, Circonus
Gathering all kinds of telemetry data is key to operating reliable distributed systems at scale. Once you have set up your monitoring systems and recorded all relevant data, the challenge becomes making sense of it and extracting valuable information, such as:
- Are we fulfilling our SLO/SLA?
- How did our query response times change with the last update?
- When will we run out of disk space if we continue to grow like this?
Statistics is the art of extracting information from data. In this tutorial, we address the basic statistical knowledge that helps you in your daily work as a system operator. From the mathematical side, we will cover probabilistic models; summarising distributions with mean values, quantiles, and histograms; and the relations between them. From the technological side, we will discuss metrics vs. event data, the effects of sub-sampling, how not to aggregate percentiles, and t-digest and histogram summaries.
The tutorial will be tool agnostic, but tailored towards applications. In the computational examples we will be using Python and data from our production systems. At the end of the workshop, attendees should have a clear picture of the mathematical features they need from their monitoring tools for the application at hand.
Heinrich Hartmann is the Analytics Lead at Circonus. He drives the development of analytics methods that transform monitoring data into actionable information as part of the Circonus monitoring platform. In his prior life, Heinrich pursued an academic career as a mathematician. He later transitioned into computer science and worked as a consultant for a number of companies and research institutions.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.