Statistics for Engineers

Thursday, October 3, 2019 - 14:00–17:30

Heinrich Hartmann, Circonus


Gathering all kinds of telemetry data is key to operating reliable distributed systems at scale. Once you have set up your monitoring systems and recorded all relevant data, the challenge becomes making sense of it and extracting valuable information, such as:

  • Are we fulfilling our SLO/SLA?
  • How did our query response times change with the last update?
  • When will we run out of disk space if we continue to grow like this?

Statistics is the art of extracting information from data. In this tutorial, we address the basic statistical knowledge that helps you in your daily work as a system operator. From the mathematical side, we will cover probabilistic models and summarising distributions with mean values, quantiles, and histograms, as well as the relations between them. From the technological side, we will discuss metrics vs. event data, the effects of sub-sampling, how not to aggregate percentiles, and t-digest and histogram summaries.
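To give a flavour of the "how not to aggregate percentiles" topic, here is a minimal Python sketch. It is not taken from the tutorial material: the latency numbers are made up, and the helper names `percentile` and `percentile_from_counts` are hypothetical. It uses the nearest-rank definition of a percentile and, as a simplifying assumption, treats each distinct latency value as its own histogram bin.

```python
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, so no floating-point rounding is involved.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def percentile_from_counts(counts, p):
    """Nearest-rank percentile computed from a value -> count histogram."""
    total = sum(counts.values())
    rank = max(1, (p * total + 99) // 100)
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value


# Made-up request latencies in milliseconds from two hypothetical nodes.
node_a = [10] * 990 + [500] * 10   # busy node: 1000 requests, 1% slow
node_b = [10] * 5 + [2000] * 5     # idle node: 10 requests, half very slow

# Wrong: averaging per-node percentiles ignores how many requests each node served.
averaged_p99 = (percentile(node_a, 99) + percentile(node_b, 99)) / 2

# Reference: the percentile over all requests taken together.
true_p99 = percentile(node_a + node_b, 99)

# Histograms can be merged by adding counts, then queried for the percentile.
merged_counts = Counter(node_a) + Counter(node_b)
histogram_p99 = percentile_from_counts(merged_counts, 99)

print(averaged_p99, true_p99, histogram_p99)
```

In this toy data set the averaged value is roughly twice the 99th percentile of the combined requests, while the merged histogram reproduces it exactly. Real histogram summaries use coarser bins, so they trade some accuracy for bounded size.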

The tutorial will be tool-agnostic but tailored towards applications. In the computational examples, we will be using Python and data from our production systems. At the end of the workshop, attendees should have a clear picture of the mathematical features they need from their monitoring tools for the application at hand.

