Practical Instrumentation for Observability

Wednesday, June 12, 2019 - 12:00 pm–12:30 pm

Gabe Krabbe, Google

Previous talks such as "Service Monitoring Manual—2018 Edition" (SREcon18 Asia/Australia) have covered asking the right questions, knowing the audience, and the idea of the "observability" of a service.

This 20-minute talk aims to fill in some of the gap between "you need good SLIs" and "the code increments a counter": what exactly should be gathered, and for which purpose? There will be concrete examples of good data to gather and export, so that Prometheus, Nagios, OpenCensus, and their friends and relatives provide useful information instead of distracting noise and misleading lies.

Gabe Krabbe has been a Site Reliability Engineer at Google for over 14 years. He has worked on, and sometimes against, multiple generations of the Ads management and serving infrastructure. Before joining Google, he worked for various companies as a system administrator. Gabe frequently tells his servers and his children that he doesn't care who started it because it takes two to fight.

