Why Does (My) Monitoring Suck?

Wednesday, June 12, 2019 - 4:00 pm to 5:00 pm

Todd Palino, LinkedIn

Abstract: 

What do you do when your infrastructure systems have evolved, but the means of watching them has been stagnant? The struggle between uptime and sleep is real, and we need to make sure that monitoring is effective without drowning in a sea of non-actionable alerts. The path to success is to instrument everything, but only monitor what truly matters.

Todd Palino is a Senior Staff Engineer in Site Reliability at LinkedIn on the Capacity Engineering team, where his team is creating a framework for application capacity measurement, analysis, and change intelligence. Prior to that, he was responsible for architecture, day-to-day operations, and tools development for one of the largest Apache Kafka deployments. In his spare time, Todd is the developer of the open source project Burrow, a Kafka consumer monitoring tool, and is the co-author of Kafka: The Definitive Guide, now available from O’Reilly Media.

Out of the office, you can find Todd at conferences like SREcon and LISA, sharing his experience from years in SRE technical leadership, and at Kafka Summit or ApacheCon talking about how to feed and water Kafka infrastructures. Or maybe out on the trails, training for the next marathon.

BibTeX
@conference {233281,
author = {Todd Palino},
title = {Why Does (My) Monitoring Suck?},
year = {2019},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}