Fixing On-Call When Nobody Thinks It's (Too) Broken

Monday, March 25, 2019 - 11:05 am–11:35 am

Tony Lykke, Hudson River Trading

Abstract:

What's a team to do when they receive more than 30 pages a day, every day, for almost a decade? Deny there's a problem, of course! Join me as we relive the data-informed journey from around 70,000 pages over 7 years (~200/week) to under 50/week in just a few short months, done in a way that shows those carrying the pager that improvement is possible and empowers them to keep questioning and improving the status quo. We'll look at not only the technical challenges but also the non-technical ones, such as getting buy-in when nobody thinks there's a problem and managing risk when the on-call team is concerned about silencing legitimate pages along with the noise.

Tony is an SRE on the trade systems team at Hudson River Trading based in NYC, where he gets to tackle hard (often not just technically) automation problems and tech debt cleanup projects across a variety of environments. He is obsessively anti-toil, and regularly refuses to accept "that's just the way it is" as an answer.

SREcon19 Americas Open Access Videos Sponsored by Salesforce

BibTeX
@conference {229509,
author = {Tony Lykke},
title = {Fixing On-Call When Nobody Thinks It{\textquoteright}s (Too) Broken},
year = {2019},
address = {Brooklyn, NY},
publisher = {{USENIX} Association},
month = mar,
}
