Against On-Call: A Polemic

Wednesday, 29 August, 2018 - 16:45-17:30

Niall Murphy, Microsoft


There have been computer emergencies for as long as there have been computers and emergencies. There is evidence dating computer on-call shifts to the 1940s, and we can trace on-call activity in an essentially unbroken line from today back through seven decades of the computer industry.

But just because we've been doing it for seven decades doesn't make it right. In fact, the very fact that we've been doing something essentially unchanged for seven decades, given how much else around the practice has changed, might prompt us to ask: is it right that we are doing this? Is there something better? What are the alternatives?

This talk builds on previous work to analyse and establish what the _real_ reasons for continuing to do on-call are, provides evidence that human beings are actually really bad at it (in fact, that doing it is really harmful to them), and proposes solutions, ending with a call to action for the industry to bring an on-call-free future into being.

(Note: this talk builds on an article written for DNB's "Seeking SRE.")

Niall Richard Murphy is Director of Engineering for Microsoft Azure Ireland, where his group works on cloud engineering systems and SRE. He is the instigator, co-author, and co-editor of two books on SRE, and the co-author of a history of the Irish Internet. He holds degrees in Computer Science and Mathematics, and in Poetry Studies. He lives in Dublin with his wife and two children.
