When Trouble Comes to Town

Thursday, 31 August, 2017 - 11:45–12:30

Michael Gorven, Facebook


One's inclination when tackling an incident is usually to dive to the bottom of the stack where the problem is occurring and start debugging the root cause. However, it's important to first take a step back and approach the incident at a high level to ensure the fastest and most efficient resolution possible. This talk proposes seven steps to consider when tackling an incident: assessing the impact; communicating internally; looking for what changed; trying to mitigate; investigating the root cause; confirming resolution; and documenting and following up. It also touches on various tools which help with these steps.

Michael Gorven is a Production Engineer at Facebook, where he works on the Web Foundation team and previously worked on Instagram. He fixes things when they break, improves the reliability of the system, helps engineer it to scale, and reverts diffs. Before Facebook, he was an early employee at the South African startup Nimbula. Michael grew up in Durban and holds a BSc in Electrical and Computer Engineering from the University of Cape Town. He currently lives in London with his wife and two young children after spending five years in California.

@conference {205524,
author = {Michael Gorven},
title = {When Trouble Comes to Town},
year = {2017},
address = {Dublin},
publisher = {USENIX Association},
month = aug
}
