Best Practices for When s*IT Hits the Fan
LISA: Where systems engineering and operations professionals share real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world.
The LISA conference has long served as the annual vendor-neutral meeting place for the wider system administration community. The LISA14 program recognized the overlap and differences between traditional and modern IT operations and engineering, and developed a highly curated program around five key topics: Systems Engineering, Security, Culture, DevOps, and Monitoring/Metrics. The program included 22 half- and full-day training sessions; 10 workshops; and a conference program consisting of 50 invited talks, panels, refereed paper presentations, and mini-tutorials.
Dave Cliffe, PagerDuty
Outages suck; how you handle them shouldn’t. At PagerDuty, we talk to real customers experiencing real outages all the time. Operations escalations and downtime can be handled in many ways:
- During the incident: who to alert when, how to communicate, handling dependency and downstream failures, disclosure
- After the incident: post-mortems, public disclosure, formalizing process vs. investing in automation, preventative actions
There are also ways to keep engineers sane, customers happy, and the $$$ flowing. Come to this talk to learn best practices from across the industry, including how PagerDuty executes during an outage (but trust us, those never happen).

Dave is an engineer who has adopted a more peaceful role as "sherpa" on the Product team at PagerDuty, a company whose sole goal is to make the lives of DevOps engineers everywhere a calmer, sanity-filled reality. Before PagerDuty, Dave worked in cloud computing at Microsoft on the Windows Azure team. Frequently, he wonders which is scarier: being an on-call engineer responsible for an outage or being a parent. The debate rages on.

author = {Dave Cliffe},
title = {Best Practices for When {s*IT} Hits the Fan},
year = {2014},
address = {Seattle, WA},
publisher = {USENIX Association},
month = nov
}






















