Incident Management for IT

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Tuesday, March 24, 2020 - 2:00 pm–5:30 pm

Chris Hawley, Rob Schnepp, and Ron Vidal, Blackrock 3

Abstract: 

Many companies have experienced the fear, pain, and embarrassment of handling a technology failure so significant it shook the core of the business both at the time and into the future. Without a standardized way to organize the people responding to incidents and solving technical problems, the time to restore services gets longer and longer.

This session dives into the nuts and bolts of the Incident Management System, which has a long history in the fire and emergency services. We have adapted this system for the IT world, and it is in use by a number of Site Reliability teams and other operational IT response organizations. Effective use of the Incident Management System can substantially reduce both MTTR and the Mean Time To Assemble (MTTA). Companies that employ incident management see MTTR reductions of 35%–65%, and MTTA reductions of greater than 90% are not unusual. Incident management uses the fire department model of getting the right people to the problem as rapidly as possible.

Chris Hawley, Blackrock 3

The Blackrock 3 speakers have deep global experience in Incident Management (Fire Department, Special Operations), Anti-Terrorism Operations, and Critical Infrastructure (fiber networks, data centers, oil and gas, power systems). We combine a unique mix of expertise and ingenuity to maximize IT Uptime in your organization.

BibTeX
@conference {247273,
author = {Chris Hawley and Rob Schnepp and Ron Vidal},
title = {Incident Management for {IT}},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}