All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems, the underlying causes, of incidents. Incident resolution is about taking care of the customer: restoring a service to normal levels of operation as soon as possible. Without a process in place to turn the problem into a known error, the root causes of the incident remain, and the incident recurs.
The goals of the Problem Management Process are to prevent the recurrence of incidents, prevent problems and resulting incidents from happening, and minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them be measurably more consistent, mature and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process. I am a certified ITIL Expert. I designed, implemented and then managed a problem process for four years at a registry and DNS service provider with complex technologies across international datacenters.
This tutorial is for technical people and managers responsible for the support of live production services. Problem management is an operational support process that can be put in place from the bottom up. The more teams involved in the process (DBAs, system administrators, developers, helpdesk), the greater the scope of problems that can be addressed.
- A step-by-step guide for building and implementing a problem process and the reasons behind each step
- A process template with examples that can be easily adapted to fit your organization’s current and future needs
- Instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
- Guidance for getting buy-in from peers and managers
- Incident response vs. problem resolution
- Root cause analysis techniques
- Making decisions that are aligned with business objectives
- Getting buy-in from teammates, colleagues and managers
- Proactive problem management
- After-action reviews as a tool
- “Root cause” vs. multiple causes