All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems (the underlying causes) behind incidents. Incident resolution is about taking care of the customer: restoring a service to normal levels of operation as quickly as possible. Without a process in place to turn the problem into a known error, the root causes of the incident remain and the incident recurs.
The goals of the Problem Management Process are to prevent repeat incidents and to minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them to be measurably more consistent, mature and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process.
This tutorial is for technical people and managers responsible for the support of live production services. Problem management is an operational support process that can be put in place from the bottom up. The more teams involved in the process (DBAs, system administrators, developers, helpdesk), the greater the scope of problems that can be addressed.
- a step-by-step guide for building and implementing a problem process and the reasons behind each step
- a process template with examples that can be easily adapted to fit your organization’s current and future needs
- instructions on setting up a Known Error Database and communicating workarounds to affected support teams
- guidance for getting buy-in from peers and managers
- a complete kit for starting to use After Action Reviews to handle the human component of problems
- Incident response vs. problem resolution
- Root cause analysis techniques
- Making decisions that are aligned with business objectives
- Getting buy-in from teammates, colleagues and managers
- Proactive problem management
- After-action reviews