Problem Management

Overview

Problem Management follows on from Incident Management and is typically handled by Second Level Support. While Incident Management focuses on restoring service as quickly as possible, Problem Management identifies and eliminates the underlying root cause so that the same incidents do not recur.

Incident vs. Problem vs. Known Error

Term	Definition
Incident	Unplanned service interruption; cause may be unknown; service is restored as a priority
Problem	The unknown underlying cause of one or more incidents; investigated by specialists
Known Error	A problem whose root cause is known and for which a workaround or fix exists

If First Level Support cannot find a solution, the ticket is escalated: the incident is treated as a Problem and handled by Second Level Support.

The Three Activities of Problem Management

1. Problem Control

All problems are systematically analysed and documented. The goal is to turn unknown causes into Known Errors.

Steps:

Record the problem and compare with the Known Error Database
If a workaround or solution already exists, the problem is a Known Error: increment its occurrence counter
Classify the problem (category, sub-category, priority, business impact)
Analyse root cause (see analysis methods)
Record result as a new Known Error in the KEDB
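The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KEDB` and `KnownError` classes, their field names, and matching by normalised symptom text are hypothetical and do not come from any particular ITSM tool. The `analyse` callback stands in for the root cause analysis, which in practice is manual expert work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class KnownError:
    """One KEDB entry (field names are illustrative assumptions)."""
    symptom: str
    category: str
    root_cause: str
    workaround: str
    occurrences: int = 1


@dataclass
class KEDB:
    """Hypothetical in-memory Known Error Database keyed by normalised symptom."""
    entries: Dict[str, KnownError] = field(default_factory=dict)

    @staticmethod
    def _key(symptom: str) -> str:
        # Lower-case and collapse whitespace so trivial variations still match.
        return " ".join(symptom.lower().split())

    def lookup(self, symptom: str) -> Optional[KnownError]:
        return self.entries.get(self._key(symptom))

    def add(self, entry: KnownError) -> None:
        self.entries[self._key(entry.symptom)] = entry


def problem_control(
    kedb: KEDB,
    symptom: str,
    category: str,
    analyse: Callable[[str], Tuple[str, str]],
) -> KnownError:
    """Record a problem and return the matching or newly created Known Error."""
    known = kedb.lookup(symptom)
    if known is not None:
        # Already a Known Error: only the occurrence counter changes.
        known.occurrences += 1
        return known
    # Unknown cause: analyse it, then store the result as a new Known Error.
    root_cause, workaround = analyse(symptom)
    entry = KnownError(symptom, category, root_cause, workaround)
    kedb.add(entry)
    return entry
```

Calling `problem_control` twice with the same symptom runs the analysis only once; the second call returns the existing entry with its counter raised to 2.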

2. Error Control

Once a Known Error exists, Error Control manages the path from workaround to permanent fix.

Workaround provided immediately to restore service
Permanent fix initiated via a Request for Change (RFC)
After change implementation, Problem Management receives confirmation via a Post Implementation Review (PIR)
First Level Support is informed so they can update the customer
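The path described above can be modelled as a simple linear state machine. The following sketch is an assumption about how one might encode it; the state names and the `advance` helper are hypothetical and merely mirror the four steps listed, with the change implementation made explicit as its own state.

```python
from enum import Enum


class ErrorState(Enum):
    KNOWN_ERROR = "known error recorded"
    WORKAROUND_PROVIDED = "workaround provided"
    RFC_RAISED = "RFC raised"
    CHANGE_IMPLEMENTED = "change implemented"
    PIR_CONFIRMED = "PIR confirmed"
    FIRST_LEVEL_INFORMED = "First Level Support informed"


# Each state has exactly one successor; the last state has none.
_NEXT = {
    ErrorState.KNOWN_ERROR: ErrorState.WORKAROUND_PROVIDED,
    ErrorState.WORKAROUND_PROVIDED: ErrorState.RFC_RAISED,
    ErrorState.RFC_RAISED: ErrorState.CHANGE_IMPLEMENTED,
    ErrorState.CHANGE_IMPLEMENTED: ErrorState.PIR_CONFIRMED,
    ErrorState.PIR_CONFIRMED: ErrorState.FIRST_LEVEL_INFORMED,
}


def advance(state: ErrorState) -> ErrorState:
    """Return the next Error Control state, or raise if the path is complete."""
    if state not in _NEXT:
        raise ValueError(f"no further step after {state.value!r}")
    return _NEXT[state]
```

Keeping the transitions in one table makes it hard to skip a step by accident, for example closing a Known Error before the PIR has confirmed the change.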

3. Proactive Problem Management

Preventing incidents before they occur:

Analyse frequently recurring Known Errors (high occurrence counter = candidate for proactive PM)
Evaluate manufacturer advisories about upcoming software/hardware issues
Monitor automated warnings and exceptions
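Selecting candidates by occurrence counter, the first activity above, could look roughly like this. The function name, the ID format, and the default threshold of 5 are assumptions chosen for illustration; a real threshold would depend on the organisation.

```python
from typing import Dict, List


def proactive_candidates(counters: Dict[str, int], threshold: int = 5) -> List[str]:
    """Return Known Error IDs whose counter is at or above the threshold.

    The result is ordered by counter (highest first), then by ID for ties.
    """
    hits = [(count, key) for key, count in counters.items() if count >= threshold]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _, key in hits]


# Example with made-up Known Error IDs and counters.
counters = {"KE-1": 12, "KE-2": 3, "KE-3": 7}
candidates = proactive_candidates(counters)
```

With the example data, `KE-1` and `KE-3` are returned in that order, while `KE-2` stays below the threshold.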

Workaround

A workaround is a bypass or interim solution that "works around" the issue: it restores service quickly and provisionally while the root cause is still being addressed.

Important: Workarounds must be clearly marked as temporary measures in the system, so the provisional fix does not become a permanent state.

Examples:

Disruption	Workaround
Integrated webcam defective	Connect USB camera
Mobile data capture device defective	Use a loan device
Wired network port defective	Use WLAN stick or LAN adapter
DVI monitor port defective	Use DisplayPort or HDMI if available
Laser printer won't start	Disconnect from power and restart
Browser shows blank page	Clear browser cache or use a different browser
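One way to enforce the rule that workarounds stay temporary is to store an explicit flag and a review date with each one, then list those that are past due. This is a sketch only: the `Workaround` class, the `review_by` field, and the idea of a review date are assumptions, not something the process itself prescribes.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Workaround:
    description: str
    review_by: date          # assumed field: when the workaround must be re-examined
    temporary: bool = True   # cleared only once a permanent fix replaces it


def overdue_workarounds(workarounds: List[Workaround], today: date) -> List[Workaround]:
    """Return temporary workarounds whose review date has already passed."""
    return [w for w in workarounds if w.temporary and w.review_by < today]
```

A periodic report built on `overdue_workarounds` would surface provisional fixes that are at risk of silently becoming permanent.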

Known Error Database (KEDB)

The KEDB stores all known problems with their workaround or solution. First Level Support uses it to provide quick help without escalating to Second Level.

Each entry has an occurrence counter that tracks how often the problem recurs. A high counter indicates a candidate for Proactive Problem Management.

Key KPIs

KPI	Meaning
Number of New Problems	Total problems registered in a period; Proactive PM aims to minimise this by resolving errors before they turn into incidents
Number of Incidents per Known Problem	Average number of incidents associated with the same problem; shows how widespread the impact is and identifies candidates for Proactive PM
Problem Resolution Effort	Average work effort to resolve a problem, broken down by category; shows which categories require the most effort
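The three KPIs reduce to a count and two averages. The sketch below shows the arithmetic on plain Python data; the function names, the input shapes, and the use of hours as the effort unit are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def new_problems(opened: Iterable[date], start: date, end: date) -> int:
    """Number of New Problems: problems registered within [start, end]."""
    return sum(1 for day in opened if start <= day <= end)


def incidents_per_known_problem(incident_counts: Dict[str, int]) -> float:
    """Average number of incidents linked to each known problem."""
    return mean(incident_counts.values())


def resolution_effort_by_category(
    problems: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Average resolution effort (assumed unit: hours) per problem category."""
    by_category: Dict[str, List[float]] = defaultdict(list)
    for category, effort_hours in problems:
        by_category[category].append(effort_hours)
    return {category: mean(values) for category, values in by_category.items()}
```

For example, two network problems that took 4 and 2 hours and one printer problem that took 1 hour yield an average effort of 3 hours for the network category and 1 hour for the printer category.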

Separation of Problem Localisation and Problem Resolution

Problem Management localises the root cause; Change Management resolves it. This separation:

Allows focusing on one task at a time
Enables service restoration (workaround) before the root cause investigation is complete
Does not necessarily involve different teams, but separates the process steps
