Problem Management

Overview

Problem Management is the second level of Incident Management. While Incident Management focuses on restoring service as fast as possible, Problem Management identifies and eliminates the underlying root cause to prevent future incidents.

Incident vs. Problem vs. Known Error

Term | Definition
Incident | Unplanned service interruption; cause may be unknown; service is restored as a priority
Problem | One or more incidents with an unknown root cause; investigated by experts
Known Error | A problem whose root cause is known and for which a workaround or fix exists

If no solution can be found in First Level Support, the ticket is escalated: the incident becomes a Problem managed by Second Level Support.

The Three Activities of Problem Management

1. Problem Control

All problems are systematically analysed and documented. The goal is to turn unknown causes into Known Errors.

Steps:

  1. Record the problem and compare with the Known Error Database
  2. If a workaround or solution already exists, treat the problem as a Known Error and increment its occurrence counter
  3. Classify the problem (category, sub-category, priority, business impact)
  4. Analyse root cause (see analysis methods)
  5. Record result as a new Known Error in the KEDB
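The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Kedb` and `KnownError` classes, their field names, the symptom-based lookup key, and the `analyse` callback are all assumptions made for the example, and a real KEDB would match entries far less naively.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class KnownError:
    """One KEDB entry. All field names are illustrative assumptions."""
    symptom: str
    category: str
    root_cause: str
    workaround: str
    occurrences: int = 1


@dataclass
class Kedb:
    """In-memory stand-in for a Known Error Database, keyed by normalised symptom."""
    entries: Dict[str, KnownError] = field(default_factory=dict)

    @staticmethod
    def _key(symptom: str) -> str:
        # Hypothetical matching rule: ignore case and surrounding whitespace.
        return symptom.strip().lower()

    def lookup(self, symptom: str) -> Optional[KnownError]:
        return self.entries.get(self._key(symptom))

    def add(self, entry: KnownError) -> None:
        self.entries[self._key(entry.symptom)] = entry


def problem_control(
    kedb: Kedb,
    symptom: str,
    category: str,
    analyse: Callable[[str], Tuple[str, str]],
) -> KnownError:
    """Run one problem through the Problem Control steps."""
    # Step 1: record the problem and compare with the KEDB.
    known = kedb.lookup(symptom)
    if known is not None:
        # Step 2: already a Known Error, so only the counter changes.
        known.occurrences += 1
        return known
    # Step 3: classification is reduced to a single category string here.
    # Step 4: root cause analysis is delegated to the caller-supplied callback,
    # which returns a (root_cause, workaround) pair.
    root_cause, workaround = analyse(symptom)
    # Step 5: record the result as a new Known Error.
    entry = KnownError(symptom, category, root_cause, workaround)
    kedb.add(entry)
    return entry
```

Calling `problem_control` twice with the same symptom returns the same entry, with `occurrences` raised from 1 to 2 on the second call and no second analysis run.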

2. Error Control

Once a Known Error exists, Error Control manages the path from workaround to permanent fix.

  • Workaround provided immediately to restore service
  • Permanent fix initiated via a Request for Change (RFC)
  • After change implementation, Problem Management receives confirmation via a Post Implementation Review (PIR)
  • First Level Support is informed so they can update the customer
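One way to picture the path listed above is as a small linear state machine. The sketch below is only an illustration: the stage names and the `advance` helper are made up for this example, and a real ticketing tool would likely allow more stages and backward transitions (for example, a failed change).

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stages of a Known Error under Error Control."""
    WORKAROUND_PROVIDED = auto()
    RFC_RAISED = auto()
    PIR_CONFIRMED = auto()
    FIRST_LEVEL_INFORMED = auto()


# Each stage has exactly one successor; the last stage has none.
NEXT_STAGE = {
    Stage.WORKAROUND_PROVIDED: Stage.RFC_RAISED,
    Stage.RFC_RAISED: Stage.PIR_CONFIRMED,
    Stage.PIR_CONFIRMED: Stage.FIRST_LEVEL_INFORMED,
}


def advance(stage: Stage) -> Stage:
    """Return the stage that follows `stage`, or raise if it is the final one."""
    try:
        return NEXT_STAGE[stage]
    except KeyError:
        raise ValueError(f"{stage.name} is the final stage") from None
```

Keeping the transitions in a single table makes it hard to skip a step by accident, such as informing First Level Support before the PIR confirmation has arrived.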

3. Proactive Problem Management

Preventing incidents before they occur:

  • Analyse frequently recurring Known Errors (a high occurrence counter marks a candidate for Proactive Problem Management)
  • Evaluate vendor advisories about upcoming software or hardware issues
  • Monitor automated warnings and exceptions

Workaround

A workaround is a bypass, alternative, or interim solution that "works around" the issue to restore service quickly and provisionally while the root cause is being addressed.

Important: Workarounds must be clearly marked as temporary measures in the system, so the provisional fix does not become a permanent state.

Examples:

Disruption | Workaround
Integrated webcam defective | Connect USB camera
Mobile data capture device defective | Use a loan device
Wired network port defective | Use WLAN stick or LAN adapter
DVI monitor port defective | Use DisplayPort or HDMI if available
Laser printer won't start | Disconnect from power and restart
Browser shows blank page | Clear browser cache or use a different browser

Known Error Database (KEDB)

The KEDB stores all known problems with their workaround or solution. First Level Support uses it to provide quick help without escalating to Second Level.

Each entry has an occurrence counter (German: Vorfallszähler) that tracks how often the problem recurs. A high counter indicates a candidate for Proactive Problem Management.
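As a rough sketch of how the counter could drive that selection, the function below filters Known Errors by a threshold. The text does not define what counts as "high", so the default of 5, the function name, and the `KE-…` identifiers in the usage note are assumptions for illustration only.

```python
from typing import Dict, List


def proactive_candidates(counters: Dict[str, int], threshold: int = 5) -> List[str]:
    """Return the IDs of Known Errors whose occurrence counter is at least
    `threshold`, ordered from most to least frequent.

    `threshold` is an assumed cut-off; choose it to suit your own ticket volume.
    """
    hits = [(error_id, count) for error_id, count in counters.items() if count >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [error_id for error_id, _ in hits]
```

For example, with counters `{"KE-1": 2, "KE-2": 9, "KE-3": 5}` and the default threshold, the function returns `["KE-2", "KE-3"]`.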

Key KPIs

KPI | Meaning
Number of New Problems | Total problems registered in a period; Proactive PM aims to minimise this by resolving errors before they turn into incidents
Number of Incidents per Known Problem | Average number of incidents associated with the same problem; shows how widespread the impact is and identifies candidates for Proactive PM
Problem Resolution Effort | Average work effort to resolve a problem, broken down by category; shows which categories require the most effort
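The three KPIs can be computed from a flat list of problem records. The sketch below assumes a hypothetical `ProblemRecord` shape (opening date, category, number of linked incidents, effort in hours); real tools will store these differently, and effort may be tracked in other units.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List


@dataclass
class ProblemRecord:
    """Illustrative record; the field names are assumptions, not a tool schema."""
    opened: date
    category: str
    linked_incidents: int
    effort_hours: float


def new_problems(records: List[ProblemRecord], start: date, end: date) -> int:
    """Number of New Problems: records opened within [start, end], inclusive."""
    return sum(1 for r in records if start <= r.opened <= end)


def incidents_per_problem(records: List[ProblemRecord]) -> float:
    """Number of Incidents per Known Problem: mean of linked incidents.

    Raises statistics.StatisticsError if `records` is empty.
    """
    return mean(r.linked_incidents for r in records)


def effort_by_category(records: List[ProblemRecord]) -> Dict[str, float]:
    """Problem Resolution Effort: mean effort in hours for each category."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r.category].append(r.effort_hours)
    return {category: mean(hours) for category, hours in buckets.items()}
```

Sorting the result of `effort_by_category` by value would show at a glance which categories require the most effort.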

Separation of Problem Localisation and Problem Resolution

Problem Management localises the root cause; Change Management resolves it. This separation:

  • Allows focusing on one task at a time
  • Enables service restoration (workaround) before the root cause investigation is complete
  • Does not necessarily involve different teams, but separates the process steps