A Small Business Guide to ITIL Problem Management

Problem management will streamline your IT processes, decrease service response times, and increase customer satisfaction. The Ascent discusses six processes to help implement problem management.

Every day brings a fresh round of information technology (IT) problems: A server goes down, an employee is locked out of their computer, or a piece of software goes haywire. Help tickets arrive in a steady stream, and it can seem like your IT department will never get ahead of the service request curve.

It need not be that way.

Information Technology Infrastructure Library (ITIL) principles can streamline and standardize many of your IT department activities. We'll go over the benefits of ITIL problem management and six processes you can use to put it into practice.

Overview: What is problem management?

ITIL problem management deals with past, current, and future issues requiring IT service. It is proactive by design, with three primary objectives:

  • Prevent problems and their ensuing incidents.
  • Eliminate recurring incidents.
  • Minimize unavoidable incidents.

Problem management is one of the 26 processes included in ITIL Version 3 (V3) service life cycle best practices.

Problem vs. incident management

Don’t confuse problem management with incident management. After all, an incident requiring IT service is a problem, right? The difference between the two relates to scope and strategy:

  • An incident is one event, but a problem is the underlying cause of multiple incidents.
  • Incident management is reactive because it requires a triggering event, while problem management seeks to prevent incidents by addressing underlying issues.
Incident management deals with single events, while problem management addresses groups of two or more related incidents. Image source: Author

If you're late to work one day, that's a single incident that may or may not indicate an underlying problem. If you're late to work three days every week, that's a recurring problem that calls for a lasting fix: Leave home earlier, use a different means of transportation, or take an alternate route.
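To make the distinction concrete, here is a minimal Python sketch that groups incidents by suspected cause and flags any cause behind two or more incidents as a candidate problem. The ticket IDs, cause labels, and the `find_problems` helper are hypothetical and used only for illustration; a real help desk tool would have its own way of linking related incidents.

```python
from collections import defaultdict

def find_problems(incidents, min_incidents=2):
    """Group incidents by suspected cause and keep causes that repeat."""
    by_cause = defaultdict(list)
    for incident_id, cause in incidents:
        by_cause[cause].append(incident_id)
    # A cause seen only once stays a single incident, not a problem.
    return {cause: ids for cause, ids in by_cause.items()
            if len(ids) >= min_incidents}

# Hypothetical ticket data: (ticket ID, suspected cause).
incidents = [
    ("INC-1", "payment processor glitch"),
    ("INC-2", "locked account"),
    ("INC-3", "payment processor glitch"),
    ("INC-4", "payment processor glitch"),
]
problems = find_problems(incidents)
print(problems)
```

The locked account appears once, so it stays an incident. The payment processor glitch appears three times, so it is worth investigating as a problem.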

How your business can benefit from problem management

Effective problem management has a direct, positive impact on your business's bottom line. This return on investment (ROI) includes faster service times, lower employee turnover, fewer critical incidents, and increased customer satisfaction.

1. Reduce mean time to repair

Mean time to repair (MTTR) is a key performance indicator (KPI) for your IT department because faster help request turnaround times allow more work to be done. Reducing MTTR can also decrease turnover among your IT staff, who are less likely to suffer burnout if they're not dealing with the same problems repeatedly.
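As a rough illustration, MTTR can be calculated by averaging the time between when each ticket was opened and when it was resolved. The sketch below assumes that definition (teams measure it in different ways), and the ticket timestamps and the `mean_time_to_repair` helper are made up for the example.

```python
from datetime import datetime

def mean_time_to_repair(tickets):
    """Average hours between a ticket being opened and being resolved."""
    if not tickets:
        raise ValueError("need at least one resolved ticket")
    total_seconds = sum(
        (resolved - opened).total_seconds() for opened, resolved in tickets
    )
    return total_seconds / len(tickets) / 3600

# Hypothetical tickets: (opened, resolved).
tickets = [
    (datetime(2024, 1, 6, 9, 0), datetime(2024, 1, 6, 11, 0)),   # 2 hours
    (datetime(2024, 1, 6, 13, 0), datetime(2024, 1, 6, 17, 0)),  # 4 hours
    (datetime(2024, 1, 7, 10, 0), datetime(2024, 1, 7, 13, 0)),  # 3 hours
]
mttr_hours = mean_time_to_repair(tickets)
print(f"MTTR: {mttr_hours:.1f} hours")
```

Tracking this number before and after a problem is fixed is one simple way to see whether problem management is paying off.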

2. Avoid costly incidents

Problem management helps you avoid expensive incidents before they happen. The Y2K calendar bug, for example, had the potential to crash computer systems worldwide. Its actual effects were negligible due to several years of proactive work by thousands of computer programmers.

3. Increase customer satisfaction

Customer satisfaction (CSAT) is another KPI. Everyone gets frustrated when they encounter the same IT problem again and again. Employees are less productive because their performance is compromised, and customers will find another service or product provider if it is too difficult to work with you.

How does the problem management process work?

ITIL V3 problem management has six stages:

  1. Detection
  2. Categorization
  3. Prioritization
  4. Analysis
  5. Resolution
  6. Closure
ITIL V3 problem management consists of six sequential stages, illustrated in a circular flow diagram. Image source: Author
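If you track problems in a script or spreadsheet export, the six stages can be modeled as an ordered sequence. The sketch below is one possible way to do that in Python; the `Stage` enum and `next_stage` helper are illustrative names, not part of ITIL itself.

```python
from enum import IntEnum

class Stage(IntEnum):
    DETECTION = 1
    CATEGORIZATION = 2
    PRIORITIZATION = 3
    ANALYSIS = 4
    RESOLUTION = 5
    CLOSURE = 6

def next_stage(stage):
    """Return the stage that follows, or None once the problem is closed."""
    if stage is Stage.CLOSURE:
        return None
    return Stage(stage + 1)

# Walk one problem record through every stage in order.
stage = Stage.DETECTION
path = [stage.name]
while (stage := next_stage(stage)) is not None:
    path.append(stage.name)
print(" -> ".join(path))
```

Storing the stage as an ordered value makes it easy to check that a problem never skips a step, such as jumping from detection straight to resolution.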

Multiple processes can help you work through these six stages, but no single one is the "best." Be familiar with all of them and choose the most appropriate one for each IT situation you face.

Process 1: The 5 whys

The five whys process is a technique used in Six Sigma to drill down to the root cause of a problem by asking a series of "why" questions. Five is not an absolute number; you should keep asking "why" as long as it is productive.

For example, the problem might be lower than projected online sales on multiple weekends. Why? The e-commerce platform would not process transactions. Why? A glitch in the payment processor that wasn't fixed until Monday.

Why? IT staff over the weekend did not have the appropriate expertise. Why? Weekend staffing is based on seniority. Why? Senior IT techs want weekends off.

Answering these questions helps formulate countermeasures. Solutions for the problem above -- lower than projected weekend sales due to a lack of IT support -- could include having senior IT techs on call, providing extra training for junior techs, or requiring all techs to work weekends on a rotating schedule.
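One lightweight way to record a five whys session is as an ordered list of question-and-answer pairs, with the last answer treated as the working root cause. The Python sketch below uses the weekend sales example; the `root_cause` helper is a hypothetical name, and the structure is only a suggestion for note-keeping.

```python
def root_cause(chain):
    """Return the answer to the final 'why', treated as the working root cause."""
    if not chain:
        raise ValueError("ask at least one 'why'")
    return chain[-1][1]

why_chain = [
    ("Why were weekend online sales lower than projected?",
     "The e-commerce platform would not process transactions."),
    ("Why?", "A payment processor glitch was not fixed until Monday."),
    ("Why?", "Weekend IT staff did not have the appropriate expertise."),
    ("Why?", "Weekend staffing is based on seniority."),
    ("Why?", "Senior IT techs want weekends off."),
]

for depth, (question, answer) in enumerate(why_chain, start=1):
    print(f"{depth}. {question} {answer}")
print("Working root cause:", root_cause(why_chain))
```

Because the list can be any length, the same structure works whether the team stops at three whys or keeps going to seven.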

Process 2: The Kepner-Tregoe matrix

The Kepner-Tregoe (KT) matrix has four components that identify solutions and assess their viability:

  • Situation appraisal: What situation requires resolution, what are the relevant facts, and why should a decision be made?
  • Problem analysis: Gather evidence about the situation's root problem(s).
  • Decision analysis: Identify alternative solutions and assess their risks.
  • Potential problem analysis: Select the best solution and investigate its potential negative consequences and how to mitigate them.

Website sales over weekends were lower than projected in the five whys example because of a lack of IT support. Potential solutions included having senior IT techs on call, providing more training to junior techs, or requiring all techs to work weekends on a rotating basis.

The KT matrix could produce the same potential solutions for the weekend sales problem but includes an assessment step to identify the "best" one. It also recognizes that even the best solution can produce negative effects that must be identified and addressed.
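The assessment step can be sketched as a simple scoring exercise. The Python example below is an assumption about how a small team might do it, not the official Kepner-Tregoe scoring method: each alternative gets a benefit score, and its expected risk (probability times severity) is subtracted. All scores are invented for illustration.

```python
def rank_alternatives(alternatives):
    """Sort alternatives by benefit minus expected risk, best first."""
    def net_score(alt):
        # Expected risk = chance the downside occurs * how bad it would be.
        return alt["benefit"] - alt["risk_probability"] * alt["risk_severity"]
    return sorted(alternatives, key=net_score, reverse=True)

# Hypothetical scores on a 1-10 scale; probabilities between 0 and 1.
alternatives = [
    {"name": "senior techs on call", "benefit": 8,
     "risk_probability": 0.5, "risk_severity": 6},    # net 5.0
    {"name": "extra training for junior techs", "benefit": 7,
     "risk_probability": 0.25, "risk_severity": 4},   # net 6.0
    {"name": "rotating weekend schedule", "benefit": 9,
     "risk_probability": 0.75, "risk_severity": 8},   # net 3.0
]
ranked = [alt["name"] for alt in rank_alternatives(alternatives)]
print(ranked)
```

With these made-up numbers, extra training ranks first because its downside is both unlikely and mild, even though the rotating schedule has the highest raw benefit.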

Process 3: Pareto analysis and the 80/20 rule

Pareto analysis is based on the 80/20 rule: Roughly 80% of problems or benefits come from 20% of causes. In 2002, for example, Microsoft reported that 80% of crashes and errors in MS Office and Windows were caused by 20% of all reported bugs.

Other real-world 80/20 rule examples include:

  • 80% of customer complaints are caused by 20% of a company's products or services.
  • 80% of profits are generated by 20% of a company's products or services.
  • 80% of operational revenue is generated by 20% of a company's sales team.

Pareto analysis is used to identify the most cost-efficient solutions to problems. Microsoft discovered that fixing the top 20% most-reported bugs led to 80% fewer crashes.
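A basic Pareto analysis can be run on a list of incident causes exported from your help desk. The Python sketch below counts incidents per cause and returns the smallest set of top causes that accounts for 80% of incidents. The cause labels, counts, and the `vital_few` helper are hypothetical.

```python
from collections import Counter

def vital_few(incident_causes, threshold_pct=80):
    """Return the top causes that together cover threshold_pct of incidents."""
    counts = Counter(incident_causes)
    total = sum(counts.values())
    selected, covered = [], 0
    for cause, count in counts.most_common():
        # Stop once the selected causes already cover the threshold.
        if covered * 100 >= threshold_pct * total:
            break
        selected.append(cause)
        covered += count
    return selected

# Hypothetical log: 20 incidents spread across 5 causes.
causes = (
    ["payment glitch"] * 16
    + ["expired certificate", "password reset", "printer jam", "slow wifi"]
)
top = vital_few(causes)
print(top)
```

In this invented data set, one cause out of five (20%) is behind 16 of 20 incidents (80%), so fixing it first gives the biggest return.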

In our example of low weekend website sales, evaluating potential solutions means calculating which one delivers the greatest benefit for the lowest cost.

Process 4: Pain value analysis

Pain value analysis is similar to Pareto analysis because it identifies the problems creating the greatest negative impact on the company. Then, more resources can be allocated to solve them, for example, by creating runbooks that document how to fix them.

The pain value formula multiplies the number of users affected by the amount of downtime, impact on each user, and the company's incurred financial cost.

Each problem is then assigned a priority level:

  • Immediate/critical
  • High
  • Moderate
  • Low
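The formula and priority levels above can be sketched in a few lines of Python. The multiplication follows the formula as described; the impact scale (1 to 5), the cost units (thousands of dollars), and the cutoff values for each priority level are assumptions chosen for the example, so adjust them to fit your own business.

```python
def pain_value(users_affected, downtime_hours, impact_per_user, financial_cost):
    """Multiply the four factors described above into one pain score."""
    return users_affected * downtime_hours * impact_per_user * financial_cost

# Hypothetical cutoffs, checked from highest to lowest.
THRESHOLDS = (
    (1_000_000, "Immediate/critical"),
    (100_000, "High"),
    (10_000, "Moderate"),
)

def priority(pain):
    """Map a pain score to one of the four priority levels."""
    for minimum, label in THRESHOLDS:
        if pain >= minimum:
            return label
    return "Low"

# Impact on an assumed 1-5 scale; cost in assumed thousands of dollars.
weekend_outage = pain_value(500, 48, 3, 20)   # 1,440,000
password_reset = pain_value(1, 1, 2, 1)       # 2
print(priority(weekend_outage), priority(password_reset))
```

Scoring every open problem the same way gives you a sortable list, which makes it easier to defend where IT time is being spent.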

The typical problem manager has a long list of problems to address and can't afford to get lost in the weeds with low-priority ones.

Slow website sales over the weekend due to staffing issues or technical failures are a high- or immediate-priority problem. You don't want it competing for attention with minor service requests or general inquiries.

Process 5: Brainstorming

Brainstorming is more informal than the first two processes. It has less to do with implementing a solution than with identifying one to present to stakeholders. This process is most effective when using a cross-functional team: Siloing is avoided, a wider range of ideas is generated, and participation is increased.

Brainstorming steps include:

  • Determine the brainstorming topic.
  • Require input from each participant.
  • List all ideas.
  • Organize ideas after brainstorming is complete.
  • Prepare a plan of action to present to stakeholders.

If website sales are down on the weekend, brainstorming could also produce the potential solutions above. The process to choose the most viable one to implement would depend upon negotiations between stakeholders, including senior management, junior techs, and senior techs.

Process 6: Affinity mapping

Affinity mapping is often used as an extension of brainstorming to help categorize ideas. It organizes large data sets and identifies relationships among information.

Affinity mapping has four steps:

  1. Write each idea during a brainstorming session on a sticky note or card.
  2. Group members simultaneously place related ideas side by side without discussion.
  3. Define categories of ideas through group discussion.
  4. Arrange multiple categories into higher-level groups.
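Once the sticky notes are sorted, the result of the four steps above can be captured as a simple nested structure. The Python sketch below is one way to record it; the category names, the higher-level groups, the "payment processor failover" idea, and the `build_affinity_map` helper are all hypothetical.

```python
from collections import defaultdict

def build_affinity_map(notes, category_groups):
    """Nest ideas under their category, and categories under a higher-level group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for idea, category in notes:
        grouped[category_groups[category]][category].append(idea)
    # Convert to plain dicts so the result prints cleanly.
    return {group: dict(cats) for group, cats in grouped.items()}

# Steps 1-3: each idea with the category the group agreed on.
notes = [
    ("senior techs on call", "staffing"),
    ("rotating weekend schedule", "staffing"),
    ("extra training for junior techs", "training"),
    ("payment processor failover", "tooling"),  # invented idea for illustration
]
# Step 4: categories arranged into higher-level groups.
category_groups = {"staffing": "people", "training": "people", "tooling": "technology"}

affinity_map = build_affinity_map(notes, category_groups)
print(affinity_map)
```

Keeping a digital copy of the map means the team can revisit ideas that were set aside without rebuilding the wall of sticky notes.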

The most effective affinity diagrams have 40-60 items, but up to 200 items are not uncommon. This process helps prevent teams from focusing on one solution too early in the problem management process, at the expense of others that could be viable too.

Going back to the slow weekend sales example, affinity mapping produces more ideas to work with than, for example, the five whys method.

How problem management compares to other ITIL processes

The current version of ITIL V3, released in 2011, is built around the concept of the ITIL service life cycle, which consists of 26 processes within five service areas:

  • Service strategy
  • Service design
  • Service transition
  • Service operations
  • Continual service improvement (CSI)
ITIL V3 problem management is part of the service operations area, one of the five service areas that together contain 26 processes. Image source: Author

ITIL V4, released in 2019, expanded the scope of ITIL V3 by grouping 34 practices into three areas within the service value system (SVS):

  • General management practices
  • Service management practices
  • Technical management practices

ITIL's original iteration, developed in the 1980s, was an ad hoc collection of IT best practices and checklists. ITIL V4 SVS principles employ a holistic view of IT activities to better integrate each IT department with other company departments and overarching business goals.

Prevent problems through problem management

Worried about implementing problem management on your own? The best IT help desk software supports multiple ITIL processes. Effective problem management provides clear bottom-line evidence to support its continued use.
