Problem Management

Problem Management is triggered by Incidents caused by service outages that exceed Availability thresholds, or by Incidents that require long-term solutions. The Problem record is linked to the source Incident and to any subsequent Incidents affected by the Problem.
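The trigger and linking rules can be sketched as a small in-memory register. This is a minimal sketch, not the behavior of any particular ITSM tool. The classes `Incident`, `ProblemRecord` and `ProblemRegister`, the `issue_key` used to recognize that two Incidents share an underlying issue, and the `PRB` numbering are all hypothetical. The trigger conditions follow step 1 of the process below, and the link-or-create rule follows step 5.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    """Hypothetical minimal Incident record."""
    incident_id: str
    issue_key: str           # hypothetical key that identifies the underlying issue
    service_outage: bool     # the service was unavailable
    needs_code_change: bool  # cannot be fully addressed without a code change


@dataclass
class ProblemRecord:
    """Hypothetical Problem record linked to every Incident it affects."""
    problem_id: str
    issue_key: str
    incident_ids: list[str] = field(default_factory=list)


class ProblemRegister:
    """Hypothetical in-memory store that links Incidents to Problem records."""

    def __init__(self) -> None:
        self._problems: dict[str, ProblemRecord] = {}  # issue_key -> Problem record
        self._history: dict[str, list[str]] = {}       # issue_key -> Incident ids seen

    def handle(self, incident: Incident) -> Optional[ProblemRecord]:
        seen = self._history.setdefault(incident.issue_key, [])
        recurring = bool(seen)  # the issue already appeared on an earlier Incident
        seen.append(incident.incident_id)

        # Trigger on an outage, a required code change, or a recurring issue.
        if not (incident.service_outage or incident.needs_code_change or recurring):
            return None

        problem = self._problems.get(incident.issue_key)
        if problem is None:
            # No existing record: create one linked to the source Incident(s).
            problem = ProblemRecord(
                problem_id=f"PRB{len(self._problems) + 1:04d}",
                issue_key=incident.issue_key,
                incident_ids=list(seen),
            )
            self._problems[incident.issue_key] = problem
        else:
            # Existing record: link the subsequent Incident to it.
            problem.incident_ids.append(incident.incident_id)
        return problem
```

The sketch makes one design choice. When a recurrence is what triggers Problem Management, the earlier Incident is linked as well. This ensures the new Problem record still includes its source Incident.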

  1. Incident Management can trigger the Problem Management process under the following circumstances:
    • Service outage (the service is not available)
    • Issue cannot be fully addressed without a code change
    • Issue recurs in subsequent Incidents
  2. The number of occurrences of a Problem may affect the Availability of the service and influence the Release Management process.
  3. Availability Management is affected when occurrences of a Problem cause service outages. Each outage reduces the service's Availability percentage and risks violating the Service Level Agreement (SLA).
  4. Release Management can be influenced by the number of occurrences of the same Problem. For example, a high-impact application defect that causes revenue loss should be prioritized higher in the release plan than formatting enhancements.
  5. If a Problem record already exists, it is linked to the Incident; otherwise, a new Problem record is created and linked to the Incident.
  6. If a workaround is available, it is implemented during Incident Management (1).
  7. Configuration Management identifies which Configuration Items (CIs) may be related to or impacted by the Problem, and which individuals and teams need to be engaged to remediate it.
  8. A Problem Meeting is held in which the responsible individuals review the Problem and recommend steps to remediate it.
  9. Relevant information is captured in the Problem record, including affected Incidents, timeline, root cause (if known), and workaround.
  10. Knowledge Management is updated with details of the Problem. When subsequent Incidents (1) for the same Problem are reported, the Problem can then be recognized as already known and the prescribed workaround applied (6).
  11. A Problem Task is created for each piece of work required to remediate the Problem. For example, if the Problem is caused by a defect, one task would be to deploy a Release (4) that fixes the defect; another might be to alert the user community to the Problem and provide workaround instructions.
  12. Once all Problem Tasks have been closed, confirm that the Problem has been fixed.
  13. If closing all Problem Tasks has resolved the Problem, the Problem record can be closed; otherwise, the Problem Management process should be re-initiated with another Problem Meeting (8) to determine why the Problem persists.
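Step 3 notes that outages caused by a Problem reduce the service's Availability percentage. The check can be sketched with simple arithmetic. This sketch assumes the common ITIL formulation: Availability % equals agreed service time minus downtime, divided by agreed service time, times 100. It also assumes a hypothetical 99.5% SLA target and a 30-day, 24x7 measurement window. The real target and window come from the SLA, and the function names are illustrative only.

```python
def availability_pct(agreed_minutes: float, downtime_minutes: float) -> float:
    """Availability % = (agreed service time - downtime) / agreed service time * 100."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    return (agreed_minutes - downtime_minutes) / agreed_minutes * 100


def breaches_sla(outage_minutes: list[float], agreed_minutes: float,
                 target_pct: float) -> bool:
    """True if the combined Problem-related downtime drops Availability below target."""
    return availability_pct(agreed_minutes, sum(outage_minutes)) < target_pct


# Assumed measurement window: a 30-day month of 24x7 agreed service time, in minutes.
MONTH_MINUTES = 30 * 24 * 60

# Hypothetical outages (in minutes) caused by recurrences of the same Problem.
outages = [60, 90, 120]
print(f"{availability_pct(MONTH_MINUTES, sum(outages)):.3f}%")
print(breaches_sla(outages, MONTH_MINUTES, target_pct=99.5))
```

Consider three outages of 60, 90 and 120 minutes in a 43,200-minute month. Together they leave Availability at 99.375%, below the assumed 99.5% target. A single 200-minute outage would leave it at about 99.54%, still within target. This is why the number of occurrences of a Problem matters to Availability Management.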
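Steps 11 to 13 form a loop: remediate through Problem Tasks, verify the fix, then either close the record or return to a Problem Meeting (8). The loop can be sketched as follows. This minimal sketch assumes hypothetical status names ("open", "closed", "reinvestigate"). It also assumes a hypothetical `verify_fixed` callback standing in for whatever confirmation the team actually performs, such as watching for further Incidents linked to the Problem.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProblemTask:
    """Hypothetical unit of remediation work, e.g. deploying a Release or notifying users."""
    description: str
    closed: bool = False


@dataclass
class Problem:
    """Hypothetical Problem record with its remediation tasks."""
    problem_id: str
    tasks: list[ProblemTask] = field(default_factory=list)
    status: str = "open"  # hypothetical statuses: "open", "closed", "reinvestigate"


def review_problem(problem: Problem, verify_fixed: Callable[[Problem], bool]) -> str:
    """Apply steps 12 and 13 once every Problem Task is closed."""
    if not problem.tasks or not all(task.closed for task in problem.tasks):
        return problem.status  # remediation is still in progress
    if verify_fixed(problem):
        problem.status = "closed"
    else:
        # The Problem persists: re-initiate with another Problem Meeting (step 8).
        problem.status = "reinvestigate"
    return problem.status


# Usage: both tasks from the step 11 example are closed, but verification fails.
problem = Problem("PRB0001", [ProblemTask("Deploy Release with defect fix"),
                              ProblemTask("Alert users with workaround instructions")])
for task in problem.tasks:
    task.closed = True
print(review_problem(problem, verify_fixed=lambda p: False))
```

In the usage lines, verification fails after both tasks are closed, so the record moves to "reinvestigate" rather than "closed".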