Problem management is structured to address the
causes of the incidents that pose the greatest (negative) risk. It therefore
focuses on the heavy hitters, the recurring service-affecting events; it does not
seek the root cause or a permanent fix for every incident. Success is measured in
terms of what has been removed from the environment:
§ How many problems are identified and removed from our IT
environment.
§ Problems which have a status of resolved and closed.
So let’s walk through the problem
management process.
First an incident occurs. An incident is any
unplanned outcome from the operation of an information system. Incidents
interrupt the IT service which the customer receives. Incidents are normally
reported to the service desk, and an incident record is created.
Next, the incident is assessed. If the cause of
the incident isn’t known, the incident is escalated to a problem. A problem
is the unknown underlying cause of one or more incidents.
As the problem is reviewed, its cause and a
workaround may be determined. As soon as both are established,
the problem is changed to a known error.
Finally, the known error is assessed to
determine whether the symptoms of the incident match an existing problem
record. If so, the new incident is cross-referenced to that problem.
However, if the known error doesn’t match any
existing symptoms, a new problem record is created.
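The walk-through above can be read as a small state machine. The following is a minimal sketch only: the names `ProblemRecord`, `Status`, and `escalate` are hypothetical, and matching by exact symptom-set equality at the moment of escalation is a simplifying assumption, not something the process prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PROBLEM = "problem"          # cause not yet known
    KNOWN_ERROR = "known error"  # cause and workaround both known


@dataclass
class ProblemRecord:
    symptoms: frozenset
    status: Status = Status.PROBLEM
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    incident_ids: list = field(default_factory=list)

    def record_findings(self, root_cause=None, workaround=None):
        """Store findings; promote to known error once both are present."""
        self.root_cause = root_cause or self.root_cause
        self.workaround = workaround or self.workaround
        if self.root_cause and self.workaround:
            self.status = Status.KNOWN_ERROR


def escalate(incident_id, symptoms, problems):
    """Cross-reference the incident to a matching problem, or open a new one."""
    wanted = frozenset(symptoms)
    for problem in problems:
        # Assumption: two records match only if their symptom sets are identical.
        if problem.symptoms == wanted:
            problem.incident_ids.append(incident_id)
            return problem
    new_problem = ProblemRecord(symptoms=wanted, incident_ids=[incident_id])
    problems.append(new_problem)
    return new_problem
```

A real service management tool would match symptoms far more loosely (keywords, affected configuration items, and so on); the point here is only the order of the states.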
The terms incident, problem, and known
error describe the effects and root causes of unexpected events in an information
system. Identifying the cause of these events and minimizing their impact is
the primary purpose of the problem management process.
The goal of problem management activities is to
ascertain the root causes of incidents and to minimize their impact on the
business operations of a company. This is done through the following processes:
§ Problem control - The
purpose of problem control is to identify problems within an IT environment and
to record information about those problems. Problem control identifies the
configuration items at the root of a problem and provides the service desk with
information on workarounds.
§ Error control - The purpose of error control is to keep track of known
errors and to determine the resource effort needed to resolve each known error.
Error control monitors and removes known errors when it's feasible and
worthwhile.
§ Proactive problem management - The purpose of proactive problem management is to find
potential problems and errors in an IT infrastructure before they cause
incidents. Stopping incidents before they occur provides improved service to
users.
The primary measure of the success of the
problem management process is how many problems are identified and removed from
an IT infrastructure. Therefore, the primary output of this IT service
management process is problems that are resolved and closed.
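As a rough illustration of this measure, the sketch below counts problems by status. The function name `closure_metrics` and the status strings are assumptions made for the example; a real tool will have its own status vocabulary.

```python
from collections import Counter


def closure_metrics(statuses):
    """Count identified problems and how many have been resolved or closed."""
    counts = Counter(statuses)
    # Assumption: "resolved" and "closed" are the statuses that count as removed.
    removed = counts["resolved"] + counts["closed"]
    total = sum(counts.values())
    return {
        "identified": total,
        "removed": removed,
        "removal_rate": removed / total if total else 0.0,
    }
```

For example, `closure_metrics(["open", "resolved", "closed", "closed"])` reports four problems identified and three removed.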
The work of problem management produces the
following outcomes:
§ Records of known errors and available workarounds - These records are kept in the configuration
management database (CMDB), and they provide information to the service desk
and other ITSM processes.
§ Requests for change (RFCs) - RFCs describe changes needed to remove a known error.
Problem management does not approve or perform the change. RFCs are sent to
another ITSM process, change management.
§ Changed records in the CMDB - Information about a known error and any affected CIs is
forwarded to the configuration management process, the IT service management
process that maintains the CMDB.
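The relationship between a known error record and its RFC can be sketched as follows. All class, field, and function names here are hypothetical; the one thing taken from the text is that problem management raises the RFC and cross-references it, while completion of the change is reported by change management.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestForChange:
    rfc_id: str
    known_error_id: str      # cross-reference back to the known error
    description: str
    completed: bool = False  # set by change management, not problem management


@dataclass
class KnownError:
    error_id: str
    workaround: str
    affected_cis: list       # configuration items involved
    rfc_id: Optional[str] = None
    closed: bool = False


def raise_rfc(known_error, description, rfc_id):
    """Create an RFC for a known error and cross-reference the two records."""
    rfc = RequestForChange(rfc_id, known_error.error_id, description)
    known_error.rfc_id = rfc.rfc_id
    return rfc


def close_if_fixed(known_error, rfc):
    """Close the known error only once its own RFC has been completed."""
    if rfc.known_error_id == known_error.error_id and rfc.completed:
        known_error.closed = True
    return known_error.closed
```

Keeping `completed` on the RFC rather than on the known error mirrors the division of responsibility: problem management only observes the outcome of the change.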
When the problem management process is used to
identify the root causes of problems, it's far more likely that they will be
diagnosed correctly and fixed properly. As a result, problems are permanently
eliminated.
Problem management includes the following two
types of approaches to address problems:
§ Reactive problem management - Reactive problem management seeks to cure the symptoms of
problems. The reactive approach responds to reports of incidents that have
already occurred. Reactive problem management can be viewed as two groups of activities:
§ Problem control activities - The major problem control activities are:
§ Identification and recording - Problem management receives information about reported incidents
from the incident management process and the service desk. Members of the
problem management team analyze this information, looking for similarities in
the symptoms of reported incidents. They look for records of previously
identified problems that can explain the symptoms. If none can be found, a
record describing a new problem is created.
§ Classification - This control activity identifies the importance of new
problems and designates resources for addressing them.
Problems are classified by category, such as hardware, software, or other types. Then
they can be assigned to the corresponding support personnel. Problems are also
classified by priority ranking. Problems with higher priority rankings are
addressed before problems with lower priority rankings.
§ Investigation and diagnosis - Problem management teams look for the root cause of
problems. If the cause is determined, problem management recommends a
workaround or a temporary fix for the problem.
§ Identify cause of problem and devise a workaround - In the automated service management system,
the status of the problem is changed to that of a known error.
When an IT department applies problem control activities, it prioritizes the problems
that present the biggest threat to the information system or the company's
ability to conduct business. When the root cause of a problem has been found
and a workaround has been devised, problem control activities end. Then the
second group of activities in reactive problem management begins.
§ Error control activities - Now
the problem becomes a known error in the IT infrastructure, and error control
activities begin. Error control activities include:
§ Error identification and recording - This means creating a record that identifies
a known error and all the configuration items (CIs) that cause the error or are
affected by it.
§ Error assessment - This activity prioritizes errors and places them into
groups according to their importance.
§ Error resolution recording - The resolution to a known error may include changes to
hardware or software, user training, or operational procedures. Error control
creates a request for change (RFC) and forwards it to change management. The
RFC is cross-referenced to the known error in the automated service management
system.
§ Error resolution monitoring - Changes are planned and implemented by other IT service
management processes. Problem management monitors the effect of problems on
service provided to users and the progress of requested changes until they're
complete.
§ Error closure - The final error control activity is error closure. When
recommended changes to fix a known error have been completed, the known error
record in the service management system can be closed. Records of incidents and
problems associated with that known error may also be closed.
§ Proactive problem management - Proactive problem management seeks to inoculate IT systems
against problems. The proactive approach identifies potential problems before
they emerge.
§ Trend analysis - This is the process of examining problem and incident
reports to discover what types of problems are happening more frequently. Trend
analysis of existing problems and incidents can reveal where similar problems
may occur in other places within the infrastructure. It can also show that repeated
failures have not been adequately resolved and are likely to continue to
happen.
§ Targeting preventative action - This process applies the same techniques used in reactive
problem management to a select few potential problems with a high degree of business
impact. Targeting preventative action may include creating RFCs, training users
and service desk team members, or recommending procedural changes within the IT
department.
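Trend analysis, as described in the list above, can be as simple as counting how often each type of incident recurs. This is a minimal sketch; the function name `recurring_categories`, the `category` key, and the default threshold of three are illustrative assumptions.

```python
from collections import Counter


def recurring_categories(incidents, threshold=3):
    """Return (category, count) pairs seen at least `threshold` times,
    most frequent first."""
    counts = Counter(incident["category"] for incident in incidents)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]
```

Categories that clear the threshold are candidates for targeted preventative action; in practice the counts would usually be taken over a time window and compared with earlier periods.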
The three groups of problem management activities (problem
control, error control, and proactive problem management) identify and resolve
the problems that have the greatest potential impact on a company's business.
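One way to picture classification by category and priority is shown below. It is a sketch under stated assumptions: the text says only that problems carry a category and a priority ranking, so deriving priority as impact times urgency on a 1 to 3 scale is a hypothetical scoring rule, as are the names `Problem`, `work_queue`, and `by_category`.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    problem_id: str
    category: str  # e.g. "hardware", "software"
    impact: int    # 1 (low) to 3 (high); hypothetical scale
    urgency: int   # 1 (low) to 3 (high); hypothetical scale

    @property
    def priority(self):
        # Assumed scoring rule: higher product means higher priority.
        return self.impact * self.urgency


def work_queue(problems):
    """Order problems so that higher-priority ones are addressed first."""
    return sorted(problems, key=lambda p: p.priority, reverse=True)


def by_category(problems):
    """Group problem ids by category, highest priority first within each group."""
    groups = {}
    for p in work_queue(problems):
        groups.setdefault(p.category, []).append(p.problem_id)
    return groups
```

Each category list can then be handed to the corresponding support personnel in priority order.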
The success of problem management depends on
having the right people performing the right actions. Responsibility for
leading the problem management process is assigned to one person designated as
the problem manager. The roles of the problem manager are:
1. To maintain and develop
problem control activities - It's the problem manager's job to make sure that
information about incidents within the system is being received and reviewed in
a systematic way.
2. To monitor the
effectiveness of error control activities and make recommendations for
improvement - The problem manager must also ensure that relationships among configuration
items are considered in proposed solutions to problems.
3. To cascade information
about workarounds or fixes to those who need it - Communication with the service desk and
incident management is a key role performed by the problem manager.
4. To monitor the progress
of problems and known errors toward a final resolution - If solutions aren't implemented as quickly as
necessary, the problem manager may follow procedures to escalate the priority
of the problem.
Each of these four roles contributes to the
ability of problem management to identify and resolve problems and known errors
quickly. The problem manager will also perform typical supervisory roles to
direct the activities of any other problem management team members.
The problem manager's duties should never be
combined with the duties of the service desk supervisor. The priorities of the
service desk and problem management are often incompatible.
The success of problem management also relies
on critical factors before, during, and after the main activities in the
problem management process. The critical factors for success are:
§ Performance targets - It's important to decide how the performance of problem
management will be measured before the process is implemented. If possible, use
statistics from the previous support activities to set goals for problem
management.
§ Periodic audits - Perform periodic audits to determine whether problem
management procedures are being followed. Problems that aren't properly
reported or investigated are more likely to cause interruptions of service to
users or a major impact on the business.
§ Problem reviews - Conduct major problem reviews after problems with high
urgency or impact have been resolved. Look for ways to improve the way problems
are identified and resolved. Problem management procedures should be
continually improved.
Problem management will succeed when an
effective problem manager fills the required roles, and critical factors for
the success of the process are included in everyday operating procedures.
Implementing problem management brings many
benefits to a company and its IT department. However, there are also some
problems and costs that arise during the implementation of problem management.
Among the most common problems companies
experience is difficulty establishing adequate communication between problem
management and another IT service management process, incident management.
Communication between the two can be difficult because they pursue the
following conflicting goals:
§ Problem management - The goal of problem management is to investigate the root
cause of a problem. The speed with which a solution is found is an important,
but secondary, consideration.
§ Incident management - The goal of incident management is to recover from
incidents and restore service to users as quickly as possible. Determining the
cause of a problem is less important.
Companies also often have difficulty
establishing lines of communication between the software development process
and problem management. Programmers and developers are frequently aware of known
errors in the software they create, but they can be reluctant to identify them.
In many companies, employees resist new
procedures. Many companies report that employees cling to previous informal
problem management methods. It takes time for employees to accept the
discipline of problem management.
Companies should expect to incur some costs
with the implementation of problem management. However, it isn't necessary to
create a vast problem management process that's capable of handling every
single problem that arises. As a result, the incremental costs of problem
management are usually modest. The hardware and software tools needed are shared
with other IT service management processes, and the additional personnel costs
are small.
Problems and costs arise frequently during the
introduction of problem management. However, the problems and costs are
manageable, and the process brings worthwhile improvements in the performance of the IT
infrastructure.
Problem management seeks to identify the
underlying causes of incidents in an IT infrastructure and to remove those
causes. The problem management process addresses the causes of incidents
reactively and proactively.