You know the call. We have all received it: the break-fix call. It can come at
any time of day, and it tells you three things: someone is unhappy, someone has
to handle this, and there must be a better way to manage this aspect of support.
The good news is that ITIL offers best-practice guidelines for incident
management.
Looked at from a high level, every incident follows a pattern of actions that
can be taken to resolve it. Like any other process, incident management has
inputs, outputs, and management activities.
The parts of an incident management process
are:
§ Inputs - Inputs are key to the process. Incident details are received
from the service desk, the network, or computer operations, and they come in
many forms: break-fix issues, service requests, and automatic monitoring alerts.
§ Outputs - The most obvious outputs of incident management are the closed
incident and restored application availability. Looking at a higher level,
however, the outputs also include user satisfaction, improved productivity for
all, customer follow-up and communication, incident report documentation, and
management information.
§ Incident Management Activities - Incident management activities are detection
and reporting; classification and initial support; investigation and diagnosis;
resolution and recovery; incident closure; and incident ownership, monitoring,
tracking, and communication.
Just to review the flow of events: most customer and user incidents are
initially reported to the service desk. This gives the service desk ownership
of handling and tracking the incident from beginning to end, even though the
work may be completed in coordination with other departments.
The activities of an incident management
process are:
§ Incident detection and reporting - Incident detection and reporting is the act of
learning that an incident has occurred and recording its basic details.
§ Classification and initial support - Classification and initial support categorizes
the incident by matching it against the knowledge base of issues, assigns a
priority, assesses whether it is related to configuration details, provides
initial support, and then either closes the incident or routes it to a
specialist group.
§ Investigation and diagnosis - Investigation and diagnosis covers assessing the
incident details, collecting and analyzing all related information, and then
either resolving the incident or routing it to the next line of support.
§ Resolution and recovery - Resolution and recovery covers resolving the
incident with a solution or workaround, or raising a request for change.
§ Incident closure - Incident closure is the act of confirming the resolution with
the reporter of the incident and closing the incident.
§ Incident ownership, monitoring, tracking, and communications - Incident ownership,
monitoring, tracking, and communications cover all the activities that surround
monitoring the incident, escalating it, and informing the user of the latest
status, key accomplishments, and next steps.
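The activities above can be pictured as a simple state machine: each incident
moves through detection, classification, investigation, resolution, and
closure, while the service desk keeps ownership and a history of every step.
The Python sketch below is illustrative only; the `State` names, the `Incident`
class, and the allowed transitions are hypothetical and are not taken from ITIL
or from any specific tool.

```python
from enum import Enum, auto


class State(Enum):
    DETECTED = auto()       # detection and reporting: basic details recorded
    CLASSIFIED = auto()     # classification and initial support done
    INVESTIGATING = auto()  # routed to a specialist group for diagnosis
    RESOLVED = auto()       # solution or workaround applied
    CLOSED = auto()         # resolution confirmed with the reporter


# Hypothetical allowed transitions; real tools define their own workflows.
TRANSITIONS = {
    State.DETECTED: {State.CLASSIFIED},
    # Initial support may resolve the incident directly or route it onward.
    State.CLASSIFIED: {State.RESOLVED, State.INVESTIGATING},
    State.INVESTIGATING: {State.RESOLVED},
    # If the reporter does not confirm the fix, the incident is reopened.
    State.RESOLVED: {State.CLOSED, State.INVESTIGATING},
    State.CLOSED: set(),
}


class Incident:
    def __init__(self, incident_id, summary, owner="service desk"):
        self.incident_id = incident_id
        self.summary = summary
        self.owner = owner               # the service desk owns it end to end
        self.state = State.DETECTED
        self.history = [State.DETECTED]  # tracking: every state change is kept

    def move_to(self, new_state):
        """Advance the incident, rejecting transitions the workflow forbids."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


inc = Incident("INC-001", "Email server unreachable")
for step in (State.CLASSIFIED, State.INVESTIGATING, State.RESOLVED, State.CLOSED):
    inc.move_to(step)
print([s.name for s in inc.history])
```

Keeping the transitions in one table makes it easy to see, and audit, which
paths an incident is allowed to take.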
Imagine how helpful this would be if it were in place. Wouldn't anyone work
towards this ideal?
To elaborate on this point a little further: as important as roles and
responsibilities are to any effective plan, the tools to get the job done are
just as important. You simply need the right tools to work effectively.
Tools commonly used in incident management are:
§ Automatic incident logging and alerting - This tool can automatically log incidents and
alert support personnel when faults are detected on mainframes, networks, and
servers, possibly through an interface to systems management tools.
§ Automatic escalation facilities - Automatic escalation facilities help ensure
the timely handling of incidents and service requests. Imagine automatic
notification, instead of constantly checking a worklist for a group's queue.
§ Highly flexible routing of incidents - This is a requirement when control staff
members are located at multiple sites or co-located in an operational bridge,
so that incident calls can be routed efficiently and effectively.
§ Automatic extraction of data records - Automatic extraction of data records from the configuration
management database (CMDB) for the failed item and any affected items is helpful.
§ Specialized software - Specialized software increases the speed and
effectiveness of incident handling. BMC is an ideal system here: it can help
with accurate classification of incidents and successful matching at the point
of alert.
§ Telephone systems integration - Telephone systems integration can be used to
automatically register the names and phone numbers of users.
§ Diagnostic tools - These tools can assist with the diagnostic process so that
the support staff can more quickly diagnose the source of incidents.
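To make the automatic escalation idea concrete, here is a minimal Python sketch,
assuming hypothetical per-priority time limits (`ESCALATE_AFTER`) that in
practice would come from your service level agreements. Instead of someone
checking a worklist, a scheduled job could call `check_escalations` and send
the returned messages to the right people. The names and thresholds are
illustrative and do not describe the behavior of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical escalation thresholds per priority (1 = highest).
ESCALATE_AFTER = {
    1: timedelta(minutes=30),
    2: timedelta(hours=2),
    3: timedelta(hours=8),
}


@dataclass
class OpenIncident:
    incident_id: str
    priority: int
    group: str           # the support group whose queue holds the incident
    opened_at: datetime
    escalated: bool = False


def check_escalations(incidents, now):
    """Return notification messages for incidents past their time limit.

    Each incident is escalated at most once, so staff are notified
    instead of having to keep polling the group's queue.
    """
    notifications = []
    for inc in incidents:
        limit = ESCALATE_AFTER.get(inc.priority)
        if limit is None or inc.escalated:
            continue  # unknown priority or already escalated
        if now - inc.opened_at > limit:
            inc.escalated = True
            notifications.append(
                f"Escalate {inc.incident_id} ({inc.group}): open longer than {limit}"
            )
    return notifications


now = datetime(2024, 1, 1, 12, 0)
queue = [
    OpenIncident("INC-101", 1, "Network", now - timedelta(minutes=45)),
    OpenIncident("INC-102", 3, "Desktop", now - timedelta(hours=1)),
]
for message in check_escalations(queue, now):
    print(message)
```

Only INC-101 is escalated here: it is priority 1 and has been open for 45
minutes against a 30-minute limit, while INC-102 is well inside its 8-hour
window.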
A common saying is that you can't manage what you can't measure. The incident
manager is normally accountable and responsible for reporting on the
performance of the incident management process. Reporting requires clearly
defined objectives with measurable targets that provide performance
information.
Common metrics used to report the effectiveness
and efficiency of the incident management process are:
§ Incident volume refers to the total number of incidents that are handled by
the incident management process.
§ Mean elapsed time is the average time taken to resolve or circumvent an
incident, broken down by impact code.
§ Incident response time refers to the percentage of incidents handled within
the agreed response times, which may be specified in service level agreements,
for example by impact code.
§ Average incident cost refers
to the average cost of each incident.
§ The percentage of incidents closed by the service desk refers to incidents
closed without reference to other levels of support.
§ The number and percentage of incidents resolved remotely refers to incidents
that were resolved without a physical visit to the user's site.
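The metrics above are simple arithmetic over closed incident records. This
Python sketch assumes a hypothetical `ClosedIncident` record and made-up
response targets per impact code (`RESPONSE_TARGET`); a real service desk tool
would pull these fields from its own database and from the actual SLAs.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClosedIncident:
    impact: int                   # impact code, e.g. 1 = high
    hours_to_resolve: float       # elapsed time to resolution or circumvention
    hours_to_respond: float
    cost: float
    closed_by_service_desk: bool  # closed without other levels of support
    resolved_remotely: bool       # no physical visit needed


# Hypothetical response-time targets (hours) per impact code.
RESPONSE_TARGET = {1: 0.5, 2: 2.0, 3: 8.0}


def incident_metrics(incidents):
    if not incidents:
        raise ValueError("need at least one incident to compute metrics")
    n = len(incidents)
    by_impact = defaultdict(list)
    for inc in incidents:
        by_impact[inc.impact].append(inc.hours_to_resolve)
    within_target = sum(
        inc.hours_to_respond <= RESPONSE_TARGET[inc.impact] for inc in incidents
    )
    remote = sum(inc.resolved_remotely for inc in incidents)
    return {
        "volume": n,
        "mean_elapsed_hours_by_impact": {
            code: mean(times) for code, times in sorted(by_impact.items())
        },
        "pct_within_response_target": 100 * within_target / n,
        "average_cost": sum(inc.cost for inc in incidents) / n,
        "pct_closed_by_service_desk": 100
        * sum(inc.closed_by_service_desk for inc in incidents) / n,
        "resolved_remotely": remote,
        "pct_resolved_remotely": 100 * remote / n,
    }


sample = [
    ClosedIncident(1, 4.0, 0.25, 300.0, False, True),
    ClosedIncident(1, 2.0, 1.0, 200.0, False, False),
    ClosedIncident(3, 1.0, 2.0, 50.0, True, True),
    ClosedIncident(3, 3.0, 4.0, 50.0, True, True),
]
print(incident_metrics(sample))
```

For this made-up sample, three of the four incidents were answered within
their target, so the response-time metric comes out at 75 percent, and the
average cost is 600 / 4 = 150.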
The relationships between the incident management
process and other IT Service Management processes are:
§ The configuration management database defines the relationships among resources,
services, users, and service levels. For example, say a server fails. With the
configuration management database, all dependent processes, applications, and
interfaces would already be documented, so the downstream effects of the
failure could be identified immediately.
§ Problem management provides information about problems, known errors,
workarounds, and quick fixes.
§ Change management yields
information about scheduled changes and their status.
§ Service level management monitors the service level agreements with the customer about
the support to be provided.
§ Availability management measures the availability of services, using the
incident records and the status monitoring provided by configuration
management.
§ Capacity management ensures that capacity, such as storage, matches the
evolving demands of the business. It is concerned with incidents related to
this objective, such as those caused by a shortage of disk space or slow
response time.
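The server-failure example in the configuration management database item above
amounts to a dependency lookup: find everything that relies, directly or
indirectly, on the failed item. The Python sketch below assumes a hypothetical
CMDB fragment stored as a dictionary of dependencies (`DEPENDS_ON`, with
made-up item names) and walks it breadth-first; a real CMDB holds far richer
relationship data.

```python
from collections import deque

# Hypothetical CMDB fragment: each configuration item maps to the items
# it depends on.
DEPENDS_ON = {
    "payroll-app": ["db-server-01", "app-server-02"],
    "hr-portal": ["payroll-app", "web-server-03"],
    "reporting-service": ["db-server-01"],
    "db-server-01": [],
    "app-server-02": [],
    "web-server-03": [],
}


def affected_items(failed, depends_on):
    """Return every item that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a component to what relies on it.
    dependents = {ci: [] for ci in depends_on}
    for ci, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(ci)

    # Breadth-first search; `seen` also guards against dependency cycles.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_items("db-server-01", DEPENDS_ON)))
```

A database server failure here affects the payroll application and the
reporting service directly, and the HR portal indirectly through payroll.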
It is worth remembering, and even reinforcing, that the incident management
process is interwoven with the other IT service management processes. The
processes work only as long as they all support each other.
Finally, there are some common barriers, in the form of costs and problems, to
implementing an incident management process.
The common costs are implementation and operating costs, as with almost any
implementation. Implementation costs cover training, the tools needed, process
and workflow definition, and the resources expended on the implementation.
Operating costs are the continuing maintenance and license fees and the
operating resources expended.
Some of the common recurring problems that
affect all organizations are:
§ Users and IT staff bypassing incident management procedures - As a result,
the IT organization does not obtain important information about service levels
and the number of errors.
§ Incident overload and backlog - A high volume of incidents and a growing
backlog make it difficult to record incidents effectively. Escalations may
occur if incidents are not resolved quickly enough.
§ Incomplete service catalogs and service level agreements - These documents
define the time within which an incident or service request must be resolved
or escalated. If they don't exist or are incomplete, the caller may not get the
issue resolved, and get back online, as quickly as possible.
§ Lack of commitment - This is a problem because effective incident management
requires real staff commitment, not just involvement.