Responding to Incidents – Sensible Service Management Part 4
In the previous post of the Sensible Service Management series from Rob England (The ITSkeptic), we looked at the core of serving customers: managing Requests. Now we are going to cover responding to Incidents.
In GoToAssist Service Desk, everything we respond to is called an Incident. So, let’s talk about Incidents in the strict sense of the word: dealing with things going wrong.
One caveat first. ITIL confuses things by talking a lot about finding and fixing the underlying Problem, and recovering the broken service, as part of the Incident process. I’m going to analyze those separately in a future blog post as part of the Problem process. Here’s how I distinguish the Incident process from the Problem process:
- Incident process is about getting the user(s) working again as quickly as possible, however we manage that.
- Problem process is about fixing the underlying cause(s).
Now, with respect to Incidents, everything I said last time about Requests still applies. But I want to add a few more considerations when there’s an Incident that needs to be fixed. As you’ll recall, there are three main capabilities:
1. Provide a single point of contact with multiple channels to access it. Make sure you keep a record of all requests as Incidents in Service Desk. Record all interactions with your users. Track all your responses and record what you did about them.
2. Make sure you own every request. Use Service Desk to track and organize the workflow, passing requests to the right technician. Regularly monitor how long requests are taking and chase up the slow ones. And employ Service Desk’s capabilities to make support more efficient and reduce demand on your team. Provide scripts for how to deal with common requests. Build up information in the Service Desk knowledgebase for your technicians and for end-user self-service.
3. Look for trends to help you improve your service.
OK, so now let’s extend that. Sometimes a user asks for help because a service is not working as expected: something needs to be fixed. That is an “Incident” in the strictest sense of the word.
That “user” reporting an Incident may be an internal person picking up an error before it affects any of the “real” users consuming the service. It can even be a software program detecting the error and automatically alerting us.
If something needs fixing, then capability #2, “Respond,” is obviously the crucial part of how we resolve the Incident. So I want to expand my analysis of this core capability.
It helps to categorize every Incident, whether it is a general request or an issue to be fixed, but for issues it is particularly important to get a general idea of what type of Incident it is. We need to determine how serious it is, and how widespread and severe its impact is, so that we pass it to the right person the first time.
This is where keeping records of past Incidents and responses, along with building up the knowledgebase, really pays off. Search Service Desk’s Incident records, Problem records and knowledgebase to see whether we already know the cause and how to fix it. If you get a match, fix it or pass it to somebody who can. This is called Level 1 support.
If you don’t get a match or can’t fix it, pass it to Level 2 support: those who have the technical skills for specialist diagnosis and resolution. If they can’t fix it, they refer the Incident to Level 3 support: the folk who built or supplied whatever is not working, often a supplier external to your organization.
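To make that flow concrete, here is a minimal Python sketch of functional escalation through the three support levels. It is not how GoToAssist Service Desk works internally: the `Incident` class, the `KNOWLEDGE_BASE` dictionary, the `SUPPLIED_COMPONENTS` set and the keyword-based `level2` and `level3` handlers are all hypothetical stand-ins for a real knowledgebase search and real specialist diagnosis.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    summary: str
    history: list[str] = field(default_factory=list)  # every hand-off and action
    resolved_by: Optional[str] = None


# Hypothetical knowledgebase: known symptom -> known fix or workaround.
KNOWLEDGE_BASE = {
    "cannot log in to payroll": "log off and on again",
}

# Hypothetical components built by an external supplier (Level 3).
SUPPLIED_COMPONENTS = {"invoicing"}


def level1(incident: Incident) -> bool:
    """Level 1: look for a match in past records and the knowledgebase."""
    fix = KNOWLEDGE_BASE.get(incident.summary.lower())
    if fix is None:
        return False
    incident.history.append(f"Level 1 applied known fix: {fix}")
    return True


def level2(incident: Incident) -> bool:
    """Level 2: stand-in for specialist diagnosis (this team handles servers)."""
    if "server" not in incident.summary.lower():
        return False
    incident.history.append("Level 2 diagnosed and fixed the server")
    return True


def level3(incident: Incident) -> bool:
    """Level 3: stand-in for the supplier of the failing component."""
    text = incident.summary.lower()
    if not any(component in text for component in SUPPLIED_COMPONENTS):
        return False
    incident.history.append("Level 3 supplier delivered a fix")
    return True


TIERS: list[tuple[str, Callable[[Incident], bool]]] = [
    ("Level 1", level1),
    ("Level 2", level2),
    ("Level 3", level3),
]


def route(incident: Incident) -> Incident:
    """Functional escalation: pass the Incident along until a tier resolves it."""
    for name, handler in TIERS:
        incident.history.append(f"Assigned to {name}")
        if handler(incident):
            incident.resolved_by = name
            return incident
    # Nobody could fix it: time for hierarchical escalation.
    incident.history.append("No tier could resolve it: escalate hierarchically")
    return incident


print(route(Incident("Database server down")).resolved_by)  # Level 2
```

Note that every hand-off is appended to `history`, which is the same discipline as keeping a full record of the Incident in Service Desk.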
All this passing around among support groups is called “functional escalation,” but when we talk of “escalating” we usually think of “hierarchical escalation,” i.e. telling somebody more senior. We hierarchically escalate because:
- The incident impact is serious enough that they should know about it
- A fix can’t be found
- Someone is not responding fast or well enough considering the severity of the incident
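Those three triggers can be written as a simple check. This sketch assumes a hypothetical 1 to 4 severity scale, a made-up table of response targets, and an `IncidentStatus` record with illustrative field names; your own thresholds would come from your service agreements, not from this code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical severity scale: 1 = minor, 2 = moderate, 3 = serious, 4 = critical.
SERIOUS_SEVERITY = 3

# Hypothetical targets: maximum minutes without progress, per severity level.
RESPONSE_TARGET_MINUTES = {1: 480, 2: 240, 3: 60, 4: 15}


@dataclass
class IncidentStatus:
    severity: int
    tiers_exhausted: bool  # True once Levels 1 to 3 have all failed to find a fix
    minutes_without_progress: int


def escalation_reasons(status: IncidentStatus) -> list[str]:
    """Return why a senior person should be told; an empty list means no need."""
    reasons = []
    if status.severity >= SERIOUS_SEVERITY:
        reasons.append("impact is serious enough that they should know")
    if status.tiers_exhausted:
        reasons.append("a fix can't be found")
    if status.minutes_without_progress > RESPONSE_TARGET_MINUTES[status.severity]:
        reasons.append("response is too slow for this severity")
    return reasons


print(escalation_reasons(IncidentStatus(4, False, 20)))
# ['impact is serious enough that they should know', 'response is too slow for this severity']
```

An empty list means the Incident stays with the current support group; any reason in the list means telling someone more senior, who then decides whether it is a Major Incident.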
That senior person might determine that this is a “Major Incident.” This means we drop the normal process described here and switch to a crisis-response process that will be described in a future blog post.
Somebody is not getting the service they expect. The Incident process must focus on restoring that service. That might not be the same thing as fixing the underlying Problem. (I’ll consider the process of fixing Problems in a future blog post.) If we have to fix the Problem in order to get the user back on track, we will, but sometimes there is a “workaround”: a way to get them back up and running without fixing the underlying cause. For example, with some software, simply logging off and on again may get them around an issue and working again. Or rebooting a server may make the problem go away. (There’s an old IT joke: “A problem gone is a problem solved.”)
You can record workarounds, and find them again, in Service Desk’s Problem records, in the knowledgebase, or both.
Eventually a Problem may cause so many Incidents that we have to leave the user waiting without a workaround while we properly diagnose it and nail it once and for all. Whether that inconvenience is outweighed by the ongoing cost of recurring Incidents is a management call. But in general, the Incident process uses whatever workarounds or temporary fixes it can to restore service to the user as quickly as possible.
The following applies to all Incidents and Requests. Before you close a ticket, make sure:
- You tell the user it is done
- The user agrees and is happy with the outcome
- The Incident is properly categorized so our reporting data is useful
- The Incident has a record of everything that happened and what workaround or fix you used. In the future, you or one of your colleagues may be grateful you wrote it down.
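The closure checklist can also be enforced rather than left to memory. In this minimal sketch, which assumes a hypothetical `Ticket` record rather than any real Service Desk data model, a ticket only closes when every item on the list above is satisfied; otherwise `close` returns what is still outstanding.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical fields; a real Service Desk record will differ.
    user_notified: bool = False
    user_satisfied: bool = False
    category: str = ""
    resolution_notes: str = ""  # what happened and which workaround or fix was used
    closed: bool = False


def closure_blockers(ticket: Ticket) -> list[str]:
    """List every closure check that has not been met yet."""
    blockers = []
    if not ticket.user_notified:
        blockers.append("tell the user it is done")
    if not ticket.user_satisfied:
        blockers.append("confirm the user agrees and is happy")
    if not ticket.category.strip():
        blockers.append("categorize the Incident for reporting")
    if not ticket.resolution_notes.strip():
        blockers.append("record what happened and the fix or workaround")
    return blockers


def close(ticket: Ticket) -> list[str]:
    """Close the ticket only if nothing is outstanding; return any blockers."""
    blockers = closure_blockers(ticket)
    if not blockers:
        ticket.closed = True
    return blockers


ticket = Ticket(user_notified=True, category="Email")
print(close(ticket))
# ['confirm the user agrees and is happy', 'record what happened and the fix or workaround']
```

Returning the list of blockers, rather than just refusing to close, tells the technician exactly what is left to do.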
There is a huge body of knowledge out there about Incidents and Requests, which you can investigate further as you need to. ITIL has plenty, in the version 3 Service Operation book and the Operational Support and Analysis intermediate course. The Help Desk Institute (HDI) produces a lot of useful material too. COBIT 5 is my choice for a formal definition of what should happen, what should be produced, and by whom.
For now, start with: