Join Ed Liberman for an in-depth discussion in this video Troubleshooting theory, part of CompTIA Server+ (SK0-004) Cert Prep: 7 Troubleshooting.
- [Instructor] When it comes to troubleshooting a problem, we like to think of what tools or resources we may have at our disposal. Well, I'm here to tell you that the number one tool or resource that you could have at your disposal is a good understanding of troubleshooting theory. So, the overall idea of how to troubleshoot a problem is equally as important and very often more important than knowing about the literal diagnostic tools or resources that are at your disposal.
So, let's go through some of the troubleshooting theory. The first step when it comes to troubleshooting is that you have to determine the scope of the issue. Now, to do this, you need to start off by figuring out what changed. So, what changed that may have caused this problem? And you learn this by going to your users and asking them. Now, very often, this will be the first step because they're going to be calling you to say, hey, I have a problem, so you need to interview them to find out what has changed and where the problem lies.
Now, beyond that, we can also look at different logs that we may be tracking as part of our regular activity out on our networks, on our servers, on our systems. Now, the next thing we want to do is see if we can replicate the problem, and this can only happen some of the time. Very often, in the world of IT troubleshooting, we'll find that many of our problems are intermittent and you can't forcefully replicate the problem, or you just might not have everything in place that you need to be able to replicate the problem. Another important step, and you'll see that I'm actually going to put this in here twice, is this: before you go any further, once you've figured out, here's the problem, here is who or what it's affecting, and before you do anything else, try to perform a backup, 'cause you're going to find that as you are troubleshooting and working to fix the problem, it's possible you may cause other problems.
So, you want to make sure you have a backup in place so you can at least get to where you are right now before you make it worse. Now, the next thing we want to do is look for a probable cause for the problem. Now, you're going to be formulating various ideas as to what the causes may be while you're determining the scope of the issue. So, while you're going through and figuring out what the issue is and who it's affecting and all that, you're also already going to be thinking of different ideas of what might be the actual cause of the problem.
You want to then eliminate this list of causes one by one and here's the thing, you want to eliminate the least likely first, and this is something that we tend to do the opposite of. One thing I really want to emphasize here is, don't ignore the obvious. The obvious cause, more often than not, is the actual cause. And also, another big one is don't ever assume. Just because you looked at something recently and you know that everything's fine with it, don't assume that that's not the cause.
Things change, things break, sometimes it's just a matter of time and sometimes something actually happens that has caused something to break, and this can be a physical break, if it's an actual, let's say, like a cable connection or something like that, or it could be a software or system-related break. And the one other thing that I want to point out here is that sometimes you will be hearing about multiple problems that seem to be unrelated, even though they're happening at the same time. Well, very often when you have these problems occurring at the same time, there may be one common element that is causing all the different problems, so don't rule that out.
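As a rough illustration of looking for that one common element, here is a minimal Python sketch. It assumes you keep some record of which components each system depends on; the system names, component names, and the `common_elements` helper are all made up for this example and are not part of the course material.

```python
# Hypothetical inventory: which components each affected system depends on.
# Every name below is invented for illustration.
dependencies = {
    "mail server": {"switch-2", "dns-1", "san-a"},
    "file server": {"switch-2", "dns-1", "san-b"},
    "web server": {"switch-1", "dns-1", "san-a"},
}


def common_elements(affected, dependencies):
    """Return the components shared by every affected system."""
    sets = [dependencies[name] for name in affected]
    if not sets:
        return set()
    return set.intersection(*sets)


# Three seemingly unrelated reports come in at the same time.
suspects = common_elements(
    ["mail server", "file server", "web server"], dependencies
)
print(sorted(suspects))  # ['dns-1']
```

A shared component is only a lead, not a confirmed cause, so it still has to be tested like any other theory.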
So, next we want to go ahead and test our theories to determine what the actual cause is. So, you want to use any resources that are available to potentially test your theory. Now, these resources could be as simple as knowledge base articles. These knowledge base articles are basically problems that other people have already run into and what they found the causes to be and the solutions to be. These are one of the best resources that we have at our disposal, but some of the other resources may be documentation, vendor documentation, things like that.
Now, if your theory of what the cause is is confirmed, you now have determined, alright, we know what the problem is and we now know what the cause of that problem is, now you need to determine what are the next steps to resolve that problem. But if the theory is not confirmed, you have a theory of what was causing the problem, turns out you were wrong, that's not what was causing the problem, well, then you need to establish a new theory as to what the cause may actually be, and if you are out of ideas or you just can't figure it out, don't be afraid to escalate the issue, and when I say escalate, it could be internally.
You may have some form of tiering where you have different levels, so you start with level one support and level two support, level three support, things like that or we could be talking about escalating to an outside source, like maybe a vendor of a piece of hardware or software who can help you with the problem. Now, when it comes to the actual resolving of the problem, the first thing is you want to establish a plan of action. Don't just go in and do something, actually put a plan in place as to figuring out here are the things we need to do to solve this problem.
You want to make sure before you start doing anything that you notify any users that will be impacted because they may have downtime, things like that, as part of your resolution. You want to make sure that they're aware of it. And here's where I'm going to throw this in here, I said it would be a second time I would talk to you about this. You'll want to go ahead and again, remember to perform a backup before actually going in to make any changes to try to resolve a problem, just in case your resolution makes things worse.
Now, when you're ready to actually start resolving the problem, you want to make sure to make one change at a time and then you want to test to see if that change has resolved the problem or made the problem better or given you the results you were expecting, and if not, then you want to reverse that change before you try something else. This is a big thing when it comes to troubleshooting, is you don't want to try something and then go, that didn't work, so I'll try something else. Nope, that didn't work, try something else, that didn't work.
You don't want to do that because what you're doing is, you're potentially making things worse and worse and worse, so make one change. If it worked, great. If it didn't, then reverse it before you go ahead and try a different idea, and the reversing may mean going back and recovering from the backup that I emphasized you have to make. And again, if need be, if everything you're trying, if all the different changes you thought would work are not working, remember that it is okay to escalate.
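The make-one-change, test, reverse loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Change` class, the `try_changes` function, and the toy `state` dictionary are hypothetical stand-ins, and in real life "revert" might mean restoring from the backup rather than calling a function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Change:
    """One candidate fix plus a way to undo it (both hypothetical callables)."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def try_changes(changes: List[Change],
                problem_fixed: Callable[[], bool]) -> Optional[str]:
    """Apply one change at a time, test, and revert it if the test fails.

    Returns the name of the change that fixed the problem, or None,
    which is the signal that it may be time to escalate.
    """
    for change in changes:
        change.apply()
        if problem_fixed():
            return change.name
        change.revert()  # back to the known state before the next idea
    return None


# Toy system state standing in for a real server (assumption for the demo).
state = {"dns": "10.0.0.9", "cable": "loose"}

changes = [
    Change("point DNS at backup server",
           apply=lambda: state.update(dns="10.0.0.10"),
           revert=lambda: state.update(dns="10.0.0.9")),
    Change("reseat cable",
           apply=lambda: state.update(cable="seated"),
           revert=lambda: state.update(cable="loose")),
]

fixed_by = try_changes(changes, problem_fixed=lambda: state["cable"] == "seated")
print(fixed_by)      # reseat cable
print(state["dns"])  # 10.0.0.9, the failed DNS change was reversed
```

Because each failed change is undone before the next one is tried, the system ends up with only the change that actually helped, which also makes the later documentation step simpler.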
Now, once you have actually resolved the problem, you want to, first of all, verify full system functionality. You don't want to just assume that everything's great because the apparent problem is fixed, because very often you may not have realized that your changes have now created a different problem. So, you want to make sure that you still have full system functionality. From there, you want to go ahead and perform a root cause analysis. Try to find out what was the cause of the problem and when you know what caused the problem in the first place, then you may be able to implement some form of preventative measures so that it doesn't happen again.
And ultimately, when all is said and done, document everything. One huge, common mistake is to go through and work on a problem, and while you're working on it, in fact, the more challenging the problem is, the more your brain is going to tell you, oh, I'll remember this one. Well, that doesn't actually end up being true. I've seen very often where six months later you end up with the same problem again, and you go, I remember that one, and then you didn't really actually remember it. So, make sure you document everything so that the next time you have a similar problem, you can resolve it a lot faster and I'll even take it a step further and say, well, maybe you won't be the one that runs into this problem, so you want to make sure to have it documented for somebody else.
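What "document everything" looks like will depend on your organization's tools, but as a rough sketch, even a very small structured record makes a past fix searchable. The `Incident` fields, the `search` helper, and the sample entry below are assumptions made up for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """One documented problem (field names are illustrative assumptions)."""
    symptom: str
    root_cause: str
    resolution: str
    tags: List[str] = field(default_factory=list)


def search(knowledge_base: List[Incident], keyword: str) -> List[Incident]:
    """Return incidents whose symptom or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        incident for incident in knowledge_base
        if keyword in incident.symptom.lower()
        or keyword in [tag.lower() for tag in incident.tags]
    ]


# A made-up entry, written down at the time the problem was solved.
knowledge_base = [
    Incident(
        symptom="Users cannot reach the file server by name",
        root_cause="Stale DNS record after an IP address change",
        resolution="Updated the DNS record and flushed client caches",
        tags=["dns", "file server"],
    ),
]

# Six months later, you or a colleague can look it up instead of relying on memory.
for incident in search(knowledge_base, "DNS"):
    print(incident.resolution)
```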
Alright, so the more you can document, the more you can create a real detailed knowledge base of articles where you can help yourself with future troubleshooting. So, as a whole, it's very important to understand that troubleshooting theory, the idea of how to go about troubleshooting, is as powerful as any individual tool you may have at your disposal.