From the course: DevOps Foundations: Incident Management
Best practices for diagnosis and repair
- Of course, when it comes down to it, the heart of incident response is fixing the problem. You should always prioritize restoring service. Try to capture information for later forensics, but sometimes you need to act even if you don't know exactly why something is happening. The key to working an incident is understanding that it's iterative. The cycle goes by various names: triage, examination, diagnosis, treatment. It's often called the OODA loop, a term from the military: observe, orient, decide, act, and repeat. But you may as well call it the scientific method: research, hypothesize, test, analyze, repeat. In the end they're all the same thing. They're deceptively simple, but what they don't say is "just try some stuff and see what happens." Carefully characterize the problem, gather information, and analyze it. Look for recent changes; look at logs, metrics, behavior, and the source code. Get as much information as you can. Incident researcher John Allspaw identified…
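
The advice to look for recent changes first can be sketched in a few lines of Python. This is only an illustration, not something from the course: the `Change` record, the `recent_changes` helper, and the 24-hour default window are all hypothetical, and a real team would pull this data from its own deploy or change-management tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """Hypothetical record of a deploy, config edit, or other change."""
    when: datetime
    description: str


def recent_changes(changes, incident_start, window=timedelta(hours=24)):
    """Return changes made within `window` before the incident, newest first.

    The 24-hour default is an assumption for illustration only.
    """
    earliest = incident_start - window
    candidates = [c for c in changes if earliest <= c.when <= incident_start]
    return sorted(candidates, key=lambda c: c.when, reverse=True)


if __name__ == "__main__":
    # Made-up example data: one deploy shortly before the incident,
    # one change from several days earlier.
    incident_start = datetime(2024, 1, 2, 12, 0)
    log = [
        Change(datetime(2023, 12, 30, 9, 0), "database minor upgrade"),
        Change(datetime(2024, 1, 2, 11, 0), "web tier deploy"),
    ]
    for change in recent_changes(log, incident_start):
        print(change.when.isoformat(), change.description)
```

Listing the newest changes first gives a responder an ordered set of hypotheses to test, which fits the iterative observe, hypothesize, test loop described above; it does not replace looking at logs, metrics, and behavior.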