From the course: DevOps Foundations: Incident Management

Best practices for diagnosis and repair

- Of course, when it comes down to it, the heart of incident response is fixing the problem. You should always prioritize restoring service. Try to capture information for later forensics, but sometimes you need to act even if you don't know exactly why something is happening.

The key to working an incident is understanding that it's iterative. The cycle goes by various names: triage, examination, diagnosis, treatment. It's often called the OODA loop, a term from the military: observe, orient, decide, act, and repeat. But you may as well call it the scientific method: research, hypothesize, test, analyze, repeat. In the end, they're all the same thing. They're deceptively simple, but what they don't say is "just try some stuff and see what happens."

Carefully characterize the problem, gather information, and analyze it. Look for recent changes, and look at logs, metrics, behavior, and the source code. Get as much information as you can. Incident researcher John Allspaw identified…
