From the course: DevOps Foundations: Effective Postmortems

What went well

- Our first step in investigating our incident is going to be thinking about what went right. Is that surprising? It shouldn't be. So far, we've learned about Safety-II and that focusing on everyday function and how things are going right is one of the keys to creating safety. Not only that, but it's a way to combat negativity bias and attribution error, and those will get you five to 10 in the state pen.

You've already built a lot of safety into your systems. Enhancing that can be more effective than chasing the latest flaw. In other words, how can you improve your system's existing immune system, both automation and people, by building on the good things you're already doing? You are doing good things, right?

In the Extended Dreyfus Model for incident lifecycles compiled by J. Paul Reed and Kevina Finn-Braun, they characterized questions that advanced organizations might ask around incident analysis. These are questions like: What aspects of our system and team contributed to our success here? During this incident and the events leading up to it, how did we actively create and sustain success? How are you already monitoring, responding, anticipating, and learning?

Asking these questions first helps in two ways. The first is that it reminds everyone that this is a well-managed system, and things go right the vast majority of the time. The second is that you look at how to build upon your strengths for mitigation instead of coming up with crazy new schemes that might sacrifice that safety.

Examples of things that you should highlight in this section of the postmortem include: Was detection timely? Was a change tested according to procedure? Were the various processes in place followed? How was the problem fixed? How did the responder figure out what was wrong and remediate it? Did any of our safety measures prevent the issue from becoming worse?

By doing this, you move your team from the just-the-facts routine of the timeline into thoughtful analysis of your system. And you start on a positive note, reminding your team of the strengths you have to build on.
