If you build it, you run it, and other patterns for reliability engineering.
- Welcome to our video on reliability engineering. This is the third major practice area within DevOps.
- [Man on Right] In engineering, reliability describes the ability of a system or component to function under stated conditions for a specified period of time.
- [Man on Left] In IT, this includes availability, performance, security, and all the other factors that allow your service to actually deliver its capabilities to the users.
- In any kind of well-managed, modern infrastructure, it's increasingly rare for outages and production problems to be caused by the infrastructure. Once you get past the most basic system automation, it's not an exaggeration to say that 90% of production issues are software problems.
- Yeah, that's totally right. But in traditional IT, reliability, performance, and security often get referred to as non-functional requirements. Many product managers don't even consider them to be part of their responsibility.
- Yeah, and this leads to inefficient manual handling of problems, since developer resources aren't allocated to fix them, and to conflict between the teams due to their different priorities. Eventually this leads to a slowdown due to process warfare.
- Yeah. Mean time to recovery, or MTTR, is the measure of how quickly your service can recover from a disruption and restore service. In high-performance shops, the average is less than one hour.
- [Man on Right] The other part of the puzzle is how frequently you have failures: the mean time between failures, or MTBF. The total disruption of your service is a function of the MTBF and the MTTR.
- [Man on Left] Patrick Debois identified four key areas of DevOps: extending delivery to production, extending feedback from operations to development, embedding development into operations, and embedding operations into development.
- We're going to use this to illustrate a holistic approach to reliability engineering, but we're going to simplify it by combining the embedding and feedback portions into two areas that we call design for operation and operate for design.
- In design for operation, we will examine how you construct your system to be maximally reliable and maintainable in the first place, feeding from the project into operations.
- And then in operate for design, we'll talk about the practices within operations and how to radiate all information from production back to the project, per the feedback loop idea of the Three Ways. When you bring together both practices, you have the DevOps take on reliability engineering.
- [Man on Left] You may have heard the term site reliability engineering. That's a term that Google popularized for this approach. Google has product teams support their own services until they reach a certain level of traffic and maturity. And even then, they have the development team handle 5% of the operational workload on an ongoing basis.
- This keeps a healthy feedback loop in place that continually improves the product's operational abilities.
- Okay, now let's dig into reliability engineering with specifics on how we craft reliable systems in our next section, design for operation.
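The transcript says total disruption is a function of the MTBF and the MTTR but does not give the formula. A minimal sketch follows, assuming the common steady-state approximation availability = MTBF / (MTBF + MTTR), with MTBF treated as the average uptime between failures. The function names and the sample numbers are illustrative assumptions, not figures from the video.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, assuming the common
    approximation availability = MTBF / (MTBF + MTTR).
    MTBF is treated here as mean uptime between failures (an assumption)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(mtbf_hours: float, mttr_hours: float,
                            period_hours: float = 24 * 365) -> float:
    """Expected total disruption over a period (default: one 365-day year)."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * period_hours


if __name__ == "__main__":
    # Hypothetical service: one failure per 719 hours of uptime, 1 hour to recover.
    a = availability(719, 1)
    print(f"availability: {a:.4%}")
    print(f"downtime per year: {expected_downtime_hours(719, 1):.1f} hours")
```

Under this approximation, halving the MTTR reduces expected downtime by about as much as doubling the MTBF, which is one reason fast recovery gets so much attention.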
In this course, well-known DevOps practitioners Ernest Mueller and James Wickett provide an overview of the DevOps movement, focusing on the core values of CAMS (culture, automation, measurement, and sharing). They cover the various methodologies and tools an organization can adopt to transition into DevOps, looking at both agile and lean project management principles and how old-school principles like ITIL, ITSM, and SDLC fit within DevOps.
The course concludes with a discussion of the three main tenets of DevOps (infrastructure automation, continuous delivery, and reliability engineering), as well as some additional resources and a brief look into what the future holds as organizations transition from the cloud to serverless architectures.
- What is DevOps?
- Understanding DevOps core values and principles
- Choosing DevOps tools
- Creating a positive DevOps culture
- Understanding agile and lean
- Building a continuous delivery pipeline
- Building reliable systems
- Looking into the future of DevOps