From the course: Site Reliability Engineering: Service-Level Agreements and Objectives

Defining reliability

- [Instructor] Let's start by defining the basic building block for measuring the reliability of a service, the service-level indicator. A service-level indicator, or SLI for short, is an indicator of the level of service you provide, ideally expressed as a ratio of two numbers. Typically, it is recommended to think of an SLI as the number of good events divided by the total number of events. For example, if you want to measure the reliability of a web application, you may want to capture the ratio of successful HTTP requests to all HTTP requests for that site.

When picking service-level indicators for a service, it's important to ask the following questions. What actually makes your users happy? What behaviors really matter to them? To answer these questions, it may be helpful to work through a few thought exercises. First, clearly define who the users are in this situation. Then, consider the typical ways those users interact with your system. What are the critical tasks they need to perform? Lastly, draw a high-level architecture diagram of your system's components. What does the request flow look like? How does data flow through the system? What critical dependencies exist?

It's important to think through how a user's happiness ties directly to your system, but don't overthink it. Pick something about your system that you think is relevant and is also simple to measure. You can always go back and tweak your SLI down the road.

The service outcomes you think matter to users are referred to as SLI specifications. These specifications are intentionally measurement-agnostic, meaning they describe the assessment of a service-level indicator only in the abstract. For example, let's say you want to measure the latency of a request-driven service, like a web application. One SLI specification could be the ratio of page requests that load in less than 100 milliseconds to all requests.
Note that an SLI specification does not say how the indicator is measured. The concrete way of measuring it is referred to as an SLI implementation. We'll discuss SLI implementations in the next video.
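
To make the "good events divided by total events" idea concrete, here is a minimal Python sketch of the two examples mentioned above. It is an illustration only, not something shown in the course: the `Request` record, the function names, and the sample data are all hypothetical, and treating any status code below 500 as "successful" is an assumption you would adjust for your own service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one HTTP request (not from the course)."""
    status: int        # HTTP status code returned to the user
    latency_ms: float  # time taken to serve the request, in milliseconds


def sli(good: int, total: int) -> float:
    """Return the SLI as good events divided by total events."""
    if total == 0:
        raise ValueError("cannot compute an SLI with zero events")
    return good / total


def success_sli(requests: list[Request]) -> float:
    """Ratio of successful HTTP requests to all HTTP requests.

    Assumption: a request is 'successful' if its status is below 500.
    """
    good = sum(1 for r in requests if r.status < 500)
    return sli(good, len(requests))


def latency_sli(requests: list[Request], threshold_ms: float = 100.0) -> float:
    """Ratio of requests served in less than threshold_ms to all requests."""
    good = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return sli(good, len(requests))


# Made-up sample data, purely for illustration.
sample = [
    Request(status=200, latency_ms=42.0),
    Request(status=200, latency_ms=180.0),
    Request(status=500, latency_ms=95.0),
    Request(status=404, latency_ms=60.0),
    Request(status=200, latency_ms=250.0),
]

print(f"success SLI: {success_sli(sample):.0%}")
print(f"latency SLI: {latency_sli(sample):.0%}")
```

Both indicators share the same shape, a count of good events over a count of all events, which is why the ratio form is recommended. Where the request records actually come from (server logs, a load balancer, client-side probes) is deliberately left out here, since that choice belongs to the SLI implementation.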
