From the course: AWS: High Availability

Reviewing high-availability concepts - Amazon Web Services (AWS) Tutorial

Reviewing high-availability concepts

- [Instructor] You've made the decision to use Amazon Web Services, or AWS. To ensure your customers have the best possible experience with the services you provide, it's time to consider how to use the tools AWS provides to create fault-tolerant, highly available systems. Before we get into AWS specifics, let's go over some concepts related to fault tolerance and high availability.

To start off, let's talk a bit about the difference between fault tolerance and high availability. The applications and services you operate consist of many components. Let's say you have a classic three-tiered application consisting of load-balanced web servers talking to load-balanced application servers talking to a database with a hot standby. Suppose you experience a failure in your web and application tiers. With appropriate load balancing and application state management, your application will continue to operate. Since you've lost some capacity, your application performance may be degraded.

Relational databases are a bit trickier. It can take a bit longer for your database to fail over to its hot standby. In that failure scenario, your users may experience a brief service interruption, depending on how your applications re-establish database connectivity. Your application will still work, but if you then lose the standby database as well, you'll have big problems. Again, you're operating in a degraded state.

A fault-tolerant application can continue to operate when one of its components fails. Depending on the type of failure, the application may or may not be running in a degraded state.

This brings us to high availability. High availability is all about how long a given system stays up over a specified period of time. You've probably had this conversation in terms of the number of nines. For example, AWS has an object storage offering called Simple Storage Service, or S3.
The service level agreement for S3 starts offering service credits if monthly availability falls below 99.9%, or three nines. That comes out to just under 44 minutes of allowed downtime per month. Your business requirements will dictate the number of nines you target from an availability perspective for the systems you operate. Of course, the greater the number of nines, the more expensive and difficult it is to achieve that objective. The cost isn't linear. Getting to five nines is incredibly expensive and hard to do.

Physical servers are made up of many small, complex pieces. While servers do have redundancy built in, components that can fail include network cards, disk drives, CPUs, memory chips and more. Let's say you've virtualized your servers, but are running them in a data center or co-location facility. You still have the underlying hardware to worry about. If you step back and consider the data center, there is an entirely new set of concerns. You have to worry about redundant networking switches, routers, environmental controls, internet connectivity and more.

This brings me to one of my favorite quotes of all time. Werner Vogels, VP and CTO at Amazon.com, once said, "Everything fails all the time." He is acknowledging that every physical component is going to fail at some point. With AWS, you're freeing yourself from the headaches, cost and complexity associated with physical failures. Instead, you get to use AWS tools to make your application as available and fault tolerant as your business warrants.

Before we continue, remember you don't have to be exactly like Netflix. Frequently, and deservedly, Netflix is referenced as a poster child of what is possible in AWS. This is because, as a company, it has engineered remarkably resilient and available applications using AWS tools. Doing so is an expensive undertaking. You have to stop and consider your recovery time objectives. You also need to define your recovery point objectives.
With clarity on how long your business can tolerate an outage, as well as the point in time to which your system's state can be restored, you can keep cost in mind as you proceed with designing a highly available system in AWS to meet your needs.
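
The "number of nines" arithmetic mentioned in the transcript can be sketched in a few lines of Python. This is a minimal illustration, not anything from the course itself: the function name `allowed_downtime_minutes` is hypothetical, and the default month length of 30.44 days (an average over the year) is an assumption. A 30-day month would give slightly smaller numbers.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 30.44) -> float:
    """Return the downtime budget, in minutes, over a period of `days` days.

    `days` defaults to 30.44, an assumed average month length.
    """
    total_minutes = days * 24 * 60
    unavailable_fraction = 1 - availability_percent / 100
    return total_minutes * unavailable_fraction


if __name__ == "__main__":
    # Print the monthly downtime budget for two through five nines.
    targets = [
        ("two nines", 99.0),
        ("three nines", 99.9),
        ("four nines", 99.99),
        ("five nines", 99.999),
    ]
    for label, percent in targets:
        minutes = allowed_downtime_minutes(percent)
        print(f"{label:>11} ({percent}%): {minutes:.2f} minutes per month")
```

Under these assumptions, three nines works out to roughly 43.8 minutes per month, which matches the "just under 44 minutes" figure above. Each additional nine cuts the budget by a factor of ten, which is one way to see why higher targets get expensive so quickly.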
