In this video, Jeff Winesett discusses designing for failure in the cloud. Thinking about failure during the design process helps build fail-proof, cloud-based applications, and avoiding single points of failure is essential to building them.
- "Everything fails, all the time." This oft-quoted line from Amazon CTO Werner Vogels emphasizes the need to keep failure in mind when designing cloud-based systems. While it sounds bleak and terribly pessimistic, systems and hardware do fail. The statement is intended to put new cloud developers and architects in the right mindset: think about failure, and design for it accordingly. The idea is that while failure should be expected within the individual components of an application, the overall system can be architected to prevent application failure.
Expecting parts of the system to fail drives an architecture, and ultimately an overall system, that won't fail. Thinking about failure up front makes recovery strategies part of the design process, which lends itself to a better, more stable end product. One major rule of thumb when designing for failure is to avoid single points of failure. Take, for example, a common web application where a single server instance hosts both the web server and the database software.
While this architecture will get an application working, it's not designed for failure. If anything happens to this single instance, the entire application fails. One first step might be to move the database to its own instance, separating it from the web server. This architecture now allows for the introduction of a load balancer, which enables scaling out the web servers. This is a step in the right direction: the system can now tolerate a web server failure without a system-wide failure.
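To make the load balancer's role concrete, here is a minimal round-robin dispatch sketch. It only illustrates the idea of spreading requests across several web servers; the server names are hypothetical, and a real balancer (like ELB) does far more.

```python
from itertools import cycle

# Hypothetical pool of web server addresses sitting behind the balancer.
web_servers = ["web-1.internal", "web-2.internal"]
pool = cycle(web_servers)

def route_request(request_id):
    """Pick the next server in round-robin order for this request."""
    server = next(pool)
    return f"request {request_id} -> {server}"

# Successive requests alternate between the two servers, so the loss of
# one server no longer takes down the whole web tier.
for i in range(4):
    print(route_request(i))
```

With two or more servers in the pool, any single web server can fail and the remaining ones keep serving traffic.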
But even in this improved architecture, there are still single points of failure. Both the database server and the newly introduced load balancer are single points of failure: if either fails, the entire system fails. In a world before AWS, horizontally scaling the database tier was very challenging, and adding redundancy for either the database or the load balancer was cost-prohibitive for all but the largest companies.
But with AWS, removing such single points of failure is achievable. Here's the same application with two web servers connecting to a single database instance, but now the application leverages Amazon's RDS service for the database. RDS can be configured with a standby instance as a secondary database server, and it can be configured to automatically fail over in the event of a database issue. Now, if the primary database fails, the system automatically switches to the alternate database, which has been happily standing by to help in just such an event.
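The failover behavior can be sketched as "try the primary; if it's unreachable, use the standby." RDS performs this server-side behind a single endpoint, so applications don't implement it themselves; the client-side version below, with hypothetical endpoint names and a stand-in for a real connection attempt, just illustrates the idea.

```python
# Hypothetical primary and standby database endpoints.
PRIMARY = "db-primary.example.internal"
STANDBY = "db-standby.example.internal"

def connect(endpoint, healthy):
    """Stand-in for a real database connection attempt."""
    if not healthy.get(endpoint, False):
        raise ConnectionError(f"{endpoint} is unreachable")
    return f"connected to {endpoint}"

def connect_with_failover(healthy):
    """Try the primary first; on failure, fail over to the standby."""
    try:
        return connect(PRIMARY, healthy)
    except ConnectionError:
        return connect(STANDBY, healthy)

# Normal operation hits the primary...
print(connect_with_failover({PRIMARY: True, STANDBY: True}))
# ...while a primary outage transparently switches to the standby.
print(connect_with_failover({PRIMARY: False, STANDBY: True}))
```

The value of RDS is that this switch, plus keeping the standby synchronized, happens automatically and invisibly to the application.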
RDS takes care of all the messy synchronization details. Similarly, for the load balancer, Amazon's Elastic Load Balancing service can be used. With this service, scaling and redundancy are included automatically, and Amazon ensures the load balancer itself is not a single point of failure for your system. Avoiding single points of failure, of course, is not just a hardware concern; it includes considerations at the network and software levels too. And even when single points of failure have been avoided, consideration must be given to how the failover happens.
The failover process itself may entail other hardware, software, or network resources, which may need to be incorporated into the application design. AWS provides many services to help eliminate single points of failure, and I'll be introducing these services throughout this course.
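One piece of the failover story a load balancer relies on is health checking: traffic is only routed to servers that pass their health probes. The sketch below captures just that filtering step; the server names are hypothetical, and a real ELB health check is an actual HTTP or TCP probe rather than a boolean flag.

```python
# Hypothetical health-check results for three web servers, as a load
# balancer might see them after probing each target.
servers = {"web-1": True, "web-2": False, "web-3": True}

def healthy_targets(status):
    """Return the servers currently passing their health check."""
    return [name for name, ok in sorted(status.items()) if ok]

# Only healthy servers receive traffic; web-2 is quietly skipped until
# its health check passes again.
print(healthy_targets(servers))
```

This is the mechanism that turns a redundant pool of servers into an actual fault-tolerant tier: redundancy alone doesn't help unless failed components are detected and routed around.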
- Benefits of cloud services
- Making architectures scalable
- Examining cloud constraints
- Virtual servers, EC2, and Elastic IP
- Using the Amazon machine image
- Elastic load balancing
- Using CloudWatch for monitoring
- Security models
- Elastic block storage
- S3, CloudFront, and Elastic Beanstalk
- Handling queues, workflows, and notifications
- Caching options and services
- Identity and access management
- Creating a custom server image
- Application deployment strategies
- Serverless architectures