From the course: Cybersecurity with Cloud Computing
Unlock the full course today
Join today to access over 22,600 courses taught by industry experts or purchase this course individually.
Anatomy of a service failure
From the course: Cybersecurity with Cloud Computing
Anatomy of a service failure
- [Instructor] In September 2019, Google's VP of engineering apologized to customers for taking far longer than the company expected to recover from a configuration mishap which impacted 1% of more than 1 billion Gmail users and disrupted YouTube and Google cloud storage traffic. Google's cluster management software is used to automate the response to data center maintenance activities, such as power infrastructure changes or adding additional network equipment. Cluster management jobs are labeled with an indication of how they should behave in the face of such an event. They can be moved to a machine which is not under maintenance or stopped and rescheduled after the event. The incident occurred when two normally benign misconfigurations were made in the presence of a specific software bug. Network control plane jobs were configured to be stopped if maintenance took place. All the instances of cluster management…
Practice while you learn with exercise files
Download the files the instructor uses to teach the course. Follow along and learn by watching, listening and practicing.