Learn what disaster recovery is at a high level and why all organizations should have a procedure as part of their business continuity plan.
- [Man] Disaster recovery, or DR, is an organization's recovery plan for critical IT systems in the event of a disaster. DR includes documented policies and processes that fall under the wider umbrella of an organization's business continuity plan. A business continuity plan covers a wider range of negative events, such as public relations issues and insolvency of suppliers, along with disaster recovery. A DR plan typically relates only to critical systems in an organization's IT environment.
What qualifies as a disaster? Disaster is a broad term referring to any event that may disrupt critical IT infrastructure. A disaster can be anything from a natural disaster disrupting a data center, to an electrical failure, to a server administrator entering the wrong command and deleting all production data. In fact, in February of 2017, an Amazon employee entered the wrong command on a server, which brought down Amazon Web Services' Simple Storage Service, or S3, in the Northern Virginia region for five hours.
Because so many organizations rely on S3 to run their websites, many sites were down or not performing properly. Popular sites and services such as Twitch, GitHub, and Salesforce all experienced service disruption. My personal favorite ironic outage, Is It Down Right Now dot com, a dashboard that identifies any outages across popular services, was also not loading. This kind of event is extremely rare, and not one I have personally experienced in my years working with AWS. It has been estimated that the 2017 S3 outage cost S&P 500 organizations over 150 million dollars.
But when such an event occurs, an organization must have a plan in place so that IT systems can be restored quickly and business impact is reduced. The need for a DR plan ultimately comes down to compliance requirements and a financial analysis. Organizations need to be aware of regulations that mandate a DR plan. For example, financial institutions in the US that fall under the Federal Financial Institutions Examination Council regulation are legally required to have a DR plan that is regularly tested.
Even if regulatory agencies do not mandate a DR plan, a financial analysis should. This analysis defines the cost to the organization for each hour during which critical systems are down, and that cost will drive the type of DR architecture. IT organizations should work with their finance departments to calculate the return on investment of a DR plan when defining requirements. Luckily, with the advent of AWS and the public cloud, finance departments have a much easier assessment to make. The public cloud has drastically reduced the cost to an organization of building and maintaining a DR environment.
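To make the financial analysis concrete, here is a minimal sketch of how the hourly cost of downtime could feed a return on investment estimate. The function name, its parameters, and every figure in the example are hypothetical and chosen only for illustration; a real analysis would use numbers agreed with the finance department.

```python
def dr_return_on_investment(
    cost_per_hour: float,
    hours_down_without_dr: float,
    hours_down_with_dr: float,
    incidents_per_year: float,
    annual_dr_cost: float,
) -> float:
    """Estimate the yearly ROI of a DR plan (illustrative model only).

    ROI = (avoided downtime loss - annual DR cost) / annual DR cost
    """
    if annual_dr_cost <= 0:
        raise ValueError("annual_dr_cost must be positive")

    # Expected yearly loss if critical systems stay down for the full outage.
    loss_without_dr = cost_per_hour * hours_down_without_dr * incidents_per_year
    # Expected yearly loss when the DR plan shortens each outage.
    loss_with_dr = cost_per_hour * hours_down_with_dr * incidents_per_year

    avoided_loss = loss_without_dr - loss_with_dr
    return (avoided_loss - annual_dr_cost) / annual_dr_cost


# Hypothetical inputs: $50,000 lost per hour of downtime, one five-hour
# outage expected every two years, recovery cut to one hour with DR,
# and a DR environment that costs $40,000 per year to run.
roi = dr_return_on_investment(
    cost_per_hour=50_000,
    hours_down_without_dr=5,
    hours_down_with_dr=1,
    incidents_per_year=0.5,
    annual_dr_cost=40_000,
)
print(f"Estimated annual ROI: {roi:.0%}")
```

With these assumed inputs, the avoided loss is $100,000 per year against a $40,000 DR spend. A lower yearly DR cost raises the ROI directly, which is why cheaper DR environments in the public cloud make the case easier to justify.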
- DR in the public cloud
- Recovery time objective (RTO)
- Recovery point objective (RPO)
- AWS platform services that support DR
- Comparing the four DR architectures
- Differences between high availability and DR
- Cold DR procedure
- Pilot light DR procedure
- Warm standby procedure
- Multisite DR failover