From the course: Cert Prep: SNCP Foundations (S10-110)

Implementing backup and recovery SLA

- [Instructor] Data consolidation gives us an opportunity to provide a backup and recovery service level agreement. A service level agreement is between IT and the various business units, and it establishes the levels of service, including uptime, business continuity, data protection, and performance, for the various tiers of storage: the top tier, which is the most important, the middle tier, and the lowest tier. The purpose of an SLA is really to focus spending on the high-value IT functions and to save money on the lower-value IT functions.
So, here we see the concept of implementing a backup and recovery SLA. We've consolidated the data from Customer Service, from Email, from Office, from Manufacturing Inventory. It's now on a storage area network and a storage array, and it becomes feasible for us to implement a service level agreement for backup and recovery.
Here is a SNIA exam tip. A backup and recovery service level agreement comes in two big components: RPO, the recovery point objective, and RTO, the recovery time objective. The recovery point objective is most easily understood as the longest time since the last backup, which is the most data you could lose. If you back up at noon and midnight, then your RPO is 12 hours. If you back up more frequently than that, your RPO will be less than that. If you back up only once a day, your RPO will be 24 hours. The RTO is the recovery time objective, meaning once your department receives a request to recover a specific backup, how much time will transpire between receiving that request and having that backup recovered? Typically, it's an hour or so. It can be more or less, based on how you do backups and on the service level agreement of your specific IT application. All of this depends on consolidated data.
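
The RPO and RTO arithmetic described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `worst_case_rpo` and `meets_rto` are hypothetical, backups are assumed to repeat on the same schedule every day, and RPO is taken as the longest gap between two consecutive backups.

```python
from datetime import datetime, timedelta


def worst_case_rpo(backup_hours):
    """Longest gap between consecutive backups on a schedule that repeats daily.

    backup_hours: hours of the day (0-24) at which a backup runs.
    Assumption: the same schedule runs every day.
    """
    times = sorted(set(h % 24 for h in backup_hours))
    if not times:
        raise ValueError("at least one backup time is required")
    # Gap from each backup to the next one, wrapping around midnight.
    # A single daily backup yields a gap of 0 modulo 24, which means 24 hours.
    gaps = [
        (times[(i + 1) % len(times)] - times[i]) % 24 or 24
        for i in range(len(times))
    ]
    return timedelta(hours=max(gaps))


def meets_rto(request_received, restore_completed, rto):
    """True if the restore finished within the recovery time objective."""
    return (restore_completed - request_received) <= rto
```

With backups at noon and midnight, `worst_case_rpo([0, 12])` gives 12 hours, and a single daily backup gives 24 hours, matching the figures in the narration. `meets_rto` simply compares the elapsed time between the recovery request and the completed restore against the agreed RTO.
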
If you have consolidated data, then you have an opportunity to do consistent backups, reliable recovery, and professionally managed data protection, all in the interest of spending less and delivering more.
So, let's continue this example and improve our SLA. Before, we just saw a service level agreement for backup to tape. Here we introduce the concept of a disk-to-disk backup as opposed to a disk-to-tape backup. With disk-to-disk backup, we can grab a much larger chunk of data in a much shorter time, and this also improves our ability to deliver on a backup and recovery SLA.
It's important to recognize here that not all SLAs and not all IT applications are the same. Here we differentiate between an SLA for tier one, which is the main event for any business, the time-is-money event, and the SLA for tier two, which is the good-enough-performance tier. This is the high-growth tier two for Email, for Office, for Manufacturing Inventory, and so forth. What I'm suggesting is that if you isolate the load and isolate the storage hardware, then you can also guarantee a much higher SLA for backup and recovery.
Let's dive into a couple of real examples of SLAs. It's more than just backup and recovery, as you see with this example. Uptime is important. In this particular SLA example, uptime of 99.99%, which equates to roughly 53 minutes of downtime per year, is part of the service level agreement. Also part of the service level agreement are the onboarding of new capacity and data protection, which we talked about a moment ago. In this particular example, we're talking about data loss of no more than 30 minutes, so this means you're taking snapshots at least every 30 minutes; that's the point-in-time recovery. There's also a business continuity SLA for data center catastrophe recovery.
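
How an uptime percentage translates into a downtime budget depends on the measurement period, which the example SLA does not state. The following is a minimal sketch that assumes a 365-day year by default; the name `allowed_downtime_minutes` is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an uptime percentage over a given period.

    Assumption: the SLA is measured over one 365-day year unless another
    period (in minutes) is passed in.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (1 - uptime_percent / 100)
```

Under that assumption, 99.99% uptime allows about 52.6 minutes of downtime per year. Passing a different `period_minutes` (for example, the minutes in a 30-day month) shows how much the budget shrinks when the SLA is measured over a shorter window.
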
So, critical business systems, not every system, but the critical ones, can be replicated off-site, and we'll dive into that in future sections, and in this example, recovered within 12 hours. And there's a performance SLA, meaning that for this tier one, throughput shall be no less than 20 megabytes a second and latency no greater than 60 microseconds. This is all optimizing around fast and safe, where we're spending a fairly decent amount of money to deliver both.
Conversely, for SLA 2, we're focusing on safe and cheap. So, lower dollars per terabyte and no performance SLA, so you get the performance you get, and this may very well be acceptable for email systems and similar. For data protection, you may have a snapshot every 24 hours, which is why the point-in-time recovery window is much longer than in the previous example. Uptime is 99.9%, which is a much higher tolerance for downtime. You're never, ever going to get 100% uptime, that's just not realistic, but spending money to get close to 100% uptime is the right thing to do when the business importance of your application justifies it.
Now we're going to take a look at another real-world example, one that's offered by the Google Cloud storage solutions. In this case, we see a multi-region database with a high SLA, so this is a fairly expensive system. It has a consistent performance SLA and a consistent business continuity SLA, and they list the types of applications it's appropriate for. The regional SLA is slightly looser: lower uptime, a cheaper price tag, et cetera. So, this is an example of a tier two. In this particular service, they also have a nearline SLA, which is for more infrequent access. Obviously it has lower uptime and a lower price tag. And lastly, there's coldline, which is for archival and so forth, and which is significantly lower in terms of uptime and in terms of cost.
This is a real example of how others have done SLAs. Your SLA may be adjusted based on your business applications and your business needs, but it's a healthy thing to get an SLA in place so that your business units know what to expect, and so that you are spending money where it needs to be spent and saving money where it needs to be saved.
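
To make the two example tiers concrete, here is one possible way to write them down as data and check measurements against them. It is a sketch, not a standard format: the `StorageSla` class, its field names, and the `violations` helper are all hypothetical, and only the numbers (99.99% and 99.9% uptime, 30-minute and 24-hour snapshots, 12-hour off-site recovery, 20 megabytes a second, 60 microseconds) come from the examples in this video.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class StorageSla:
    """One storage tier's SLA. Fields left as None mean 'no commitment'."""
    name: str
    uptime_percent: float
    rpo: timedelta
    disaster_recovery_time: Optional[timedelta] = None
    min_throughput_mb_s: Optional[float] = None
    max_latency_us: Optional[float] = None


# Tier one: fast and safe.
TIER_ONE = StorageSla(
    name="tier one",
    uptime_percent=99.99,
    rpo=timedelta(minutes=30),
    disaster_recovery_time=timedelta(hours=12),
    min_throughput_mb_s=20.0,
    max_latency_us=60.0,
)

# Tier two: safe and cheap, with no performance SLA.
TIER_TWO = StorageSla(
    name="tier two",
    uptime_percent=99.9,
    rpo=timedelta(hours=24),
)


def violations(sla, uptime_percent, snapshot_interval,
               throughput_mb_s=None, latency_us=None):
    """Return the names of the SLA terms that the measurements fail to meet."""
    problems = []
    if uptime_percent < sla.uptime_percent:
        problems.append("uptime")
    if snapshot_interval > sla.rpo:
        problems.append("rpo")
    if (sla.min_throughput_mb_s is not None and throughput_mb_s is not None
            and throughput_mb_s < sla.min_throughput_mb_s):
        problems.append("throughput")
    if (sla.max_latency_us is not None and latency_us is not None
            and latency_us > sla.max_latency_us):
        problems.append("latency")
    return problems
```

The same measurements can pass one tier and fail another. A system with 99.95% uptime, daily snapshots, and modest performance would satisfy `TIER_TWO` but fall short of every term checked for `TIER_ONE`, which is the point of differentiating tiers in the first place.
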
