From the course: Cloud Computing: BC/DR Best Practices

Identifying data for backup

- [Narrator] Data is fundamental when you consider disaster recovery. Keep in mind that we have to back up the data to a redundant site, something that reduces the risk of losing the primary database. So selecting the databases, and the data itself, becomes a core part of dealing with disaster recovery in the cloud. With databases, we have object-based (key-value and document) databases, like Amazon DynamoDB. We have relational databases, like MySQL, which any number of clouds support. And we have columnar databases, such as Amazon Redshift, which are used for data warehousing. There are many others as well: special-purpose databases, highly transactional databases, and databases that are memory only. No matter where the data resides, it has to be backed up in order to be resilient through an outage or a disaster.

We transport data as objects, in other words, groupings of information. They can be rows, each basically an instance or a record, or they can be records in their own right. You'll hear people talk about data in different ways, based on the databases they're leveraging. They'll say objects, rows, or records, and all of these are ways in which we accumulate and structure data. In the relational world there are tables as well, which organize the data in a database. And there are many others.

So, keep in mind that if you're going to deal with disaster recovery in the cloud, information exchange, or data, is going to be fundamental to how you back up information, and understanding where the data is and where it needs to go is imperative. Keep in mind that data is also stored in files.
We have object storage, file storage, and, of course, block storage, such as Amazon Web Services EBS, Elastic Block Store. We also have binary large objects, or blobs, and that's basically what it sounds like: a video file or another multimedia file. A blob typically has no structure. It's stored on disk and is known to be data, but in and of itself it is not information that can be read directly; it is a binary that has to be read by another system. There are other features and functions that are part of files as well. So data could be stored in files, in objects, in blocks, or in a database.

Next, we have to look at the differences between a full backup, an incremental backup, and a differential backup. In a full backup, the data is copied in its entirety each time. In other words, we take a complete copy of the database or the file storage system, and we overwrite the existing backup. Incremental means that the information is backed up in its entirety one time, and from then on only the data that has changed since the previous backup is backed up. This is popular because we don't have to go through very long backup cycles: initially it backs up everything, and then we back up only new or changed data. Differential backup also means that the information is copied in its entirety one time. After that, each backup captures everything that has changed since that full backup, so each differential grows until the next full backup is taken.

There are other things to consider. The data could be transactional, which means information is pushed into the database as we process a sales order or buy something online, so it is updated all the time. It could be information for decision support, such as a data lake or a data warehouse.
The data could also be abstracted through data virtualization tools, where we look at a number of physical databases and see them through a common schema that exists only in memory. Then we have multiple users and multiple tenants to deal with. That has to be taken into consideration as well because, if many users are leveraging the transactional information, it is going to be changing all the time. So how do you sync those changes over time? And how do you run backup and recovery operations without interfering with those users or tenants? And there are also the geographic regions, in other words, our ability to keep data localized in a particular region, say, Northern Virginia, or California, or one of the Midwestern states.
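
To make the full, incremental, and differential backups described earlier a little more concrete, here is a minimal Python sketch of how each type might decide what to copy. It is an illustration only, not how any particular cloud service works: the `Item` class, the three function names, the file names, and the integer timestamps are all hypothetical, and a real system would typically rely on modification times, change logs, or snapshots instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A hypothetical unit of data (a file, object, or row) and when it last changed."""
    name: str
    modified_at: int  # logical timestamp, assumed for illustration


def full_backup(items):
    """A full backup copies everything, every time."""
    return sorted(item.name for item in items)


def incremental_backup(items, last_backup_at):
    """An incremental backup copies only what changed since the previous backup of any kind."""
    return sorted(i.name for i in items if i.modified_at > last_backup_at)


def differential_backup(items, last_full_at):
    """A differential backup copies everything that changed since the last full backup."""
    return sorted(i.name for i in items if i.modified_at > last_full_at)


# Assumed timeline: a full backup at t=10, then further backups at t=20 and t=30.
items = [
    Item("orders.db", modified_at=5),      # unchanged since the full backup
    Item("customers.db", modified_at=15),  # changed before the backup at t=20
    Item("video.blob", modified_at=25),    # changed before the backup at t=30
]

full_set = full_backup(items)
incremental_at_30 = incremental_backup(items, last_backup_at=20)
differential_at_30 = differential_backup(items, last_full_at=10)

print("full:", full_set)
print("incremental at t=30:", incremental_at_30)
print("differential at t=30:", differential_at_30)
```

In this sketch the incremental backup at t=30 picks up only `video.blob`, while the differential backup picks up both `customers.db` and `video.blob`. That reflects the usual trade-off: incremental backups are smaller, but a restore needs the full backup plus every incremental after it, whereas a differential restore needs only the full backup plus the most recent differential.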
