From the course: AWS Certified Database – Specialty (DBS-C01) Cert Prep: 3 Migration and Management

Backup and restore strategies

- Okay, let's talk database backup strategies. We need a good plan for backing up our databases, so it's essential to know your options, and those options vary depending on the kind of database instance you're talking about.
So let's talk first about unmanaged database backup strategies. Unmanaged, what does that mean? Well, I've frequently called an unmanaged database an instance-based database, okay? It means we're running an EC2 instance, we're running the operating system, we're running the database management system, and we have a database in it. That's an unmanaged database, meaning that AWS doesn't manage it, we have to.
So the first thing to know is that you need to determine your backup time, because there are no automatic backups for DBMS-based systems in an EC2 instance like there are in AWS RDS. I need to figure out when I'm going to back up. What's my time window? Is it at night? Maybe that's not best for you; you need to know when the database is least utilized. For your database, that may actually be mid-afternoon. The backup can happen while people are using the database, but it's going to impact performance, so we don't want it to happen while people are using it if we can avoid that. If we can't avoid it, we want to do it when the fewest people are using it, or when the least intensive transactions are happening. So that's how you determine your backup time: the least intensive active time.
Then you need to determine your backup type. And you're probably going to use varying backup types over time depending on what you're doing. You have full, differential, and incremental backups; these are kind of your traditional backup types. A full backup means we're just backing up the whole database, everything, as we say in West Virginia, the whole kit and caboodle. We're going to back it all up, okay? That's a full backup.
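The "least intensive active time" idea can be sketched in a few lines of Python. This is only an illustration: the function name, the 24-sample load profile, and the numbers in it are made up for the example, and in practice you would feed in whatever utilization metric you actually collect for your instance.

```python
def least_busy_window(hourly_load, window_hours):
    """Return the start hour (0-23) of the quietest contiguous window.

    hourly_load: 24 numbers, one per hour of the day (a hypothetical
    metric, for example average active sessions in that hour).
    The window is allowed to wrap past midnight.
    """
    if len(hourly_load) != 24:
        raise ValueError("expected 24 hourly samples")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        # Total load across the window that begins at this hour.
        total = sum(hourly_load[(start + i) % 24] for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start


# Made-up profile: heavy overnight batch work, quiet mid-afternoon.
sample_load = [80, 85, 90, 90, 85, 70, 60, 55, 50, 45, 40, 35,
               30, 20, 15, 20, 35, 50, 60, 65, 70, 75, 80, 80]
print(least_busy_window(sample_load, 2))
```

With this particular profile the quietest two-hour window starts in the early afternoon rather than at night, which is the point made above: measure first, then pick the window.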
Now you might want to do that every day. If it's a small database, you can probably get by with that. If it's a large database, hundreds of gigabytes or multiple terabytes in size, you probably don't want to do a full backup every single day, because it takes a very long time. That's where the other two types come into play, differential and incremental.
A differential backup backs up everything that has changed since the last full backup. You need to be clear on what I'm saying there. I didn't say a differential backup backs up everything since the last backup. I said it backs up everything since the last full backup. Even if I've made other differential backups in the meantime, it still backs up everything since the last full backup.
An incremental backup backs up only the things that have changed since the last backup of any kind. So if there was a full backup, then a differential, and then I do an incremental, the incremental only backs up what changed since that differential. But if there's a full backup and then I do an incremental, it backs up everything since that full backup. If I do a first incremental and then a second incremental, the second one backs up everything since the first incremental, okay? So it's backing up everything that's changed since the last backup of any type.
So those are your three basic types. And you can have different strategies and plans for using them, maybe a full backup on the weekend, then differential backups every night, and possibly incremental backups throughout the day.
Now we have two other types of backups. There's a copy backup. Many database management systems support copying a database from one place to another. It's literally just a full copy of the database. This, by the way, could be a migration tool that you use. For example, SQL Server has something called the Bulk Copy Program, BCP. You could use that to move a database's data into the AWS RDS system.
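Stepping back to the three traditional types for a moment, here is a small Python sketch of what each one would capture. It is a toy model of the definitions given in this lesson, not how any particular DBMS implements them: the function names, the page labels, and the timeline are all made up for illustration.

```python
def changed_since(change_log, since):
    """Pages written strictly after time `since`, without duplicates."""
    return sorted({page for t, page in change_log if t > since})


def plan_backups(change_log, schedule):
    """Work out what each backup in `schedule` would capture.

    change_log: list of (time, page) writes to the database.
    schedule:   list of (time, kind), kind being "full",
                "differential", or "incremental", in time order.
    """
    last_full = None  # time of the most recent full backup
    last_any = None   # time of the most recent backup of any kind
    results = []
    for when, kind in schedule:
        if kind != "full" and last_full is None:
            raise ValueError("the first backup must be a full backup")
        # Only writes that happened up to the backup time can be captured.
        log_so_far = [(t, p) for t, p in change_log if t <= when]
        if kind == "full":
            captured = "ALL"
            last_full = when
        elif kind == "differential":
            # Everything changed since the last FULL backup.
            captured = changed_since(log_so_far, last_full)
        elif kind == "incremental":
            # Everything changed since the last backup of ANY kind.
            captured = changed_since(log_so_far, last_any)
        else:
            raise ValueError(f"unknown backup kind: {kind}")
        last_any = when
        results.append((when, kind, captured))
    return results


# Made-up timeline: writes to pages A-D, with backups in between.
change_log = [(1, "A"), (2, "B"), (4, "C"), (6, "A"), (8, "D")]
schedule = [(0, "full"), (3, "differential"), (5, "incremental"),
            (7, "differential"), (9, "incremental")]
for entry in plan_backups(change_log, schedule):
    print(entry)
```

In this run the second differential picks up pages A, B, and C again, because it reaches all the way back to the full backup, while each incremental picks up only the single page written since the backup right before it.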
So copying a database is a possible way to create a backup of it. The last type of backup is a snapshot. A snapshot takes a picture of the database at a point in time. And it happens very quickly, hence the name snapshot. It's snappy, right? It gets done fast, in the moment of a snap. Snapshots are useful because creating one doesn't actually cause any intensive activity on the server. You might think, how's that possible? Well, all we do is say, boom, from this point forward, start to track changes. So I create the snapshot, and as changes are made afterward, the data as it was before the change is what gets kept for the snapshot. Nothing had to be copied at the moment the snapshot was created, so creating it didn't really cause any intensity. It's a very interesting way to give yourself a point-in-time recovery capability for your database.
Now, those are your instance-based or unmanaged database backup strategies. What about your managed database backup strategies? Well, remember, we've already heard that AWS managed databases use automatic backups. So they're done automatically. Now you may be able to configure how often they're done, and you may be able to configure how long they're retained, but they're going to happen automatically, so you don't necessarily have to go in and make it happen. You can just accept the defaults and let it do its work.
AWS RDS also supports snapshots, meaning that I can go in at any time, take a snapshot of my database, and then later restore to that point in time if I want to. I can also export data out of the system; that's a form of backup too. And remember, multi-AZ uses replication as a form of backup. If I implement my database instance as a multi-AZ instance, it creates that backup instance, which is my standby replica in another availability zone. So effectively, replication is happening all the time between my primary instance and that replica instance.
Do keep in mind, though, that the ability to revert to a point in time after someone goes in and deletes a bunch of stuff is not automatically given to me just by having a multi-AZ database. For that I need backups and snapshots. So if someone makes a mistake and deletes all the records in a table, that delete is replicated to the multi-AZ standby essentially right away, since the replication is synchronous, and I'm not going to get to the server before it happens. The replication has happened, and now I need to go to my backups to restore that information.
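That last point can be shown with a toy model in Python. The class and method names here are invented for illustration and have nothing to do with the real RDS API; the only things the sketch assumes are what was just described, namely that every change on the primary is also applied to the standby, and that a snapshot is a frozen copy you restore into a new database.

```python
import copy


class ToyMultiAZDatabase:
    """Hypothetical stand-in for a primary plus a replicated standby."""

    def __init__(self, rows):
        self.primary = dict(rows)
        self.standby = dict(rows)  # replica in "another availability zone"
        self.snapshots = {}

    def write(self, key, value):
        self.primary[key] = value
        self.standby[key] = value  # every write is replicated

    def delete_all(self):
        self.primary.clear()
        self.standby.clear()       # mistakes are replicated too

    def take_snapshot(self, name):
        # Frozen copy of the data as it is right now.
        self.snapshots[name] = copy.deepcopy(self.primary)

    def restore_from_snapshot(self, name):
        # Restoring produces a new database from the saved state.
        return ToyMultiAZDatabase(self.snapshots[name])


db = ToyMultiAZDatabase({"1": "alice", "2": "bob"})
db.take_snapshot("before-cleanup")
db.delete_all()                    # someone deletes all the records

print(db.primary, db.standby)      # both copies are now empty
restored = db.restore_from_snapshot("before-cleanup")
print(restored.primary)            # the rows are back
```

The standby protects you when the primary instance or its availability zone fails. It does nothing for a bad delete, because it faithfully receives the delete as well; only the snapshot still holds the old rows.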
