Join Joseph Lowery for an in-depth discussion in this video Managing scalability and replication, part of Learning Cloud Data Storage.
- The raw power of today's cloud data storage industry is really apparent when you consider two defining characteristics: scalability and replication. In this lesson, we'll take a look at each in turn to see how you can make the most of these truly mind-expanding possibilities. Scalability is the ability of a system to adapt efficiently to handle the current workload. The vastness of the networks now available for cloud data storage means that there's virtually no limit to the number of objects or the amount of data that you can store online.
Moreover, this scalability is, for the most part, effortless for customers of these services, because the infrastructure is already in place and being maintained by the service providers. On the bulk of cloud data storage hosts, the number of containers you can create is effectively unlimited, and each container can grow without a practical size cap. When you store more objects in a container than can physically fit on a single drive, the data is written to other systems while still appearing within the same virtual bucket.
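That spillover idea can be pictured with a toy model, where one logical bucket quietly spans multiple physical drives. The drive capacity and all the names here are invented for illustration; real providers do this invisibly at a far larger scale.

```python
# Toy model: one logical bucket whose objects physically land on several drives.
# DRIVE_CAPACITY_GB is an invented figure, purely for illustration.
DRIVE_CAPACITY_GB = 100

class Bucket:
    def __init__(self, name):
        self.name = name
        self.drives = [[]]  # each drive holds (key, size_gb) pairs

    def put(self, key, size_gb):
        """Store an object; spill onto a fresh drive when the current one is full."""
        drive = self.drives[-1]
        if sum(size for _, size in drive) + size_gb > DRIVE_CAPACITY_GB:
            drive = []
            self.drives.append(drive)
        drive.append((key, size_gb))

bucket = Bucket("my-container")
for i in range(5):
    bucket.put(f"object-{i}", 40)  # 200 GB total, more than one 100 GB drive holds

# From the customer's side it's still one bucket; physically it spans drives.
print(len(bucket.drives))
```

The caller never sees the drive boundaries; that's the "same virtual bucket" behavior the lesson describes.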
Although the image that most frequently comes to mind when you say scalability is one of the service increasing its processes to meet surging tasks, scaling up, the ability to discard unneeded processes, scaling down, is just as important. Because cloud data storage runs on a pay-for-what-you-use model, most providers calculate their storage charge on a monthly average of usage. If your average goes down, the charge goes down.
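Here's a hedged sketch of that monthly-average billing model. The per-gigabyte price is a made-up figure, not any provider's actual rate, and real bills add request and transfer charges on top.

```python
# Toy sketch of pay-for-what-you-use billing based on average monthly storage.
# PRICE_PER_GB_MONTH is an assumed illustrative rate, not a real provider's price.
PRICE_PER_GB_MONTH = 0.02  # dollars per GB-month (hypothetical)

def monthly_charge(daily_usage_gb):
    """Charge for the month, based on the average of daily storage snapshots."""
    average_gb = sum(daily_usage_gb) / len(daily_usage_gb)
    return average_gb * PRICE_PER_GB_MONTH

# Scaling down mid-month lowers the average, and so lowers the bill.
steady = monthly_charge([1000] * 30)                  # 1000 GB all month
scaled = monthly_charge([1000] * 15 + [200] * 15)     # dropped to 200 GB halfway
print(steady, scaled)
```

Deleting 800 GB halfway through the month cuts the average to 600 GB, so the charge falls proportionally, which is exactly the scaling-down payoff.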
Next, let's turn to replication. Replication is the duplication of data in real time over a network. It's a common practice among cloud data storage platforms to automatically replicate your objects when they're added to your containers, and store the redundant objects on multiple devices, usually in the same region. When the object is replicated, everything remains the same. The key name, the metadata, the container, everything.
The primary goal of replication is data protection, or durability, making sure that your data objects are available. Durability is the probability that an object will be the same as when you transferred it after one year. The greater that likelihood, the higher the durability. 100% durability would mean that an object could not be lost. 90% durability means that there's a one in ten chance of losing an object within a year.
AWS, for example, rates its S3 Standard storage class at 99.999999999% durability. Translated into English, this means that if you store, say, 10,000 objects with them, one might get lost every 10 million years or so. Now, remember what I said, this automatic replication is to other devices within the same region, right? You can also replicate your data to a different region.
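The arithmetic behind that claim is worth seeing once. Treating the eleven-nines figure as an annual per-object loss probability, which is how AWS frames it, the expected time to lose a single object out of 10,000 works out like this:

```python
# Eleven nines of durability, read as an annual per-object loss probability.
durability = 0.99999999999
annual_loss_probability = 1 - durability  # about 1e-11 per object, per year

objects_stored = 10_000
expected_losses_per_year = objects_stored * annual_loss_probability  # about 1e-7

# On average, one object lost roughly every ten million years.
years_per_loss = 1 / expected_losses_per_year
print(years_per_loss)
```

This is an expected value, not a guarantee, but it shows why aggressive replication makes individual object loss vanishingly rare.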
Why would you do this? Well, there are a number of reasons. One, as pointed out in the previous lesson, you can reduce latency by housing your objects as close as possible to your markets. Two, regulatory compliance may mandate that your data be stored redundantly in remote locations. And three, your internal infrastructure may have remote offices that require access to the same data. Cloud data storage scalability and replication work hand-in-glove to give your data a substantial, stable, and secure footprint on the web that's always accessible.
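On S3, cross-region replication is driven by a replication configuration attached to the source bucket. As a hedged sketch, the bucket names, rule ID, account number, and role ARN below are all placeholders, and in practice you'd pass a structure like this to the S3 API (for instance via boto3's `put_bucket_replication`), with versioning enabled on both buckets:

```python
# Sketch of an S3 cross-region replication configuration.
# All names, ARNs, and the account ID are hypothetical placeholders.
replication_config = {
    # IAM role S3 assumes to copy objects on your behalf (placeholder ARN).
    "Role": "arn:aws:iam::123456789012:role/replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",        # placeholder rule name
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                        # empty filter: all objects
            "Destination": {
                # Destination bucket in another region (placeholder).
                "Bucket": "arn:aws:s3:::backup-bucket-eu-west-1",
                "StorageClass": "STANDARD",
            },
            "DeleteMarkerReplication": {"Status": "Disabled"},
        }
    ],
}
print(replication_config["Rules"][0]["Status"])
```

Once a rule like this is in place, new objects written to the source bucket are copied to the destination region automatically, which covers all three motivations above: latency, compliance, and shared access.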