Learn how to identify cases in which network bandwidth constraints may make data transfer infeasible, and how to articulate your options for transferring data via AWS's shipped-hard-drive service, Snowball.
- [Instructor] When it comes to transferring data to the cloud, especially in very large quantities, latency and bandwidth restrictions are constant concerns. There are many points at which our throughput might be limited by the speed of our connection or bottlenecked by low-bandwidth network hops. We can alleviate this somewhat by using a technology like AWS Direct Connect, but at a certain size of data load, the bandwidth of the internet just isn't going to cut it. We'd almost be better off loading our data into a truck and driving it to AWS. In fact, this is true: given sufficient data quantity and demanding enough time constraints, physically moving hard drives of data is more efficient than moving that same data over the internet.
Back in 1981, computer scientist Andrew Tanenbaum put it this way: "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." For an interesting dive into the implications of this statement, take a look at XKCD author Randall Munroe's What If? article on FedEx bandwidth. For just these sorts of situations, AWS provides AWS Snowball. Snowball is a ruggedized device that Amazon ships to your door via, what else, Amazon Prime. Snowball provides a way to get large quantities of data in and out of S3.
You can create an export job to receive data from an S3 bucket in the mail, or an import job in which you load data onto the device and ship it to AWS. To get started, you'll log into the AWS console and create what's called a Snowball job. A job consists of a few things. For example, for an import job you'll specify a shipping address, a destination S3 bucket, and an optional Lambda function. This Lambda function will be executed for every S3 PUT performed by the Snowball import job. You can use this to kick off all kinds of processes related to your data import.
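To make the pieces of an import job concrete, here's a minimal sketch of the request you'd hand to the Snowball API (for example via boto3's `create_job` call). All ARNs and the address ID below are placeholders, and the helper function is purely illustrative, not part of any AWS SDK:

```python
# Build the parameters for a Snowball import job, shaped like the
# arguments to boto3's snowball create_job call. Every ARN and the
# address ID here is a made-up placeholder.

def build_import_job_params(bucket_arn, address_id, kms_key_arn,
                            role_arn, lambda_arn=None):
    """Assemble a create_job request dict for a Snowball import job."""
    resources = {"S3Resources": [{"BucketArn": bucket_arn}]}
    if lambda_arn:
        # Optional Lambda function, invoked for every object the
        # import job PUTs into the destination bucket.
        resources["LambdaResources"] = [{
            "LambdaArn": lambda_arn,
            "EventTriggers": [{"EventResourceARN": bucket_arn}],
        }]
    return {
        "JobType": "IMPORT",
        "Resources": resources,
        "AddressId": address_id,              # your shipping address record
        "KmsKeyARN": kms_key_arn,             # key used to encrypt the data
        "RoleARN": role_arn,                  # lets Snowball write to your bucket
        "SnowballCapacityPreference": "T80",  # 80 TB device
        "ShippingOption": "SECOND_DAY",
    }

params = build_import_job_params(
    "arn:aws:s3:::my-import-bucket",
    "ADID1234ab-cd12-3456-7890-123456789012",
    "arn:aws:kms:us-east-1:123456789012:key/example-key-id",
    "arn:aws:iam::123456789012:role/snowball-import-role",
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:on-import",
)
# With credentials configured, you'd then submit it with:
#   boto3.client("snowball").create_job(**params)
```

The dict-building helper keeps the example runnable without AWS credentials; in practice you'd pass the same structure straight to the SDK.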
You'll also select a size for your Snowball. The device can be ordered in 50, 80, and 100 terabyte variants. The 100 terabyte size is exclusive to a variant called Snowball Edge. Snowball Edge is a true edge computing device: rather than just storage, it allows you to cluster multiple devices and actually run compute on them via Lambda functions. So you can not only transfer data, but also perform work. For this series, we'll focus on the normal 50 or 80 terabyte Snowball transfer device. How about security? If you're going to send terabytes of valuable data through the mail, you want to know that it's safe.
First, Snowball requires that you encrypt the data to be transferred. You must select an AWS Key Management Service (KMS) key when you create your job. This key is used to encrypt the data while it's still on your own systems; the encrypted data then moves to the Snowball, and the key itself never travels with the device. When the data arrives in S3, the same KMS key can be used to decrypt it. Second, the enclosure itself is tamper-resistant and equipped with a Trusted Platform Module (TPM) that can alert AWS to any unauthorized changes to the hardware.
Finally, you can configure SNS notifications for events that occur during the job's lifecycle. This way you can get text messages or emails for things like job created, Snowball shipped, Snowball delivered, Snowball back at AWS, job completed, or job canceled. The Snowball will arrive at your location via one- or two-day shipping, your choice. When it does, you'll connect the device to your network and install the Snowball S3 adapter software on your computer. This software is available for Windows, Mac, and Linux, and provides an S3-compatible endpoint for you to transfer data to locally.
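The lifecycle notifications mentioned above are configured as part of the job request itself. A rough sketch of that notification block follows; the topic ARN is a placeholder, and the state names reflect the Snowball API's job-state values as best I recall them, so check the API reference before relying on them:

```python
# Sketch of the Notification block accepted by the Snowball job APIs.
# The SNS topic ARN is a placeholder; state names are an assumption
# based on the Snowball API's JobState values.

LIFECYCLE_STATES = [
    "New",                  # job created
    "PreparingShipment",    # Snowball being prepared to ship
    "InTransitToCustomer",  # Snowball shipped
    "WithCustomer",         # Snowball delivered
    "InTransitToAWS",       # device on its way back
    "InProgress",           # data being ingested at AWS
    "Complete",             # job completed
    "Cancelled",            # job canceled
]

notification = {
    "SnsTopicARN": "arn:aws:sns:us-east-1:123456789012:snowball-updates",
    "JobStatesToNotify": LIFECYCLE_STATES,  # or just the subset you care about
    "NotifyAll": False,  # True = notify on every state change instead
}
```

Subscribing an email address or SMS number to that SNS topic is what turns these state changes into the messages described above.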
Remember, you will have specified in advance the target S3 bucket and the KMS encryption key to be used. When you're done, the Kindle-esque E Ink shipping label on the device updates to show the AWS facility you need to ship to, and you simply ship the device back via the designated carrier. When you get the SNS notification, you'll know your data is securely in S3. So all this is very cool, but when is Snowball really the best method for getting your data to the cloud? AWS has some rules of thumb in their published documentation.
It's really a function of your available bandwidth, the amount of data to transfer, and your time constraints. They give some examples in a chart: for instance, on a hundred-megabit connection, a hundred terabytes could take almost four months to transfer, and that's assuming you don't run into any connection problems along the way. AWS recommends that if a data transfer over your connection is going to take longer than a week, you're a candidate for Snowball, which is bound only by the transit time of the delivery service. Now, if you have a persistent need to move large quantities of data to AWS from an on-premises data center, it would be cost prohibitive to use Snowball over and over; you're going to want a bigger pipe.
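As a sanity check on numbers like these, here's a minimal back-of-the-envelope calculation. The 80% utilization factor is an assumption mirroring the kind of discount AWS's own examples apply for real-world connections:

```python
# Rough transfer-time estimate: how long to push `terabytes` of data
# through a `megabits_per_second` link, assuming the link only achieves
# `utilization` of its rated speed on average.

def transfer_days(terabytes, megabits_per_second, utilization=0.8):
    bits = terabytes * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (megabits_per_second * 1e6 * utilization)
    return seconds / 86400       # seconds -> days

days = transfer_days(100, 100)  # 100 TB over a 100 Mbit/s connection
print(f"{days:.0f} days (~{days / 30:.1f} months)")
```

With these assumptions, 100 TB over a 100 Mbit/s line works out to roughly 115 days, which is where the "almost four months" figure comes from; even 10 TB on the same line is well past the one-week rule of thumb.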
Although it's outside the scope of this course, I'd recommend you look at AWS Direct Connect, which offers AWS customers a way to tap into high-bandwidth networking infrastructure that connects directly to AWS. Still, if you're looking at limited, or even one-time, large-scale data transfers, and you need to get them to the cloud with confidence, take a look at AWS Snowball. Ah, and if you've got a really huge data set to move, we're talking exabytes, well, AWS has you covered there too. Meet AWS Snowmobile. It's a literal semi-truck.
Join AWS architect Brandon Rich and learn how to configure object storage solutions and lifecycle management in Simple Storage Service (S3), a web service offered by AWS, and migrate, back up, and replicate relational data in RDS. Find out how to leverage flexible network storage with Elastic File System (EFS), and use the new AWS Glue service to move and transform data. Plus, learn how Snowball can help you transfer truckloads of data in and out of the cloud.
- What is data management?
- AWS S3 basics
- S3 bucket creation
- S3 upload and logging
- S3 event notifications
- S3 data lifecycle configuration
- Working with Amazon Elastic Block Store volumes
- Creating and mounting an EFS
- Creating an AWS RDS instance
- RDS backup and recovery
- Moving data with AWS Database Migration Service
- Moving data with Data Pipeline and Glue