Join Lynn Langit for an in-depth discussion in this video Exploring Simple Storage Service (S3), part of Amazon Web Services Data Services.
- In this movie, we are going to take a look at AWS S3, or Simple Storage Service. This is their core file system. So it's file storage on their cloud using buckets, which are kind of analogous to local drives, and folders, which work the same way they do locally, and then, of course, files. It's commodity priced, and we'll look at pricing near the end of the section when we compare file storage option pricing, and it's used by almost all of the other data services. So this is probably the most used data product in the Amazon cloud services.
And nearly every customer that I work with, whether or not they deploy the majority of their solution on the Amazon cloud, uses S3 for redundant or duplicate storage across vendors. So because it's so cheap and easy to use, it's just the core of really all of the data services on the Amazon cloud, so we are going to start with it. So here we are in the Amazon Web Services management console and under Storage and Content Delivery, we are going to click on S3 for Scalable Storage in the Cloud.
And inside of the console, because we don't have any objects created, it gives us a little information, and it tells us that we can read, write and delete objects ranging from one byte to five terabytes each. An object is also called a BLOB or a file; BLOB stands for binary large object. So the idea is that you can store files that are very large, such as sound files, movies, and so on and so forth. So as it says, get started by creating a bucket and uploading a test object. Now folders are optional here, and we'll work with buckets and files, and then we'll add folders as well.
Notice also, under create, add and then manage, there is a built-in lifecycle management capability that allows you to move, as it says, some of your files to archival storage, which is an even lower cost. But let's click on the blue button to create a bucket, and let's call it First Bucket Sunday. Notice that I have different regions I can create this bucket in, and I'm recording in California, so I'll select Northern California. I have the option to set up logging, which would be a more detailed capture of the activities.
We don't need that at this point, so I'm just going to click Create. So one of the aspects of working with S3 to get the most out of it is to understand the bucket properties. So you can see this is the metadata around the bucket, and if I expand the permissions you can see that being the owner I have these permissions associated. And if I wanted to, what I could do is add more permissions here, and I could say, to whom this was granted. So let's say it was authenticated users and they could in this case list the information inside of the bucket.
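The bucket creation just performed in the console can also be done programmatically. Here's a minimal sketch using the boto3 library; the bucket name is a placeholder (yours must be globally unique), and the actual AWS call is shown in comments since it requires configured credentials:

```python
# Sketch: building the arguments for S3's create_bucket call.
# The bucket name below is a placeholder for demonstration.

def create_bucket_kwargs(name, region):
    """Build create_bucket arguments. The us-east-1 region is
    special-cased: it must omit LocationConstraint entirely."""
    kwargs = {"Bucket": name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Northern California corresponds to the us-west-1 region.
kwargs = create_bucket_kwargs("first-bucket-sunday", "us-west-1")

# With AWS credentials configured, the actual call would be:
# import boto3
# boto3.client("s3", region_name="us-west-1").create_bucket(**kwargs)
```

Note that bucket names must be lowercase when used via the API, which is one reason the console later shows names like these in lowercase form.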
They could see what's there. And then I can go ahead and say Save. Now as with files that are stored locally, you can have more granular permissions assigned to particular objects inside of a bucket. We are just starting with the top-level permissions. If you want to add a bucket-level policy, what you get is a policy generator. Now this is a core concept in working with data objects in Amazon: you want to determine what security policies you need to set up, and then you want to apply them via policies at the highest level of granularity.
So lock things down, if you will. So you can see that we've got sample bucket policies here, and it's going to show you just the examples here of the various bucket policies. So, for example, granting read-only permission to anonymous user, that's a common bucket policy. And they show you the text of that that you could just copy and paste in. So I'm actually going to go back over here, and I'm going to also click on the AWS policy generator because that's another tool that I've used in setting up security policies. So what this allows you to do is fill in this form and then the policy that you are needing will be generated, if one of the default ones doesn't fit for you.
So notice S3 bucket policy, and then inside of here I can say Allow or Deny, and then I can say, who it is, and I'm just going to put something a little bit fakey just for the purposes of demonstration. And then the AWS service, and then here is where I can put the actions. So I could put any of the actions that I would want. So let's say create bucket, delete bucket, just again just to show you an example. Then I want to associate this to an Amazon resource name, an ARN. So this is the syntax, ARN, Amazon S3, and then the bucket name and then the key name, and then you could add conditions as well.
Conditions could be associated to properties of the bucket. And the reason I'm showing you all this is that it's very, very common, as you become more sophisticated in your storage of files up in the cloud, to have a more granular set of permissions. So this is a common area that I work with in my enterprise customers, and this is the path that I'll take: we'll start with the bucket policy examples and see if those work, and just kind of get everybody familiar with that. And then we'll use the policy generator, which I won't take the time to complete, but I think you can see how it works.
You would fill in the additional ARN information and then set the conditions and it would generate something that looks kind of like that. And then you would paste that back in there, and that would be a very specialty bucket policy or set of permissions. It's an intermediate technique when you are storing more and more files that I wanted to include in our discussion of S3 because we've got a lot of questions around how do you set the permissions. So I'm going to click Delete and just go with the permissions that I have and say Save. And then the next section here is Static Website Hosting.
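A generated bucket policy is just a JSON document. Here's a sketch of the kind of output the policy generator produces; the account ID and bucket name are placeholders for demonstration:

```python
import json

# Sketch of a generated bucket policy. The account ID (123456789012)
# and bucket name are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAuthenticatedList",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "s3:ListBucket",
            # ARN syntax: arn:aws:s3:::bucket-name for the bucket itself,
            # arn:aws:s3:::bucket-name/key-name for objects inside it.
            "Resource": "arn:aws:s3:::first-bucket-sunday",
        }
    ],
}

# This is the text you would paste back into the bucket policy editor.
policy_text = json.dumps(policy, indent=2)
```

The Effect, Principal, Action, and Resource fields correspond directly to the Allow/Deny, who, actions, and ARN entries in the generator form.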
Again, this surprises a lot of customers that I show that you can actually host a static website just inside of S3. So all you do is you enable website hosting and then you specify what is the index document. Of course, we don't have one in here yet, and that would be the start document. And then when you click on this URL, you'd have an automatic website. It's a capability that is incredibly simple and useful for static hosting. And I just wanted to make sure that people who are watching were aware of it. In addition to this, we have logging capabilities, so if we enable logging, we can say which bucket, so this bucket.
And then the destination for logs, which would be written out to a different bucket. This will log either all activities by default or anything you've defined via policies. So you might want to say, I want to have a log of whenever there is a delete of files in this bucket, I want to have a log of whenever there is a change to folder structure, so on and so forth, so all these activities can be easily logged. In addition we have events. So what events do is they allow notifications to be sent through another set of AWS services, so that you can then receive notifications about activities that are performed on the bucket.
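Both the website-hosting and logging settings above reduce to small configuration documents. Here's a hedged sketch of what they look like via boto3; bucket names, document names, and the log prefix are placeholder values:

```python
# Sketch of the settings behind "Enable website hosting" and
# "Enable logging". All names below are placeholders.

# Static website hosting: the index document is the start document.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}

# Logging: access logs are written out to a different target bucket.
logging_config = {
    "LoggingEnabled": {
        "TargetBucket": "my-log-bucket",
        "TargetPrefix": "first-bucket-sunday-logs/",
    }
}

# With credentials configured, these would be applied as:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_website(Bucket="first-bucket-sunday",
#                       WebsiteConfiguration=website_config)
# s3.put_bucket_logging(Bucket="first-bucket-sunday",
#                       BucketLoggingStatus=logging_config)
```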
As with logging, if something happens, for example a file is deleted, and you have enabled events, then somebody could, for example, get an email notification on their phone or through an application when a file change was performed. Versioning allows you to store the different versions of objects. So notice, preserve, retrieve, restore every version, okay, an additional level of protection. And you can use lifecycle rules, which come up next, to manage all versions as well as to put older versions into the archive.
And notice, once enabled versioning cannot be disabled, only suspended, so it's disabled by default. And it's a toggle, so you just click it to turn it on. Lifecycle allows you to move some of the objects from this warm storage into cold storage. So we'll come back to that when we take a look at cold storage. Cross-Region Replication replicates every future upload of every object to another bucket. This is often used in conjunction with versioning.
You are required to enable versioning on this bucket and the target bucket. So again, the idea here is redundancy, and as I work with enterprise customers and they move more and more of their critical information up into the cloud, these kinds of processes are enabled more frequently as they move into production. Tags allow you to tag your resources, as you might remember from the introduction. So if I say add more tags, and I say name, and I say Lynda, I've created a group, and I say Save.
Once I go look at that group, which I'll do in a minute, then you'll see that this bucket will show up. Really this is great in general, but I often use it with test projects because then you can easily see which resources are associated and remove them all when you're done with the project. It's just a good practice in general to tag your objects, so that you can find them and associate them to groups. And then the last property at the bucket level is Requester Pays. This is when a requester will pay for the data transfer and anonymous access is then disabled.
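The versioning and lifecycle properties walked through above can also be expressed as boto3 configuration documents. A sketch, assuming an arbitrary 30-day archival threshold as the example value:

```python
# Sketch of versioning and lifecycle settings as configuration
# documents. The 30-day threshold is an arbitrary example.

# Versioning is a toggle: Enabled, or Suspended once it has been on
# (it cannot be fully disabled again).
versioning_config = {"Status": "Enabled"}

# Lifecycle: move objects from warm storage into cold (archival)
# storage; GLACIER is the archival storage class.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix = whole bucket
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}

# Applied (bucket name is a placeholder) as:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_versioning(Bucket="first-bucket-sunday",
#                          VersioningConfiguration=versioning_config)
# s3.put_bucket_lifecycle_configuration(
#     Bucket="first-bucket-sunday",
#     LifecycleConfiguration=lifecycle_config)
```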
All right, so this is our bucket; again, lots of things you can do in terms of the bucket properties. Click Save here, say okay. And I encourage you to utilize the ones that are going to make sense for you. Now if I click Actions, I can create, delete, empty, refresh or go to properties. Now if I go inside of this bucket, I can upload a file directly or I can create a folder. And you might wonder, do I need to create folders? And really this is dependent on your particular set of circumstances. I find that most often I will want to create folders because there are different sets of functionality, and in the address it just makes sense to differentiate.
So I'm going to go ahead and create a folder called One. And then if I look at the properties of the folder, I have a similar set of properties. Let me click on this, but it's a subset. So the properties that are available at the folder level are the type of storage, so I'm either going to have Standard or Reduced Redundancy. And this affects pricing of S3 storage. What this means is, Standard is replicated across different physical locations; Reduced Redundancy is within the particular location and it's cheaper. Then Server Side Encryption, none or AES-256.
Again, these impact the cost. Now I could create a folder in a folder if I wanted to, but the whole point of S3 is uploading files. So let's work with uploading some files. Let me go inside the folder and click Upload and then click Add Files. I'm going to click test one and say Open and click Set Details, and here I could use Reduced Redundancy or Server Side Encryption, Set Permissions. I could have individual permissions here. So here I'm going to start the upload.
And my upload is done, and now the actions that I can do on this particular file are, I have a direct link to it, here are my details, here are my permissions and here is associated metadata. And if I click on Actions, I can Open, Download, Create a Folder, Make Public, and that will make this link publicly accessible; or Cut, Copy or Delete. Now I'll show you how resource tagging works in the Amazon console.
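The upload with the Reduced Redundancy and Server Side Encryption options can be sketched in boto3 as well. File and bucket names here are placeholders, and the actual call is shown in comments:

```python
# Sketch: the options behind Set Details when uploading a file.
extra_args = {
    "StorageClass": "REDUCED_REDUNDANCY",  # cheaper, single-location
    "ServerSideEncryption": "AES256",      # encrypt the object at rest
}

# With credentials configured, the upload itself (file, bucket, and
# key names are placeholders) would be:
# import boto3
# boto3.client("s3").upload_file(
#     "test1.txt", "first-bucket-sunday", "one/test1.txt",
#     ExtraArgs=extra_args)
```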
You might remember that this bucket we tagged with the name Lynda, so let's go back to the main page. And if we click on Resource Groups Lynda, you'll see that because we tagged it with the name Lynda and that we matched this resource group, the group name is Lynda and the tag name is Lynda. Now I can go ahead and see the resources associated. So like I said, this is just a usability tip that I use as an architect when I'm creating samples and demos, I just create a set of tags, and that way I can see all the resources that are associated and easily delete them or make sure that they are turned off.
So the way that this works is, you just create a tag, and if you wanted to edit it, you associate it then to a group over here. And remember that everything is case-sensitive. They tell you that, but it's easy to forget, and you won't have matches if you don't match the case exactly. You can filter by regions and resource types, and it's just a really simple thing that will help you when you are starting to work with your AWS data services.
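That case-sensitivity point is worth pinning down. Here's a small sketch of the bucket tagging shown above, with a helper that matches tags the strict way the resource groups console does; the bucket name is a placeholder:

```python
# Sketch of the bucket tag set from the demo. Tag keys and values
# are case-sensitive, so "Name"/"lynda" will NOT match "name"/"Lynda".
tag_set = {"TagSet": [{"Key": "name", "Value": "Lynda"}]}

def tags_match(tag_set, key, value):
    """Exact, case-sensitive match, like the resource groups filter."""
    return any(t["Key"] == key and t["Value"] == value
               for t in tag_set["TagSet"])

# Applied (bucket name is a placeholder) as:
# import boto3
# boto3.client("s3").put_bucket_tagging(
#     Bucket="first-bucket-sunday", Tagging=tag_set)
```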