Learn how to define the preliminary steps to support an AWS Glue job that uses both S3 and RDS endpoints.
- [Instructor] Before we get started with AWS Glue, there are a few steps that we need to take. First, we need to create a role for the Glue service to use to interact with other resources in our account. We'll go to the Identity and Access Management section of the control panel. Here on the IAM Panel, go to Roles, and then Create Role. We have a selection down below of AWS services, so scroll down and choose Glue from the center column. Click Next: Permissions. We need to add policies to this role.
If you search glue and hit Enter, you'll see the AWSGlueServiceRole policy, which we can attach. In addition, we want to add some S3 privileges to this role. Glue is going to need to interact with S3, not only for logging and for storing jobs, but for any data that we wish to read from it or write to it. So I've typed s3 here in the filter, and then I'm going to choose AmazonS3FullAccess. Click Next: Review. For the role name, I'm going to use AWSGlueServiceRole, and it's important that you do the same.
The documentation for Glue says that AWS Glue-provided policies expect IAM service roles to begin with AWSGlueServiceRole, just like this. You can see the two policies that we have selected below, the Glue service role policy and the S3 full access policy. Click Create role. Now that we've done this, we can leave the IAM console and head back to the main page. The last thing that we need to do is head to the EC2 section and create a new security group. Click on EC2, then go to Network & Security on the left-hand side, and choose Security Groups.
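For anyone who prefers scripting to clicking, the role setup above can be sketched in Python. This is only a rough equivalent of the console steps, not something shown in the video: it assumes you pass in a boto3 IAM client (for example `boto3.client("iam")`), and the function and constant names are made up for illustration. The trust policy is what the console fills in for you when you choose Glue as the service.

```python
import json

# Role name from the walkthrough; Glue-provided policies expect this prefix.
ROLE_NAME = "AWSGlueServiceRole"

# The two AWS managed policies selected in the console.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]


def glue_trust_policy():
    """Trust policy that lets the Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_glue_role(iam, role_name=ROLE_NAME):
    """Create the role and attach both managed policies.

    `iam` is assumed to be a boto3 IAM client (hypothetical helper,
    not part of the original walkthrough).
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(glue_trust_policy()),
    )
    for arn in POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

Calling `create_glue_role(boto3.client("iam"))` with credentials that are allowed to manage IAM should leave you with the same role the console steps produce.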
Click Create Security Group. What we're doing here is creating a security group that will attach to our remote resources, in this case the RDS database, and that Glue as a service will also assign to the resources that it provisions to access the same remote target. What this means is that the group needs a self-referencing rule. You may know that security groups allow traffic in not only from IP ranges, but also from other security groups. So what we'll do is we'll create a rule that references the group itself.
We'll call it GlueSelfReferencingRule. Since the inbound rule must reference the group itself, first we need the security group ID, which means this is a two-step process. We create the group without any inbound rules established at all. So hit Create, and then we'll filter our search to find this. This Group ID is what we want. Now while still selecting the GlueSelfReferencingRule, click Actions and Edit Inbound rules.
I'm going to select All TCP traffic. Under other circumstances, we could sculpt this rule to just the traffic that we need, but for purposes of this demo, this'll be fine. We're going to say Custom on Source and paste in the ID of the group that we're currently editing. Now click Save. If you look at the inbound rules on this group, you can now see that it's self-referencing, and that's it. Now it's on to prepare the data that we need for our Glue job.
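The two-step process for the security group can also be sketched in Python. Again, this is a rough equivalent rather than anything from the video: it assumes a boto3 EC2 client (for example `boto3.client("ec2")`) and a `vpc_id` that you supply, and the helper names and the description text are hypothetical.

```python
def self_referencing_rule(group_id):
    """Inbound permission allowing all TCP traffic from the group itself."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            # The source is the same security group, not an IP range.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def create_self_referencing_group(ec2, vpc_id, name="GlueSelfReferencingRule"):
    """Create the group, then add an inbound rule that points back at it.

    `ec2` is assumed to be a boto3 EC2 client; `vpc_id` is the VPC that
    holds the RDS database (both are assumptions for this sketch).
    """
    # Step 1: create the group with no inbound rules so we can get its ID.
    group_id = ec2.create_security_group(
        GroupName=name,
        Description="Self-referencing group for AWS Glue",
        VpcId=vpc_id,
    )["GroupId"]

    # Step 2: add the inbound rule whose source is the group we just made.
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_rule(group_id),
    )
    return group_id
```

As in the demo, the rule opens all TCP ports; outside of a demo you could narrow `FromPort` and `ToPort` to just the database port you need.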
Join AWS architect Brandon Rich and learn how to configure object storage solutions and lifecycle management in Simple Storage Service (S3), a web service offered by AWS, and migrate, back up, and replicate relational data in RDS. Find out how to leverage flexible network storage with Elastic File System (EFS), and use the new AWS Glue service to move and transform data. Plus, learn how Snowball can help you transfer truckloads of data in and out of the cloud.
- What is data management?
- AWS S3 basics
- S3 bucket creation
- S3 upload and logging
- S3 event notifications
- S3 data lifecycle configuration
- Working with Amazon Elastic Block Store volumes
- Creating and mounting an EFS
- Creating an AWS RDS instance
- RDS backup and recovery
- Moving data with AWS Database Migration Service
- Moving data with Data Pipeline and Glue