Learn how to use AWS Glue to create a user-defined job that uses custom PySpark (Apache Spark) code to perform a simple join between a relational table in MySQL on RDS and a CSV file in S3.
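The join this video builds can be pictured locally before any Glue setup. What follows is only a plain-Python sketch of the same inner join on employee ID, not the Glue or PySpark job itself. The sample rows, the column names (`id`, `department_id`, `employee_id`, `overall_satisfaction`, `comments`) and the helper `join_on_employee_id` are all hypothetical stand-ins for the real table and CSV file.

```python
import csv
import io

# Hypothetical rows standing in for the MySQL employees table.
# The real table has an ID and a department ID; the exact column names are assumed.
employees = [
    {"id": 1, "department_id": 10},
    {"id": 2, "department_id": 20},
    {"id": 3, "department_id": 10},
]

# Hypothetical content standing in for employeesatisfaction.csv.
survey_csv = """employee_id,overall_satisfaction,comments
1,4,Good team
3,2,Too many meetings
"""


def join_on_employee_id(employee_rows, csv_text):
    """Inner-join employee rows with survey rows on the employee ID."""
    # Index the relational side by its key for constant-time lookups.
    by_id = {row["id"]: row for row in employee_rows}
    joined = []
    for survey in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings, so cast the key before matching.
        employee = by_id.get(int(survey["employee_id"]))
        if employee is not None:
            joined.append({**employee, **survey})
    return joined


result = join_on_employee_id(employees, survey_csv)
for row in result:
    print(row)
```

Only employees with a matching survey row appear in the result, which is the behavior of an inner join. In Glue, the crawlers supply the schema for both sides, so the manual type cast shown here would typically be handled through the catalog rather than by hand.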
- [Narrator] For this demo we're going to be joining two heterogeneous sources. First, the employees table in the MySQL database. You see that I have it selected here, and you can see the employees that exist there. Each one has an ID and a department ID. To this we are going to join another file. You will find it in the chapter six folder of the exercise files. It's called employeesatisfaction.csv. Let's take a look. This is an example of what might be a survey given to employees: overall satisfaction with their job and different categories.
And there are a few comments. It also has employee ID; this will be the field we use to join the two data sources using AWS Glue. Now let's head to the AWS main console to start the job. We'll go directly to Glue this time. The first thing that we need to do is make Glue aware of both sides of this join. So we need to create a new crawler. Click Add Crawler under the crawlers menu. This one is going to be MySqlDemoCrawler.
The data store for this one is not S3, but JDBC,…
Join AWS architect Brandon Rich and learn how to configure object storage solutions and lifecycle management in Simple Storage Service (S3), a web service offered by AWS, and migrate, back up, and replicate relational data in RDS. Find out how to leverage flexible network storage with Elastic File System (EFS), and use the new AWS Glue service to move and transform data. Plus, learn how Snowball can help you transfer truckloads of data in and out of the cloud.
- What is data management?
- AWS S3 basics
- S3 bucket creation
- S3 upload and logging
- S3 event notifications
- S3 data lifecycle configuration
- Working with Amazon Elastic Block Store volumes
- Creating and mounting an EFS
- Creating an AWS RDS instance
- RDS backup and recovery
- Moving data with AWS Database Migration Service
- Moving data with Data Pipeline and Glue
Skill Level: Intermediate
2. Object Storage
3. File Systems
4. Database Services
5. Getting Data to AWS
6. Moving Data in AWS