From the course: Cloud Hadoop: Scaling Apache Spark
Review batch architecture for ETL on AWS - Apache Spark Tutorial
- [Instructor] The first Hadoop pipeline architecture we're going to examine is a fairly traditional one: batch extract, transform, and load. It's not using streaming, and it's not just-in-time; however, it does leverage some services and processes in the cloud. This one happens to run on the Amazon cloud, and it's about augmenting and cleaning healthcare data. You can follow the associated link if you want to read the underlying use case in more detail, but let's look at the architecture. We start with a manifest file, which is stored in S3 file storage, and that triggers compute, or processing, which runs as microservices in a Lambda architecture. Now, that's separate from the whole data world. It's interesting to note that Lambdas are being used more and more in these data pipelines because they're efficient, more efficient than using a virtual machine.…
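To make the trigger step concrete, here is a minimal sketch of what the Lambda handler for that first stage might look like: an S3 "object created" notification for the manifest file invokes the function, which extracts the manifest's location so downstream ETL can be kicked off. The function name, bucket, and key below are illustrative assumptions, not taken from the course; only the event shape follows AWS's documented S3 notification format.

```python
# Hypothetical Lambda handler for the manifest-triggered step described
# above. An S3 ObjectCreated event arrives as a dict with a "Records"
# list; we pull out the bucket and key of each uploaded manifest.
# (Function and resource names here are illustrative only.)

def handle_manifest_upload(event, context=None):
    """Parse an S3 event notification and return manifest S3 URIs."""
    manifests = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Downstream, this URI would be handed to the batch ETL job.
            manifests.append(f"s3://{bucket}/{key}")
    return manifests

# Sample event payload in the documented S3 notification shape:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "healthcare-data"},
                "object": {"key": "manifests/batch-001.json"}}}
    ]
}
print(handle_manifest_upload(sample_event))
```

Running this locally with the sample event prints the manifest URI; in the actual pipeline, the return value would instead feed the transform stage, which is where Lambda's pay-per-invocation model beats keeping a virtual machine running between batches.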
Contents
- Sign up for Databricks Community Edition (3m 29s)
- Add Hadoop libraries (2m 33s)
- Databricks AWS Community Edition (2m 22s)
- Load data into tables (1m 51s)
- Hadoop and Spark cluster on AWS EMR (7m 30s)
- Run Spark job on AWS EMR (4m 40s)
- Review batch architecture for ETL on AWS (2m 17s)