From the course: Cloud Hadoop: Scaling Apache Spark
Use AWS EKS containers and data lake - Apache Spark Tutorial
- [Instructor] So additional concerns around optimizing Spark on the cloud depend on the vendor. For AWS, you're going to use the cluster monitoring tools, which include CloudWatch and some of the others mentioned in a previous video. When you have production-sized workloads, it's really important to use VM optimization techniques. For Amazon EC2, that means using Spot Instances or AWS Batch to reduce cost. The CSIRO team applied both of these techniques and was able to substantially reduce their service cost. So it's really key, when you're running workloads at genomic scale, as in my case, which are quite large, to use these VM service optimizations. Now, for GCP, you would use monitoring with Stackdriver, which is very sophisticated for Dataproc if you're using a managed solution there. And the optimization for Google Compute Engine VMs is preemptible instances, which are the Google equivalent of Spot or Batch.…
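To make the two cost optimizations concrete, here is a minimal CLI sketch: an EMR cluster whose core fleet bids for Spot capacity, and a Dataproc cluster with preemptible secondary workers. Cluster names, regions, instance types, and counts are placeholders, not values from the course.

```shell
# AWS EMR: create a cluster whose core instance fleet targets Spot capacity
# (names, types, and capacities below are illustrative placeholders)
aws emr create-cluster \
  --name "spark-spot-demo" \
  --release-label emr-6.15.0 \
  --applications Name=Spark \
  --instance-fleets \
    InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs='[{"InstanceType":"m5.xlarge"}]' \
    InstanceFleetType=CORE,TargetSpotCapacity=4,InstanceTypeConfigs='[{"InstanceType":"m5.xlarge"}]'

# GCP Dataproc: create a cluster with preemptible secondary workers,
# the GCP counterpart of EC2 Spot for batch-style Spark jobs
gcloud dataproc clusters create spark-preemptible-demo \
  --region=us-central1 \
  --num-workers=2 \
  --num-secondary-workers=4 \
  --secondary-worker-type=preemptible
```

Because Spot and preemptible capacity can be reclaimed at any time, this pattern is best suited to fault-tolerant batch workloads, where Spark can recompute lost partitions on surviving nodes.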
Contents
- Scale Spark on the cloud by example (5m 11s)
- Build a quick start with Databricks AWS (6m 50s)
- Scale Spark cloud compute with VMs (6m 16s)
- Optimize cloud Spark virtual machines (6m 5s)
- Use AWS EKS containers and data lake (7m 8s)
- Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- Build reproducible cloud infrastructure (8m 37s)
- Scale on GCP Dataproc or on Terra.bio (8m 34s)