From the course: Cloud Hadoop: Scaling Apache Spark

Use AWS EKS containers and data lake


- [Instructor] So additional concerns around optimizing Spark in the cloud depend on the vendor. For AWS, you're going to use the cluster monitoring tools, which include CloudWatch and some of the others mentioned in a previous movie. When you have production-sized workloads, it's really important to use VM optimization techniques. For Amazon EC2, that means using Spot Instances or batch capacity to reduce cost. The CSIRO team applied both of these techniques and was able to substantially reduce the service cost. So it's really key when you're running these, in my case, genomic-scale workloads, which are quite large, to use VM service optimizations. Now, for GCP, you would use monitoring with Stackdriver, which is very sophisticated for Dataproc if you're using a managed solution there. And the optimization for Google Compute Engine VMs is preemptible instances, which are the Google version of Spot or batch.…
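As one concrete illustration of the Spot optimization mentioned above, here is a minimal sketch of an eksctl cluster config that backs a Spark worker node group with EC2 Spot capacity. The cluster name, node group name, instance types, and sizes are all hypothetical, not values from the course; adjust them for your own workload.

```yaml
# Hypothetical example: an EKS cluster whose Spark executor nodes
# run on EC2 Spot Instances to reduce cost.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: spark-cluster        # hypothetical cluster name
  region: us-east-1          # hypothetical region
managedNodeGroups:
  - name: spark-spot-workers
    # Listing several instance types improves the odds of
    # getting Spot capacity when one pool is exhausted.
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]
    spot: true               # request Spot rather than On-Demand capacity
    minSize: 0
    maxSize: 10
    labels:
      workload: spark-executor   # hypothetical label for Spark pod scheduling
```

Because Spot nodes can be reclaimed, they suit Spark executors (which can be rescheduled) better than the driver, which is commonly kept on On-Demand capacity.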
