From the course: Cloud Hadoop: Scaling Apache Spark
Scale on GCP Dataproc or on Terra.bio
- [Instructor] In addition to working in Amazon, I do a lot of work on GCP, and I just wanted to take a minute here and show you that the Dataproc managed Hadoop/Spark implementation is conceptually very similar to Amazon's. In some ways I find it to be more performant, so I wanted to show it here. You create a cluster very similarly to what you do in Amazon. Now, if you've got a new Google account, there is a limit of eight CPUs, so you're going to want to set this to a smaller size, because it's one master and two workers, and you'll run out of CPUs by default in a free trial account. So know you have only eight here: 'cause four here and four here. Also, you're going to want to enable the components, because that's going to give you your Jupyter Notebook. So you're going to turn that on, and then you're going to want to select the components and get Anaconda, which is required on GCP, and Jupyter, if you want to use…
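The console steps the instructor describes can also be done from the command line. Below is a minimal sketch using the gcloud CLI; the project ID, region, and cluster name are placeholders, and the machine types are an assumption chosen so the total stays inside the eight-vCPU free-trial quota (4 vCPUs for the master, 2 × 2 vCPUs for the workers).

```shell
# Create a small Dataproc cluster with the Anaconda and Jupyter optional
# components enabled, sized to fit an 8-vCPU free-trial quota.
# Cluster name, project, and region are placeholders -- substitute your own.
gcloud dataproc clusters create my-spark-cluster \
  --project=my-project-id \
  --region=us-central1 \
  --master-machine-type=n1-standard-4 \
  --num-workers=2 \
  --worker-machine-type=n1-standard-2 \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway
```

The `--enable-component-gateway` flag exposes the Jupyter web interface through the Cloud Console, which matches the "enable the components" toggle shown in the video.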
Contents
- Scale Spark on the cloud by example (5m 11s)
- Build a quick start with Databricks AWS (6m 50s)
- Scale Spark cloud compute with VMs (6m 16s)
- Optimize cloud Spark virtual machines (6m 5s)
- Use AWS EKS containers and data lake (7m 8s)
- Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- Build reproducible cloud infrastructure (8m 37s)
- Scale on GCP Dataproc or on Terra.bio (8m 34s)