From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Cloud Dataproc
- [Instructor] Cloud Dataproc is a managed Hadoop and Apache Spark service running on GCP. This means it comes with HDFS, MapReduce, and Spark programming capabilities. Because Cloud Dataproc is managed, it provides automatic cluster setup, scaling up and down, and monitoring, so there is minimal administrative work required to run it. It has built-in integrations with other GCP data processing products and data stores, which makes building pipelines easy. It uses a pay-as-you-go model, so you are billed only for the cluster resources you use while the cluster is running. The key advantage of Cloud Dataproc is that Spark code you have already written and put into production for an enterprise or AWS deployment can run on GCP without modifications, so you can easily move that code between production environments.
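To make the MapReduce programming model mentioned above concrete, here is a minimal single-machine sketch of a word count split into its map, shuffle, and reduce phases. It is plain Python, not Dataproc or Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration. On a cluster, the framework runs many mappers and reducers in parallel and performs the shuffle across machines.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: combine all values for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in order on a local list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Spark runs on Dataproc", "Hadoop runs on Dataproc too"]
    print(word_count(sample))
```

In a Spark job, the same logic would typically be expressed with RDD operations such as `flatMap`, `map`, and `reduceByKey`, and a PySpark script like that could then be submitted to an existing cluster with `gcloud dataproc jobs submit pyspark`, which is the portability the instructor describes.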