From the course: Data Science on Google Cloud Platform: Building Data Pipelines
What is Apache Beam? - Google Cloud Tutorial
- [Instructor] Google Cloud Dataflow is built as an execution engine for Apache Beam: you define your pipeline logic in Beam and execute it on Dataflow. In this and the following videos, I will elaborate on various Apache Beam concepts and capabilities. Apache Beam is a common pipeline definition model. You define the various activities in the pipeline, including data ingestion, cleansing, transformation, and persistence, using this common model. It is essentially used to define extract, transform, load, or ETL, tasks, and its interface is simple to understand and use. The main goal of Apache Beam is to separate the definition of the pipeline from its execution. You define the pipeline in a single unified programming model, but you can execute it on multiple platforms. Beam supports multiple programming SDKs for defining your pipeline, including Java, Python, and Go, and you can execute your pipeline using multiple runner options: the native runner, Flink, Spark, Cloud Dataflow, et cetera. You…