From the course: Data Science on Google Cloud Platform: Building Data Pipelines


What is Apache Beam?


- [Instructor] Google Cloud Dataflow is built as an execution engine for Apache Beam: you define your pipeline logic in Beam and execute it on Dataflow. In this and the following videos, I will elaborate on various Apache Beam concepts and capabilities. Apache Beam is a common pipeline definition model. You define the various activities in a pipeline, including data acquisition, cleansing, transformation, and persistence, using this common model. It is essentially used to define extract, transform, load, or ETL, tasks. The interface is simple and easy to understand and use. The main goal of Apache Beam is to separate the definition of a pipeline from its execution. You define the pipeline in a single unified programming model, but you can execute it on multiple platforms. Beam supports multiple programming SDKs for defining your pipeline, including Java, Python, and Go. You can execute your pipeline on multiple runner options: the Direct Runner, Flink, Spark, Cloud Dataflow, et cetera. You…
