Discuss how Spark executes programs with special emphasis on data flows.
- [Instructor] Let's take a deeper look at how Spark works. It is important for you to know this, since your code influences how Spark will work, and you need to know how to avoid pitfalls with it. Let us start with a simple Spark cluster, with one driver node and two worker nodes. Let us write a simple program that acquires data from a database, does some transformations and actions, and moves the resulting data back to the destination DB.
First, you read data from the source database into a result set. This will create a data structure within the driver node. Then you create an RDD from this data. This will create two partitions of the RDD, one in each of the worker nodes. Now, you execute a simple transformation, like a map. This will move the transformation function code to the worker nodes and execute it locally inside each worker.
The code will act on the source RDD and create the result in a new RDD. There is no data movement between any of the partitions. Next, you execute another transformation, like a sortByKey.…
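
The flow described so far (result set on the driver, an RDD split into two partitions, then a map that runs locally on each worker) can be pictured with a small toy model. This is only a sketch in plain Python, not the Spark API: the names `Worker`, `parallelize`, and `map_rdd`, the sample records, and the contiguous slicing rule are all assumptions made for illustration.

```python
# Toy model of the data flow, in plain Python. This is NOT Spark itself;
# Worker, parallelize and map_rdd are hypothetical names for illustration.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]


@dataclass
class Worker:
    name: str
    # Maps an RDD name to the partition of that RDD held on this worker.
    partitions: Dict[str, List[Record]] = field(default_factory=dict)


def parallelize(rows: List[Record], workers: List[Worker], rdd: str) -> None:
    """Split driver-side rows into one contiguous partition per worker.

    The slicing rule here is an assumption chosen for simplicity.
    """
    n = len(workers)
    for i, worker in enumerate(workers):
        start = i * len(rows) // n
        end = (i + 1) * len(rows) // n
        worker.partitions[rdd] = rows[start:end]


def map_rdd(fn: Callable[[Record], Record], workers: List[Worker],
            source: str, target: str) -> None:
    """Send fn to every worker and apply it to that worker's partition only.

    Each worker reads its own source partition and writes a new partition,
    so no records move between workers.
    """
    for worker in workers:
        worker.partitions[target] = [fn(r) for r in worker.partitions[source]]


# Driver side: the result set read from the source database (sample data).
result_set: List[Record] = [("b", 2), ("a", 1), ("d", 4), ("c", 3)]

workers = [Worker("worker-1"), Worker("worker-2")]
parallelize(result_set, workers, "source")
map_rdd(lambda kv: (kv[0], kv[1] * 10), workers, "source", "mapped")

for w in workers:
    print(w.name, w.partitions["mapped"])
```

In this model each worker ends up with a "mapped" partition built only from its own "source" partition, which mirrors the point that a map involves no data movement between partitions.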