Understand Spark dependencies and how to optimize them for better performance.
- [Instructor] In this lesson, let's focus on Spark dependencies and how to optimize your code to work with and around them. So what is a Spark dependency? A dependency arises when a transformation is executed and one RDD is created from another RDD. The question is, does the transformation result in a shuffle? Remember the data shuffle we discussed in the previous lecture, where data needs to be moved between Spark nodes to execute the transformation? If the transformation creates a shuffle, it's called a wide dependency. If the transformation does not create a shuffle, it's called a narrow dependency.
Let's go back to the diagram we reviewed in the previous lecture on how Spark works. In the worker node, we see that RDD 12 is created only from RDD 11, so no data flows between worker nodes 1 and 2 to execute the transformation. This is a narrow dependency. On the other hand, to create RDD 13, we need data from both worker nodes. This is a wide dependency.
Remember that wide dependencies…
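The narrow versus wide distinction can be illustrated with a small toy model in plain Python. This is only a sketch, not Spark code: an "RDD" is modeled as a list of partitions, one per hypothetical worker, and the helpers `map_values` and `reduce_by_key` are invented here for illustration (they are loosely named after Spark's `mapValues` and `reduceByKey`). The records and the key-routing rule are assumptions; only the names RDD 11, 12, and 13 come from the diagram.

```python
from collections import defaultdict
from functools import reduce

# Toy model (assumption): an "RDD" is a list of partitions, and each
# partition is a list of (key, value) records held by one worker.
rdd_11 = [
    [("a", 1), ("b", 2)],  # partition on hypothetical worker 1
    [("a", 3), ("c", 4)],  # partition on hypothetical worker 2
]


def map_values(partitions, fn):
    """Narrow dependency: each output partition reads exactly one
    input partition, so no record leaves its worker."""
    return [[(k, fn(v)) for k, v in part] for part in partitions]


def reduce_by_key(partitions, fn, num_partitions=2):
    """Wide dependency: records are routed by key, so an output
    partition may need records from every input partition."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    moved = 0  # records that had to change partition (the "shuffle")
    for src, part in enumerate(partitions):
        for k, v in part:
            # Deterministic stand-in for a hash partitioner (assumption).
            dst = sum(map(ord, k)) % num_partitions
            if dst != src:
                moved += 1
            buckets[dst][k].append(v)
    result = [
        [(k, reduce(fn, bucket[k])) for k in sorted(bucket)]
        for bucket in buckets
    ]
    return result, moved


# RDD 12 depends only on the matching partition of RDD 11 (narrow).
rdd_12 = map_values(rdd_11, lambda v: v * 10)

# RDD 13 needs records from both partitions for key "a" (wide).
rdd_13, moved = reduce_by_key(rdd_12, lambda x, y: x + y)

print(rdd_12)
print(rdd_13, "records moved:", moved)
```

In this model the map step never touches another partition, while the reduce step has to bring both "a" records together before it can combine them. That cross-partition movement is what makes the dependency wide.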