In Spark, data can be manipulated in many ways. In this video, learn both how data is divided, modified, and evaluated, and when those actions are performed.
- [Instructor] Earlier, we talked about how Spark is a distributed system. This means that if we want the workers to work in parallel, Spark needs to break the data into chunks, or partitions. A partition is a collection of rows from your DataFrame that sits on one machine in your cluster. So a DataFrame's partitioning is how the data is physically distributed across the cluster of machines during execution. Now, because you're working with a high-level API when using DataFrames, you don't normally get involved with manipulating the partitions manually.
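To make the idea of a partition concrete, here is a minimal sketch in plain Python. It is a conceptual model only, not Spark's API or its actual partitioning strategy: the helper `split_into_partitions` and the sample rows are hypothetical, and the contiguous, near-equal split is an assumption made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_partitions(rows: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Split rows into num_partitions contiguous chunks of near-equal size.

    Hypothetical helper: Spark chooses its own partitioning; this only
    models the idea that each chunk of rows would live on one machine.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(len(rows), num_partitions)
    partitions: List[List[T]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if i < extra else 0)
        partitions.append(list(rows[start:start + size]))
        start += size
    return partitions


# Made-up sample data standing in for the rows of a DataFrame.
rows = [("alice", 34), ("bob", 45), ("carol", 29), ("dave", 51), ("erin", 38)]
partitions = split_into_partitions(rows, 2)

for index, chunk in enumerate(partitions):
    print(f"partition {index}: {chunk}")
```

In PySpark itself, assuming a DataFrame named `df`, the partition count can be inspected with `df.rdd.getNumPartitions()` and changed with `df.repartition(n)`, although, as the transcript notes, you rarely need to do this by hand.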
Just so you know, if you only have one partition, Spark can't parallelize jobs even if you have a cluster of machines available. In the same way, if you have several partitions but only one worker, Spark can't parallelize jobs, as there's only one resource that can do the computation. DataFrames are a core data structure in Spark, and are immutable. Immutable is just a fancy way of saying they can't be changed once they've been created. So the instructions that we use to modify the DataFrame…
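The two points above, that parallelism is limited by both partitions and workers, and that an immutable structure cannot be changed once created, can be sketched in plain Python. This is an illustration under stated assumptions, not Spark code: `max_parallel_tasks` treats each worker as a single task slot, and `ToyFrame` is a hypothetical stand-in that evaluates eagerly and does not model how Spark schedules or executes work.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Row = Tuple[str, int]


def max_parallel_tasks(num_partitions: int, num_workers: int) -> int:
    """Upper bound on tasks running at once.

    Assumption: one task slot per worker, so the bound is the smaller
    of the two counts.
    """
    return min(num_partitions, num_workers)


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical immutable container of rows (not a Spark class)."""

    rows: Tuple[Row, ...]

    def where(self, predicate: Callable[[Row], bool]) -> "ToyFrame":
        # Build and return a new ToyFrame; the original is left untouched.
        return ToyFrame(tuple(row for row in self.rows if predicate(row)))


print(max_parallel_tasks(num_partitions=1, num_workers=4))  # one partition
print(max_parallel_tasks(num_partitions=8, num_workers=1))  # one worker
print(max_parallel_tasks(num_partitions=8, num_workers=4))

# Made-up sample data.
people = ToyFrame((("alice", 34), ("bob", 45), ("carol", 29)))
over_30 = people.where(lambda row: row[1] > 30)
print(len(people.rows), len(over_30.rows))
```

Because `ToyFrame` is frozen, assigning to `people.rows` raises an error, so the only way to get different data is to create a new object from the old one.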
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level Intermediate
1. Introduction to Apache Spark
2. Technical Setup
3. Working with the DataFrame API
5. Resilient Distributed Datasets (RDDs)