Learn how to perform join operations on DataFrames.
- [Instructor] Now another common operation is joining tables. Now with Spark SQL we can join DataFrames. So I'm going to pick up where I left off in the previous lesson with my Scala REPL active here. I'm going to just clear the screen. And just as a refresher I'm going to show the contents of a DataFrame called emps. And there are the first 20 rows of emps. I also have another DataFrame that deals with countries and regions.
And that's called df_cr. And let's show the contents of that one. Now you'll notice both the employees table and the country region table have a column called region ID. That means I can join these two DataFrames. Joining is really simple in Spark SQL. Let's clear the screen. And let's take a quick look at how to do that. Let's create a new DataFrame. We'll simply call it joined.
And it's going to be a join of the DataFrame called emps. And I'm going to apply the join operation to it. And I'm going to tell Spark to join emps to the DataFrame called cr for country regions.…
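The join described above might look roughly like the following sketch. The transcript does not show the actual data or the exact column spelling, so the rows below are made up, the shared column is assumed to be named `region_id`, and the second DataFrame is assumed to be `df_cr` (the transcript also refers to it as cr). It uses the `join(right, usingColumn)` form, which performs an inner join on a column present in both DataFrames; the instructor's exact call may differ.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; inside spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .appName("join-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the lesson's emps DataFrame (the real rows are not in the transcript).
val emps = Seq(
  (1, "Alvarez", 10),
  (2, "Baker", 20),
  (3, "Chen", 10)
).toDF("id", "last_name", "region_id")

// Hypothetical stand-in for the country/region DataFrame.
val df_cr = Seq(
  (10, "Canada", "North America"),
  (20, "France", "Europe")
).toDF("region_id", "country", "region")

// Inner join on the shared column. Passing the column name (rather than a
// join expression) keeps a single region_id column in the result.
val joined = emps.join(df_cr, "region_id")

joined.show()
```

Joining by column name is a design choice here: a join written with an explicit condition such as `emps("region_id") === df_cr("region_id")` would keep both `region_id` columns, which usually needs a follow-up `drop` or `select`.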
Dan also focuses on using Scala with Spark, a distributed processing platform. He first describes how to work with Resilient Distributed Datasets (RDDs), a fundamental Spark data structure, and then explains how to use Scala with Spark DataFrames, a new class of data structure designed for analytic processing. He wraps up the course with a summary of the advantages of using Scala for data science.
- The advantages of Scala for data science
- Scala data types
- Scala arrays, vectors, and ranges
- Parallel processing in Scala
- Mapping functions over parallel collections
- When and when not to use parallel collections
- Using SQL in Scala
- Scala and Spark RDDs
- Scala and Spark DataFrames
- Creating DataFrames
Skill Level Intermediate
1. Introduction to Scala
2. Parallel Processing in Scala
3. Using SQL in Scala
4. Scala and Spark RDDs
5. Scala and Spark DataFrames