Learn basic aggregation and filtering techniques available in DataFrames.
- [Instructor] One of the nice things about working with data frames is that we can use SQL to perform grouping and filtering operations. I'm starting in the Spark session where I left off in our last video. So I'm just going to clear the screen, and I'm going to call df_emps.show to display a data frame that I created in a previous video. So this is a table of about 1,000 employees. Now I'd like to be able to use SQL with this, so I'm going to perform an operation that allows me to essentially create a temporary view.
And to do that I'm going to specify df_emps, which is the name of the data frame, then createOrReplaceTempView, and I'm going to call that temporary view employees. So now what I've done is essentially told Spark to create a data structure, call it employees, and allow me to use SQL on it. So let's clear the screen. Now let's define a new data frame with a select statement.
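As a rough sketch of the step just described, the snippet below registers a data frame as a temporary view named employees. The real df_emps holds about 1,000 employees and its columns are not shown in this transcript, so the three rows and the column names (id, last_name, department, salary) are invented for illustration. The local SparkSession is also an assumption; inside spark-shell a session named spark already exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
// In spark-shell, `spark` is already defined and this line is unnecessary.
val spark = SparkSession.builder()
  .appName("temp-view-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df_emps; the rows and column names are invented.
val df_emps = Seq(
  (1, "Alvarez", "Engineering", 95000),
  (2, "Baker", "Sales", 60000),
  (3, "Chen", "Engineering", 105000)
).toDF("id", "last_name", "department", "salary")

// Register the data frame under the name "employees" so SQL can refer to it.
// The view lives only as long as this Spark session.
df_emps.createOrReplaceTempView("employees")

// Reading the view back by name returns the same rows as the data frame.
val fromView = spark.table("employees")
```

Because the method is createOrReplaceTempView, running it again with the same name replaces the earlier view rather than failing.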
So I'll create a val, let's call it sqldf_emps. And I'm going to simply call spark.sql, and now I'm going to pass in a select statement.
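The transcript stops before the select statement itself is shown, so the queries below are only a guess at the kind of statement that could be passed to spark.sql. The sample rows, the column names, and the extra vals sqldf_eng and sqldf_by_dept are hypothetical. Only df_emps, the employees view, sqldf_emps, and spark.sql come from the video.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .appName("spark-sql-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df_emps; the rows and column names are invented.
val df_emps = Seq(
  (1, "Alvarez", "Engineering", 95000),
  (2, "Baker", "Sales", 60000),
  (3, "Chen", "Engineering", 105000)
).toDF("id", "last_name", "department", "salary")
df_emps.createOrReplaceTempView("employees")

// spark.sql returns a new data frame holding the query result.
val sqldf_emps = spark.sql("SELECT * FROM employees")

// Filtering: keep only the rows that match a WHERE clause.
val sqldf_eng = spark.sql(
  "SELECT * FROM employees WHERE department = 'Engineering'")

// Grouping: count the employees in each department.
val sqldf_by_dept = spark.sql(
  """SELECT department, COUNT(*) AS num_emps
    |FROM employees
    |GROUP BY department""".stripMargin)

sqldf_by_dept.show()
```

Each result is an ordinary data frame, so it can be shown, filtered further, or registered as another temporary view.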
Dan also focuses on using Scala with Spark, a distributed processing platform. He first describes how to work with Resilient Distributed Datasets (RDDs)—a fundamental Spark data structure—and then explains how to use Scala with Spark DataFrames, a new class of data structure specially designed for analytic processing. He wraps up the course by providing a summary of advantages of using Scala for data science.
- The advantages of Scala for data science
- Scala data types
- Scala arrays, vectors, and ranges
- Parallel processing in Scala
- Mapping functions over parallel collections
- When and when not to use parallel collections
- Using SQL in Scala
- Scala and Spark RDDs
- Scala and Spark DataFrames
- Creating DataFrames
Skill Level Intermediate
1. Introduction to Scala
2. Parallel Processing in Scala
3. Using SQL in Scala
4. Scala and Spark RDDs
5. Scala and Spark DataFrames