Learn about parallel operations using mapping functions.
- [Instructor] Let's take a look at how we can apply mapping functions over RDDs, or Resilient Distributed Datasets. So let's start the Spark REPL. I'm running it from the bin directory of the Spark package that I installed, so you just need to navigate to whatever directory you installed Spark in, go to the bin subdirectory, and then start spark-shell. Okay, I am going to work with a list of random numbers, so I'm going to import a helper, import scala.util.Random, and I'm going to create a value called big range, or bigRng for short. I'm going to call scala.util.Random, use its shuffle method, and specify that I want the range 1 to 100000. This gives me the numbers from 1 to 100000 in random order.
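Here is a minimal sketch of the shuffled range in plain Scala, with no Spark involved, assuming Scala 2.12 or 2.13. `Random.shuffle` is the standard-library call the transcript uses; the `ShuffledRange` object and `shuffledRange` method are hypothetical names, and converting the range to a `Vector` before shuffling is an added assumption rather than the instructor's exact command. Because `shuffle` only permutes its input, each number from 1 to n appears exactly once.

```scala
import scala.util.Random

// Hypothetical helper object; in spark-shell the equivalent one-liner is
// val bigRng = scala.util.Random.shuffle(1 to 100000)
object ShuffledRange {
  // Returns the integers 1 to n in a random order.
  // Random.shuffle permutes the input, so no value is repeated or lost.
  def shuffledRange(n: Int): Vector[Int] =
    Random.shuffle((1 to n).toVector)

  def main(args: Array[String]): Unit = {
    val bigRng = shuffledRange(100000)
    println(bigRng.take(10).mkString(", ")) // different on every run
    println(bigRng.size)                    // 100000
  }
}
```

Sorting the result gives back the original range, which is a quick way to confirm it is a permutation rather than a set of freshly drawn random values.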
Great, so now I have a collection. We can see here that it's a collection of random numbers, so I'm just going to hit Ctrl+L to clear the screen. Now what I want to do is turn this into an RDD. So I'll specify val bigPRng, with the P for parallel, and I'll call the parallelize method on the SparkContext,…
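The transcript breaks off mid-command, so the sketch below is not the instructor's exact line. It is a hedged illustration of the same idea, assuming Spark 2.x or later with spark-core on the classpath and a local master. The `ParallelizeSketch` object and `doubledRdd` helper are hypothetical, and `x => x * 2` is an arbitrary example of a mapping function, not necessarily the one used in the video. In spark-shell a SparkContext is already available as `sc`, so only the body of `doubledRdd` would be typed there.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.util.Random

object ParallelizeSketch {
  // Hypothetical helper: distribute a shuffled range as an RDD and map over it.
  def doubledRdd(sc: SparkContext, n: Int): RDD[Int] = {
    val bigRng = Random.shuffle((1 to n).toVector) // local collection on the driver
    val bigPRng: RDD[Int] = sc.parallelize(bigRng)  // split into partitions across workers
    bigPRng.map(x => x * 2)                         // lazy transformation, nothing runs yet
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a standalone local run using all cores.
    val conf = new SparkConf().setAppName("parallelize-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val doubled = doubledRdd(sc, 100000)
      println(doubled.take(5).mkString(", ")) // action: triggers the map
      println(doubled.count())                // 100000
    } finally {
      sc.stop()
    }
  }
}
```

Because `map` is a transformation, Spark only records it; the work is done in parallel across partitions when an action such as `take` or `count` asks for results.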
Dan also focuses on using Scala with Spark, a distributed processing platform. He first describes how to work with Resilient Distributed Datasets (RDDs), a fundamental Spark data structure, and then explains how to use Scala with Spark DataFrames, a newer class of data structure designed specifically for analytic processing. He wraps up the course with a summary of the advantages of using Scala for data science.
- The advantages of Scala for data science
- Scala data types
- Scala arrays, vectors, and ranges
- Parallel processing in Scala
- Mapping functions over parallel collections
- When and when not to use parallel collections
- Using SQL in Scala
- Scala and Spark RDDs
- Scala and Spark DataFrames
- Creating DataFrames
Skill Level Intermediate
1. Introduction to Scala
2. Parallel Processing in Scala
3. Using SQL in Scala
4. Scala and Spark RDDs
5. Scala and Spark DataFrames