Learn the fundamentals of the Resilient Distributed Dataset (RDD), a core component of what makes Spark so fast.
- [Instructor] First, we should understand how Spark actually provides you with ways of working with data through its data interfaces. Starting with the RDD, or the Resilient Distributed Dataset, this is the first, and now lowest-level, API for working with data in Spark. RDDs are what make Spark so fast, and they can provide data lineage across processes as they're completed. You can think of an RDD as a container that allows you to work with data objects. These objects can be of varying types and spread across many machines in the cluster. While it's important to know about RDDs, they're really only going to be useful as you get into the more advanced applications of Spark.
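To make the "container with lineage" idea concrete, here is a minimal sketch in plain Python. It is a toy, single-machine model and not Spark's API: the `ToyRDD` class and its `lineage` method are hypothetical names invented for illustration, and nothing here is distributed across a cluster. It only shows how a dataset can hold objects of varying types and remember the chain of steps that produced it.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, source=None, parent=None, op="source", fn=None):
        self.source = source  # raw items, set only on the root dataset
        self.parent = parent  # the dataset this one was derived from
        self.op = op          # label for the step that produced this dataset
        self.fn = fn          # transformation applied to the parent's items

    def map(self, f):
        # Record the step; nothing is computed until collect() is called.
        return ToyRDD(parent=self, op="map",
                      fn=lambda items: [f(x) for x in items])

    def filter(self, pred):
        return ToyRDD(parent=self, op="filter",
                      fn=lambda items: [x for x in items if pred(x)])

    def lineage(self):
        # Walk back to the root, then report the steps in execution order.
        steps, node = [], self
        while node is not None:
            steps.append(node.op)
            node = node.parent
        return list(reversed(steps))

    def collect(self):
        # Recompute the result from the root by replaying each recorded step.
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())


# The container can hold objects of varying types.
items = ToyRDD(source=["spark", 42, "rdd", 3.5, "dataframe"])
lengths = items.filter(lambda x: isinstance(x, str)).map(len)

print(lengths.lineage())  # ['source', 'filter', 'map']
print(lengths.collect())  # [5, 3, 9]
```

Because each derived dataset keeps a reference to its parent and the step that produced it, the result can always be rebuilt from the original source, which is the intuition behind the "resilient" part of the name.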
That brings us to the DataFrame. If you're familiar with Python, this is analogous to a pandas DataFrame, and it's similar to DataFrames in R. If you're a SQL person like me, you can think of DataFrames as tables of data that allow you to query them. These DataFrames are based on RDDs, except they only contain rows, whereas RDDs can contain different types of objects.…
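The table analogy can be sketched with Python's built-in `sqlite3` module as a stand-in. This is not Spark: the `people` table, its columns, and the sample values are hypothetical, and the example only contrasts a free-form container of mixed objects with a table whose elements are all rows sharing the same named columns.

```python
import sqlite3

# RDD-like container: elements may be of any type (toy stand-in, not Spark).
mixed_objects = ["spark", 42, ("rdd", 1), {"k": "v"}]

# DataFrame-like table: every element is a row with the same named columns.
# The table name, column names, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 34), ("Bo", 19), ("Cy", 52)],
)

# Because the rows share a schema, the data can be queried by column name.
over_30 = conn.execute(
    "SELECT name FROM people WHERE age > 30 ORDER BY name"
).fetchall()
print(over_30)  # [('Ana',), ('Cy',)]
conn.close()
```

The mixed list has no column names to query against, while the table does; that shared row structure is what makes SQL-style queries possible.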
- Understanding Spark
- Reviewing Spark components
- Where Spark shines
- Understanding data interfaces
- Working with text files
- Loading CSV data into DataFrames
- Using Spark SQL to analyze data
- Running machine learning algorithms using MLlib
- Querying streaming data
- Connecting BI tools to Spark
Skill Level: Intermediate
1. Introducing Apache Spark
2. Analyzing Data in Spark
3. Using Spark SQL to Analyze Data
4. Running Machine Learning Algorithms Using MLlib
5. Real-Time Data Analysis with Spark Streaming
6. Connecting BI Tools to Spark