The DataFrame API is a common way of working with data in Spark. In this video, learn how it works.
- [Instructor] There are two main APIs that we'll be looking at in this course: DataFrames and resilient distributed datasets, or RDDs. DataFrames are the high-level API and RDDs are the low-level API. DataFrames are easy to get started with and cover a good chunk of what you'll need to know on the job. Once you're comfortable with DataFrames, we'll look at RDDs. When Spark was first open-sourced, it enabled distributed data processing using RDDs, which provided a simple API for that kind of work.
So big data engineers who were familiar with MapReduce jobs could now leverage the power of distributed processing using general-purpose programming languages such as Java, Python, and Scala. The challenge was that if Apache Spark wanted to attract a wider audience onto the platform, including data analysts and data scientists, it would have to offer something they were already familiar with. What better fit than a DataFrame? If there's one thing that data scientists…
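To make the difference in abstraction level concrete, here is a minimal PySpark sketch (not taken from the course; the app name and sample data are made up for illustration). It builds a small DataFrame with the high-level API, then touches the same data through its underlying RDD:

```python
from pyspark.sql import SparkSession

# Start a local Spark session (assumed local setup, illustrative app name).
spark = SparkSession.builder.appName("dataframe-vs-rdd").getOrCreate()

# High-level DataFrame API: named columns and SQL-like operations.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)
df.filter(df.age > 30).select("name").show()

# Low-level RDD API: the same data exposed as rows you transform by hand.
rdd = df.rdd
older = rdd.filter(lambda row: row.age > 30).map(lambda row: row.name)
print(older.collect())

spark.stop()
```

Both snippets answer the same question, but the DataFrame version reads like a query over named columns, while the RDD version spells out each transformation on raw rows.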
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level: Intermediate
1. Introduction to Apache Spark
2. Technical Setup
3. Working with the DataFrame API
5. Resilient Distributed Datasets (RDDs)