- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level: Intermediate
- [Jonathan] Over the last couple of years, Apache Spark has evolved into the big data platform of choice. It's used everywhere from startups all the way up to household names such as Amazon, eBay, and TripAdvisor. There are a few really good reasons why it's become so popular: it's simple, it's fast, and it supports a range of programming languages. If you know Python, then PySpark allows you to access the power of Apache Spark. Don't worry if you're a beginner. In my course on PySpark, we'll be using real data from the city of Chicago as our primary data set.
We'll learn the basics of pulling in data, transforming it, and joining it with other data. My aim is that by the end of this course you'll be comfortable using PySpark and ready to explore other areas of this technology. Hi, I'm Jonathan Fernandes, and I work in big data and AI for a consultancy. I use the concepts taught in this course on a daily basis for several customers, and I've created this course to get you learning and using Apache Spark as quickly as possible.
1. Introduction to Apache Spark
2. Technical Setup
3. Working with the DataFrame API
5. Resilient Distributed Datasets (RDDs)