Working with real data is a valuable way to learn how to handle sizable datasets that may have formatting issues and other common problems. In this video, learn how to add a CSV dataset into PySpark.
- [Instructor] We're going to be using the data from the City of Chicago's data portal. So select export, right-click on CSV, and select copy link address. That's the link address we need to download the data. So let's head over to our notebook to download the data. We'll need to upload the notebook to Google Colab, so select upload, select choose file, and select the download data notebook.
Now that we've uploaded it to our virtual machine, the first thing we want to do is to list our directory. We do this to make sure that we have the Spark files on our virtual machine. So we can see here that you've got the Spark files on your machine, so we don't need to download them again. So let's set up our environment. I do that by running these cells here under set up environment. I can see my Spark session. So now we've got to download the data that we need.
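The directory check described above can also be done from Python instead of a shell command. This is a minimal sketch: the helper name `find_spark_files` is hypothetical, and it assumes the Spark archive or extracted folder has a name beginning with `spark`, which the course notebook may or may not follow.

```python
import os


def find_spark_files(directory="."):
    """Return the sorted names in `directory` that start with 'spark' (case-insensitive)."""
    # Assumption: the Spark download or extracted folder has a name beginning with "spark".
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().startswith("spark")
    )


# Check the current working directory of the notebook.
found = find_spark_files(".")
if found:
    print("Spark files already present:", found)
else:
    print("No Spark files found; they would need to be downloaded first.")
```

If the list comes back empty, the Spark files would have to be downloaded again before setting up the environment.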
So let's use wget and let's paste the address that we copied. Now because the file we're going to be using is more than one and a half gigs, I like to see the download progress.
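The video uses wget for this step. As an alternative for readers who would rather stay in Python, the sketch below downloads a file with the standard library's `urllib.request.urlretrieve` and prints progress as blocks arrive. The function names `format_progress` and `download_with_progress` and the destination file name are hypothetical, and the URL is left as a placeholder because the transcript does not give the link address.

```python
import sys
import urllib.request


def format_progress(block_count, block_size, total_size):
    """Build a one-line progress message for a download in flight."""
    downloaded = block_count * block_size
    if total_size > 0:
        # The last block is usually partial, so cap the figure at 100%.
        percent = min(100.0, downloaded * 100.0 / total_size)
        return f"{percent:5.1f}% ({downloaded / 1e6:.1f} MB)"
    # The server did not report a size, so only the amount read is known.
    return f"{downloaded / 1e6:.1f} MB"


def download_with_progress(url, destination):
    """Download `url` to `destination`, rewriting one progress line as data arrives."""
    def report(block_count, block_size, total_size):
        sys.stdout.write("\r" + format_progress(block_count, block_size, total_size))
        sys.stdout.flush()

    urllib.request.urlretrieve(url, destination, reporthook=report)
    sys.stdout.write("\n")
    return destination


# Usage (placeholder URL: paste the CSV link address copied from the data portal):
# download_with_progress("<copied CSV link address>", "data.csv")
```

For a file of more than one and a half gigabytes, a running percentage makes it easy to tell a slow download from a stalled one.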
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level: Intermediate
1. Introduction to Apache Spark
2. Technical Setup
3. Working with the DataFrame API
5. Resilient Distributed Datasets (RDDs)