From the course: Apache PySpark by Example


Download a dataset

- [Instructor] We're going to be using data from the City of Chicago's data portal. Select Export, right-click CSV, and select Copy link address. That's the link we need to download the data. Now let's head over to our notebook. We'll need to upload the notebook to Google Colab, so select Upload, select Choose file, and pick the download data notebook. Now that we've uploaded it to our virtual machine, the first thing we want to do is list our directory. We do this to make sure that we have the Spark files on our virtual machine. We can see here that we've got the Spark files on our machine, so we don't need to download them again. So let's set up our environment. I do that by running the cells under Set up environment, and I can see my Spark session. Now we need to download the data. Let's use wget and paste the address that we copied. Now, because the file we're going to be using is more than one and a half gigs,…
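
The two steps described above, listing the directory and downloading a large file from a copied link, can be sketched in plain Python. This is a minimal illustration and not the notebook's actual cells: the instructor uses wget, while this sketch swaps in the standard library's `urllib` and streams the response in chunks so a file of more than a gigabyte is never held in memory at once. The helper names `list_directory` and `download_file` are hypothetical.

```python
import os
import shutil
import urllib.request


def list_directory(path="."):
    """Return the sorted entry names in `path`, similar to running `ls`."""
    return sorted(os.listdir(path))


def download_file(url, dest_path, chunk_size=1024 * 1024):
    """Stream `url` to `dest_path` in chunks and return the size written in bytes.

    Copying chunk by chunk keeps memory use flat, which matters for a
    dataset larger than a gigabyte.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return os.path.getsize(dest_path)
```

In a notebook cell this might be called as `download_file(DATA_URL, "chicago_data.csv")`, where `DATA_URL` stands for the link address copied from the portal and `chicago_data.csv` is a placeholder filename, followed by `list_directory()` to confirm the file arrived.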