From the course: Apache PySpark by Example


Working with RDDs


- [Instructor] This time we don't need the reported crimes dataset, but the police stations dataset. I've provided the commands to get this dataset below. Let's do our normal checks. I've got the Spark software, so I'm good. Now let me set up the environment for this notebook. And I'm going to store the police stations dataset in the CSV file police-stations.csv. Now I'm actually going to be using the SparkContext, sc, to open that CSV file. So let's say sc.textFile, and it's called police-stations.csv. And this is going to be an RDD, so let's call it psrdd, for police stations RDD: psrdd equals sc.textFile. And let's view the first row of our RDD: psrdd.first. And we can see that the first row is all of the column names, so this is the header for our RDD. So let's call it that: ps header equals psrdd.first. Now if you want to grab the rest of the RDD, we can do so by saying psrdd.filter, and we use a lambda function, so lambda line where the line is not equal to the header. So ps…
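The following is a minimal sketch of the steps described above, assuming a notebook environment where a SparkContext can be obtained (here via a SparkSession) and the file sits at police-stations.csv in the working directory; the variable name ps_header is an assumed spelling of the "ps header" mentioned in the narration.

# Sketch of the RDD steps from the transcript (names and file path are assumptions)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WorkingWithRDDs").getOrCreate()
sc = spark.sparkContext  # the SparkContext referred to as sc in the video

# Read the police stations CSV as an RDD of text lines
psrdd = sc.textFile("police-stations.csv")

# The first line contains the column names, i.e. the header
ps_header = psrdd.first()
print(ps_header)

# Keep every line except the header, using a lambda as described
ps_rest = psrdd.filter(lambda line: line != ps_header)
print(ps_rest.first())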
