From the course: Apache PySpark by Example

Solution

Solution - Spark DataFrames Tutorial

- [Instructor] Now, just to speed things up a little, I'm going to put the DataFrame rc into cache. And since the cache function is lazily evaluated, I'm going to run an action, such as count, to get that DataFrame into cache as soon as possible. So, rc.cache() and then rc.count(). Now, remember that running count has nothing to do with answering this question; it's only there to trigger the cache. So let's take a look at our DataFrame: rc.show(5), for the first five rows. Now, I have no idea what format the non-criminal flag takes, but the best place to look is probably the Primary Type column. The first five rows don't provide any clues, so let's get all of the unique values for that column. What I'm going to do first is get a count of the unique values, and then I'll know how many rows I should be showing. So, rc.select(col('Primary Type')).distinct(), and then do a count. So I know that there are going to be 35 unique values. Now, I also don't know whether non-criminal is…
