From the course: Apache PySpark by Example


Solution

Solution - Spark DataFrames Tutorial


- [Instructor] So you can see that I've read the CSV file into my data frame, rc. So let's look at the questions. What percentage of reported crimes resulted in an arrest?

Now, if you look at the arrest column, we can see that the first five rows contain false and true. So I want to confirm that I'm capturing all of the rows where arrest is true, and I want to ensure that lowercase is the only way true is represented. The way for me to do that is rc.select, and I use the distinct function. So that's great, because it means every arrest value is either true or false, and there's no other category for arrest.

Now I want to confirm the actual data type for arrest, so I type rc.printSchema. And I can see that arrest is a string type. So to run that filter condition, it's rc.filter. I'm looking for the column arrest. And because it's a string, it's double equals and then true in single quotes, and I want to do a count…
