Join Jonathan Fernandes for an in-depth discussion in this video, Solution, part of Apache PySpark by Example.
- [Instructor] So you can see that I've read the CSV file into my DataFrame, rc. So let's look at the questions. What percentage of reported crimes resulted in an arrest? Now, if you look at the arrest column, we can see that the first five rows contain false and true. So I want to confirm that I'm capturing all of the rows where the arrest is true, and I want to ensure that the only way a true arrest is represented is with a lowercase true. The way for me to do that is rc.select on the arrest column, and then I use the distinct function.
So that's great, because it means that all of the arrests are either true or false, and there's no other category for arrest. Now I want to confirm the actual data type for arrest, so I type rc.printSchema(). And I can see that arrest is a string type. So to run that filter condition, it's rc.filter, and I'm looking for the column arrest.
And because it's a string, it's a double equals sign and then true in single quotes, and I want to do a count of that. Now, because I'm looking for a percentage…
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level: Intermediate