Outliers use case: an in-depth discussion with Mike Chapple, part of the course Cleaning Bad Data in R.
- [Instructor] Let's work together on a case study of identifying and handling outliers in a dataset. The dataset that we'll be using for this exercise contains the salaries of White House officials from 2011 through 2016. Here, you can see a small excerpt of that file. The full dataset contains over 2,700 rows. In the outliers_start_file, you have the code necessary to load this dataset. Let's go ahead and execute this code to load the tidyverse, set our working directory, and read in the data file.
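The loading step can be sketched roughly as follows. This is a base-R stand-in, not the code from the course's outliers_start_file (which uses the tidyverse). The column names and rows are made up, and a tiny CSV is written to a temporary file so the sketch runs without the real dataset.

```r
# Hypothetical stand-in for the course data: a tiny made-up CSV written to a
# temporary file so this example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Year,Salary",
  "Official A,2011,42000",
  "Official B,2012,98000",
  "Official C,2013,172200"
), csv_path)

# With the real exercise you would setwd() to your exercise folder and pass the
# actual file name here instead of csv_path.
whitehouse <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect column types; Salary should be numeric for the boxplot step.
str(whitehouse)
```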
We now have a tibble containing the salary information, and I'd like to focus my effort on the Salary variable in particular. I'm going to start by generating a boxplot of that variable, using the boxplot function on the whitehouse dataset's Salary variable. When I run this, I see that some of this data really seems incorrect. There are some outliers that appear to have salary values of over one million dollars. Now, no government officials make that kind of money.
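A minimal sketch of the boxplot step, assuming a numeric Salary column. The salaries below are invented for illustration and are not taken from the course dataset. Besides drawing the plot, `boxplot.stats()` returns in `$out` the values drawn beyond the whiskers, which makes it easy to list the points the chart flags.

```r
# Made-up salaries (not from the course dataset); the last value is an
# implausible entry of the kind the boxplot makes obvious.
whitehouse <- data.frame(Salary = c(42000, 56000, 65000, 78000, 98000,
                                    120000, 172200, 1500000))

# Draw the boxplot of the Salary variable.
boxplot(whitehouse$Salary, main = "Salary", ylab = "USD")

# boxplot.stats() uses coef = 1.5 by default: values more than 1.5 times the
# hinge spread beyond the hinges are returned in $out.
flagged <- boxplot.stats(whitehouse$Salary)$out
flagged
```

With these invented values, only the 1,500,000 entry falls outside the whiskers; 172,200 is high but still within the upper fence.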
So let's dig into these data points in greater detail.
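One way to pull out the suspicious rows for closer inspection is to filter on the threshold mentioned above and sort the result. The names and salaries here are hypothetical, and the one-million-dollar cutoff simply mirrors the observation in the video rather than a rule from the course.

```r
# Hypothetical rows for illustration only.
whitehouse <- data.frame(
  Name   = c("Official A", "Official B", "Official C", "Official D"),
  Salary = c(42000, 98000, 1500000, 2400000),
  stringsAsFactors = FALSE
)

# Keep only rows above the 1,000,000 threshold noted in the boxplot.
suspicious <- whitehouse[whitehouse$Salary > 1e6, ]

# Sort highest first so the most extreme values are reviewed first.
suspicious <- suspicious[order(-suspicious$Salary), ]
suspicious
```

In tidyverse style, the same selection could be written as `whitehouse %>% filter(Salary > 1000000) %>% arrange(desc(Salary))`.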
Where possible, instructor Mike Chapple shows how to correct data-quality issues using R, but the same principles can be applied to any statistical programming language. Topics include:
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill level: Beginner