From the course: Cleaning Bad Data in R

Outliers use case

- [Instructor] Let's work together on a case study of identifying and handling outliers in a dataset. The dataset that we'll be using for this exercise contains the salaries of White House officials from 2011 through 2016. Here, you can see a small excerpt of that file. The full dataset contains over 2,700 rows.

In the outliers_start_file, you have the code necessary to load this dataset. Let's go ahead and execute this code to load the tidyverse, set our working directory, and read in the data file. We now have a tibble containing the salary information, and I'd like to focus my effort particularly on the salary data.

I'm going to start by generating a boxplot of that variable. I'm going to use the boxplot function on the whitehouse dataset's Salary variable. When I run this, I see that some of this data really seems incorrect. There are some outliers that appear to have salary values of over one million dollars. Now, there aren't any government officials making that kind of money. So…
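The boxplot step described above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the small `whitehouse` data frame here is a made-up stand-in (the real file has over 2,700 rows and is loaded by the start file), the `Name` column is hypothetical, and the use of `boxplot.stats()` to list the flagged values is an added assumption rather than something the instructor shows.

```r
# Hypothetical stand-in for the course's whitehouse tibble. These values are
# invented for illustration and are NOT taken from the real salary file.
whitehouse <- data.frame(
  Name   = c("A", "B", "C", "D", "E", "F"),   # hypothetical column
  Salary = c(42000, 65000, 80000, 120000, 172200, 1200000)
)

# Draw the boxplot of the Salary variable, as the instructor describes.
boxplot(whitehouse$Salary)

# boxplot.stats() returns the values that fall beyond the whiskers
# (by default, more than 1.5 times the hinge spread from the box).
flagged <- boxplot.stats(whitehouse$Salary)$out

# Pull the rows behind those points so they can be inspected by hand.
suspicious <- whitehouse[whitehouse$Salary %in% flagged, ]
print(suspicious)
```

Listing the rows behind the flagged points, rather than only looking at the plot, makes it easier to judge whether a value such as a salary above one million dollars is a data entry error or a genuine observation.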
