Join Mike Chapple for an in-depth discussion in this video, Screening for outliers, part of Cleaning Bad Data in R.
- [Instructor] Data scientists often run into datasets that contain values that are well outside of the norm. These outliers can present special challenges for data analysis, and it's important to understand what they mean when they're present in your datasets. Outliers are data points that lie far outside the norm, and they may occur in two cases. First, outliers may indicate some type of error in the dataset. Someone may have measured the data incorrectly in the first place, incorrectly input it into a system, or performed a calculation improperly.
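As a rough illustration of what "far outside the norm" can mean in practice, here is a minimal sketch in base R. It uses Tukey's 1.5 × IQR rule, which is one common convention and not necessarily the method used in the course; the `values` vector is made up for the example.

```r
# Hypothetical measurements; the last value is deliberately extreme.
values <- c(10, 12, 11, 13, 12, 11, 14, 12, 95)

# Tukey's rule (an assumption here, not the course's stated method):
# flag points more than 1.5 * IQR below the first quartile
# or above the third quartile.
q <- quantile(values, probs = c(0.25, 0.75), names = FALSE)
fence <- 1.5 * (q[2] - q[1])
is_outlier <- values < q[1] - fence | values > q[2] + fence

# Show the flagged values so they can be inspected before any decision.
print(values[is_outlier])
```

A flag from a rule like this only marks a value for review. Whether the point is an error still has to be judged from what the data is supposed to represent.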
For example, imagine that you're looking at a dataset consisting of temperatures from weather stations in New York and find that there is a single data point recording a temperature of 212 degrees Fahrenheit. This is clearly some type of error. Perhaps the thermometer failed, maybe somebody wrote down the temperature incorrectly, or it could be that the thermometer was misplaced inside of an oven. Whatever the cause, this is clearly an invalid data point for outdoor temperatures.
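The weather-station example can be sketched as a simple plausibility check in base R. Everything in the block is hypothetical: the station labels, the readings, and the bounds `min_f` and `max_f`, which are only a guess at a generous range for outdoor temperatures. Replacing the suspect reading with `NA` is one possible treatment, not the only one.

```r
# Hypothetical readings in degrees Fahrenheit; station labels are made up.
temps <- data.frame(
  station = c("A", "B", "C", "D", "E"),
  temp_f  = c(68, 71, 212, 65, 70)
)

# Assumed plausibility bounds for outdoor temperatures (illustrative only).
min_f <- -60
max_f <- 130

# Flag readings that fall outside the assumed range.
suspect <- temps$temp_f < min_f | temps$temp_f > max_f
print(temps[suspect, ])

# One option: keep the row but mark the invalid reading as missing.
temps$temp_f[suspect] <- NA
print(summary(temps$temp_f))
```

Marking the value as `NA` instead of deleting the row keeps a record that the station reported something, which may help when tracing the cause of the error later.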
Where possible, instructor Mike Chapple shows how to correct the issues using R, but the same principles can be applied to any statistical programming language.
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill Level Beginner
1. Missing Data
2. Duplicated Data
3. Formatting Data
5. Tidy Data
6. Red Flags
What's next?