Join Mike Chapple for an in-depth discussion in this video, Suspicious multiples, part of Cleaning Bad Data in R.
- [Instructor] Another common source of suspicion in datasets is when you see unusual recurring multiples. The most common example of this is when all of the values in a dataset end in several zeros. This may be the result of rounding, or it may come from extrapolation. For example, earlier in this course we used this dataset containing the number of acres of public land in each state. Did you notice anything suspicious about this dataset when we first looked at it? Well, all of the values here end in three zeros, and there's a good reason for that.
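One quick way to spot this red flag is to measure how many values are exact multiples of 10, 100, 1,000, and so on. The sketch below is a minimal illustration, not the course's own code: `share_multiple_of()` is a hypothetical helper, and the numbers are made up for the example.

```r
# Hypothetical helper (not from the course): the share of non-missing values
# in `x` that are exact multiples of `base`.
share_multiple_of <- function(x, base) {
  x <- x[!is.na(x)]
  mean(x %% base == 0)
}

# Illustrative values only; these are not the course's dataset.
acres <- c(665000, 1234000, 27000, 451387)

# Check several bases at once. A high share for 10, 100, and 1000 that drops
# at 10000 suggests the values tend to end in exactly three zeros.
bases <- c(10, 100, 1000, 10000)
by_base <- vapply(bases, function(b) share_multiple_of(acres, b), numeric(1))
by_base
# [1] 0.75 0.75 0.75 0.00
```

If the trailing digits of real measurements were roughly uniform, only about 1 value in 1,000 would be a multiple of 1,000, so a large share at that base is worth investigating.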
I built this file using a government source document. Let's take a look at that document. The data in our file comes from the first and third columns of this page. Take a look at the third column. It contains the total area of National Forest System land, but it's showing the data in thousands of acres. So Alabama is listed as 665, which represents 665,000 acres. Back here in the dataset, we have the round number 665,000. That's not the exact value of acres of public land…
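The unit conversion described above can be sketched in R. This is a minimal illustration under stated assumptions: the column names are hypothetical, only Alabama's figure of 665 comes from the source, and the ±500-acre range assumes the source rounded each value to the nearest thousand acres, which the transcript does not confirm.

```r
# Figure as printed in the source's third column, in thousands of acres.
# Column names here are hypothetical.
public_land <- data.frame(
  state = "Alabama",
  area_thousands = 665
)

# Converting to acres produces the "round" number seen in the dataset.
public_land$acres <- public_land$area_thousands * 1000

# Assumption: the source rounded to the nearest thousand acres, so the true
# value lies roughly within 500 acres of the converted figure.
public_land$acres_low  <- public_land$acres - 500
public_land$acres_high <- public_land$acres + 500

public_land
```

Keeping the low and high bounds next to the converted value is one way to record that `acres` is an approximation rather than an exact count.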
Where possible, instructor Mike Chapple shows how to correct the issues using R, but the same principles can be applied to any statistical programming language. Topics include:
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill level: Beginner