Join Mike Chapple for an in-depth discussion in this video, Suspicious values, part of Cleaning Bad Data in R.
- [Instructor] As you perform data cleaning, there are some special suspicious values that you should watch out for. The presence of these values doesn't necessarily mean that your data is incorrect, but if you see them in many places, you should view them with suspicion. I'm going to review a set of common suspicious values developed by the data science community. Many of these come from the Quartz guide to bad data, which has an excellent exploration of data cleaning issues. The first type of suspicious value stems from the way that computers store data.
You probably know that computers store data in binary form, using a sequence of ones and zeros. Each digit in the binary number is called a bit. When you create a numeric variable, you allocate a defined number of bits to store that value. The number of bits that you allocate limits the largest number that you can store in that variable. For example, imagine that we have a two-bit variable. That allows us to have two digits, either one of which may be one or zero.
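The transcript excerpt stops before the two-bit example is worked through, so here is a minimal sketch of where it leads. It is written in Python rather than R, and the helper names (`max_unsigned`, `max_signed`, `flag_limit_values`) are hypothetical, chosen only for illustration. It assumes the usual unsigned and two's-complement signed integer layouts, and it treats the 8-, 16-, and 32-bit limits as the values worth flagging, which is an assumption about the kind of check you might want rather than the instructor's own method.

```python
def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer with this many bits can hold."""
    return 2 ** bits - 1


def max_signed(bits: int) -> int:
    """Largest value a two's-complement signed integer with this many bits can hold."""
    return 2 ** (bits - 1) - 1


# A two-bit variable has four possible patterns: 00, 01, 10, 11.
two_bit_patterns = [format(n, "02b") for n in range(max_unsigned(2) + 1)]

# Storage limits for common bit widths (assumed set, for illustration only).
SUSPICIOUS_LIMITS = {
    max_unsigned(8),   # 255
    max_unsigned(16),  # 65535
    max_signed(32),    # 2147483647
    max_unsigned(32),  # 4294967295
}


def flag_limit_values(values):
    """Return the values that sit exactly on one of the storage limits."""
    return [v for v in values if v in SUSPICIOUS_LIMITS]


print(two_bit_patterns)
print(flag_limit_values([12, 65535, 700, 4294967295]))
```

A value that lands exactly on one of these limits is not necessarily wrong, but when it shows up repeatedly it may mean that a larger number was cut off at the ceiling of the variable that stored it.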
Author Mike Chapple
Released 8/22/2018
Where possible, instructor Mike Chapple shows how to correct the issues using R, but the same principles can be applied to any statistical programming language.
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill Level Beginner
Introduction
- Data is messy (1m 10s)
- What you need to know (1m 9s)
1. Missing Data
- Types of missing data (3m 38s)
- Missing values (11m 25s)
- Missing rows (5m 58s)
2. Duplicated Data
- Duplicated rows and values (4m 50s)
- Aggregations in the data set (3m 42s)
3. Formatting Data
- Converting dates (5m 54s)
- Unit conversions (3m 50s)
- Numbers stored as text (3m 32s)
- Inconsistent spellings (6m 51s)
4. Outliers
- Screening for outliers (4m 53s)
- Handling outliers (1m 58s)
- Outliers use case (3m 34s)
- Outliers in subgroups (3m 33s)
- Detecting illogical values (3m 14s)
5. Tidy Data
- What is tidy data? (3m 59s)
- Common data problems (7m 57s)
- Wide vs. long data sets (3m 23s)
- Making wide data sets long (4m 37s)
- Making long data sets wide (3m 41s)
6. Red Flags
- Suspicious values (4m 49s)
- Suspicious multiples (2m 25s)
Conclusion
- What's next? (1m 5s)