Where possible, instructor Mike Chapple shows how to correct the issues using R, but the same principles can be applied to any statistical programming language.
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill Level: Beginner
- [Mike] As any data scientist will tell you, the vast majority of the work involved in data analysis lies in getting the data into the right form. That's a field called data wrangling, and it's what we'll cover in this course. A recent New York Times article cited the consensus that data scientists, according to interviews and expert estimates, spend from 50% to 80% of their time mired in this more mundane labor of collecting and preparing unruly digital data.
And that's before it can be explored for useful nuggets. In this course, I'll explain how you can use R to perform data wrangling using concepts drawn from the field of tidy data. We'll use a set of tools known as the tidyverse, which allows you to import data from a wide variety of sources, transform it to a standardized format, and clean it before performing your analysis. Hi, I'm Mike Chapple and I'd like to welcome you to this course on Cleaning Bad Data in R.
Get ready, this is going to be a fun journey that will help you sharpen your data wrangling skills. All right, let's get rolling!
1. Missing Data
2. Duplicated Data
3. Formatting Data
5. Tidy Data
6. Red Flags
What's next?