Where possible, instructor Mike Chapple shows how to correct the issues using R, but the same principles can be applied to any statistical programming language.
- Missing data
- Duplicate rows and values
- Converting data
- Formatting data
- Working with tidy data
- Tidying data sets
- Dealing with suspicious data
Skill Level: Beginner
- [Mike] As any data scientist will tell you, the vast majority of the work involved in data analysis lies in getting the data into the right form. That's a field called data wrangling, and it's what we'll cover in this course. A recent New York Times article reported that data scientists, according to interviews and expert estimates, spend from 50% to 80% of their time mired in the more mundane labor of collecting and preparing unruly digital data. And that's before it can be explored for useful nuggets.
In this course, I'll explain how you can use R to perform data wrangling using concepts drawn from the field of tidy data. We'll use a set of tools known as the tidyverse that allow you to import data from a wide variety of sources, transform it to a standardized format, and clean it before performing your analysis. Hi, I'm Mike Chapple, and I'd like to welcome you to this course on Cleaning Bad Data in R.
Get ready, this is going to be a fun journey that will help you sharpen your data wrangling skills. All right, let's get rolling!