- [Instructor] Data scientists often run into data sets that contain values that are well outside of the norm. These unusual values can present special challenges for data analysis, and it's important to understand what they mean when they're present in a data set. Outliers are data points that lie far outside the norm, and they may occur for two reasons. First, outliers may indicate some type of error in the data set. Someone may have measured the data incorrectly in the first place, entered it incorrectly into a system, or performed a calculation improperly.
For example, imagine that you're looking at a data set consisting of temperatures from weather stations in New York, and you find that there is a single data point recording a temperature of 212 degrees Fahrenheit. This is clearly some type of error. Perhaps the thermometer failed. Maybe somebody wrote down the temperature incorrectly, or it could be that the thermometer was misplaced inside of an oven. Whatever the cause, this is clearly an invalid data point for outdoor temperatures.
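The temperature example can be illustrated with a simple range check. This is a minimal Python sketch, not the method used in the course (which works in R): the readings other than 212 are made up, and the bounds of -30 and 120 degrees Fahrenheit are an assumed plausibility range chosen for illustration, as is the helper name `split_by_plausibility`.

```python
# Hypothetical readings in degrees Fahrenheit; only the 212 value
# comes from the example in the transcript.
readings = [68.0, 71.5, 212.0, 65.2, 70.1]

# Assumed plausibility bounds for outdoor air temperature in New York
# (illustrative values, not an official standard).
LOW_F, HIGH_F = -30.0, 120.0


def split_by_plausibility(values, low=LOW_F, high=HIGH_F):
    """Return (plausible, suspect) lists based on a simple range check."""
    plausible = [v for v in values if low <= v <= high]
    suspect = [v for v in values if not (low <= v <= high)]
    return plausible, suspect


plausible, suspect = split_by_plausibility(readings)
print("suspect readings:", suspect)
```

A check like this only flags a value for review. Deciding whether the reading came from a failed thermometer, a transcription mistake, or something else still requires looking at how the data was collected.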
Released 8/10/2017
- What's tidy data?
- Using the tidyverse
- Working with tibbles
- Subsetting and filtering tibbles
- Importing data into R
- Making wide datasets long with gather()
- Making long datasets wide with spread()
- Converting data types in R
- Detecting outliers
- Manipulating strings in R with stringr
Skill Level Intermediate
Introduction
- Welcome (59s)
- Using the exercise files (1m 22s)
1. Tidy Data
- What is tidy data? (3m 59s)
- Common data problems (8m 21s)
- Using the tidyverse (5m 7s)
2. Working with Tibbles
- Subsetting tibbles (3m 16s)
- Filtering tibbles (3m 57s)
3. Importing Data into R
- What are CSV files? (3m 7s)
- Importing CSV files into R (7m 40s)
- What are TSV files? (1m 36s)
- Importing TSV files into R (9m 55s)
- Importing Excel files into R (8m 46s)
4. Data Transformation
- Wide vs. long datasets (3m 43s)
- Converting data types in R (8m 40s)
5. Data Cleaning
- Detecting outliers (12m 54s)
6. Data Wrangling Case Study: Coal Consumption
- Reading in the coal dataset (5m 12s)
- Segmenting the coal dataset (8m 28s)
7. Data Wrangling Case Study: Water Quality
- Water quality data types (4m 35s)
- Correcting data entry errors (8m 46s)
8. Data Wrangling Case Study: Social Security Disability Claims
Conclusion
- Next steps (52s)
Video: Detecting outliers