In this video, learn how to clean up categorical features by filling missing values, creating new features, encoding, and more.
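The summary names three operations: filling missing values, creating new features, and encoding. Below is a minimal pandas sketch of one way to do all three. It assumes Titanic-style column names (`Sex`, `Cabin`, `Embarked`) and uses hypothetical toy rows. The specific choices are assumptions for illustration, not necessarily what the lesson does: a 0/1 missingness indicator for `Cabin` (the hypothetical name `Cabin_ind`), an arbitrary placeholder label `"O"` for missing `Embarked`, and a 0/1 mapping for `Sex`.

```python
import pandas as pd

# Hypothetical toy rows with Titanic-style column names (an assumption).
titanic = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Cabin": [None, "C85", None, "E46"],
    "Embarked": ["S", "C", None, "S"],
})

# New feature (hypothetical name): 1 if a cabin is recorded, 0 if it is missing.
titanic["Cabin_ind"] = titanic["Cabin"].notna().astype(int)

# Fill missing values with an explicit placeholder category ("O" is an arbitrary label).
titanic["Embarked"] = titanic["Embarked"].fillna("O")

# Encode a two-valued categorical feature as integers.
titanic["Sex"] = titanic["Sex"].map({"male": 0, "female": 1})

# Once the indicator exists, the raw Cabin column is dropped.
titanic = titanic.drop(columns=["Cabin"])

print(titanic)
```

Treating missingness as its own signal (the indicator) rather than imputing a cabin value keeps the information that a value was absent, which is useful when missingness itself may relate to the target.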
- [Instructor] In this final lesson of the EDA and data cleaning chapter, we'll take what we learned through our EDA on categorical variables and apply it to clean up our dataset. Part of this will repeat what we did in the last lesson, but we'll leave out all of the exploratory work so you can see a clear, direct path for how this data should be cleaned. So start by importing the packages we need and reading in our data. Then we're going to drop the Name and Ticket features. We already talked through why we're dropping Name. We're dropping Ticket as well, since it's essentially a randomly assigned number, and we already capture ticket class with the Pclass feature. To drop those two features, we'll call the .drop method, pass in a list of the two feature names, tell it we want to drop columns, and tell it to alter titanic in place. Go ahead and run that. In the last section, we learned that missing values for Cabin weren't missing …
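A minimal pandas sketch of the drop step described above. It assumes the column names `Name` and `Ticket` from the common Titanic CSV. The lesson reads its data from a file whose path isn't shown here, so a few hypothetical toy rows stand in for it. In pandas, `axis=1` is how you tell `.drop` to remove columns rather than rows, and `inplace=True` modifies `titanic` instead of returning a new DataFrame.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic data the lesson reads from disk.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Ticket": ["T-001", "T-002", "T-003"],
    "Cabin": [None, "C85", None],
})

# Drop Name and Ticket: axis=1 targets columns, inplace=True alters titanic itself.
titanic.drop(["Name", "Ticket"], axis=1, inplace=True)

print(titanic.columns.tolist())
```

An equivalent, more explicit form is `titanic.drop(columns=["Name", "Ticket"], inplace=True)`, which avoids having to remember which axis number means columns.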
Released: 5/10/2019
Skill level: Beginner

Topics covered:
- What is machine learning (ML)?
- ML vs. deep learning vs. AI
- Handling common challenges in ML
- Plotting continuous features
- Continuous and categorical data cleaning
- Measuring success
- Overfitting and underfitting
- Tuning hyperparameters
- Evaluating a model
Video: Categorical data cleaning