In this video, learn how to clean categorical features in whatever way is necessary.
- [Instructor] Let's pick up right where we left off in the last chapter by reading in our dataset that has all the continuous features cleaned, and then we'll dive into cleaning up the categorical features in this lesson. So you can go ahead and run that first cell, and let's start by creating an indicator variable for the cabin feature. Now back in the exploring categorical features lesson, we learned that unlike age, cabin is not missing at random. We learned that two thirds of the people aboard who had cabins survived, while only 30% of those aboard who did not have cabins survived.
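In code, that first cell might look something like this minimal sketch. The imports and the file name are assumptions; the course exercise files may use a different name or path for the cleaned dataset.

```python
import numpy as np
import pandas as pd

# Read in the dataset with the continuous features already cleaned
# (the file name is an assumption; adjust it to match your exercise files).
titanic = pd.read_csv('titanic_cleaned.csv')
titanic.head()
```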
So we're going to create this really simple cabin indicator using the where method from NumPy, which acts as an if statement. The condition we want to use is whether cabin is null, so I'll call titanic bracket cabin, and then again we'll use this isnull method, which just returns True or False based on whether cabin is missing or not. And so we'll say if cabin is missing, we want it to return a zero, and if it's not missing, we'll return a one.
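As a sketch of that step: np.where takes the condition, the value to return where the condition is True, and the value to return where it is False. The column name 'Cabin' and the new column name 'cabin_ind' are assumptions based on the standard Titanic dataset.

```python
# 0 where the Cabin value is missing, 1 where the passenger had a cabin recorded.
titanic['cabin_ind'] = np.where(titanic['Cabin'].isnull(), 0, 1)
titanic[['Cabin', 'cabin_ind']].head()
```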