In this video, learn how to split a full dataset up into training, validation, and test sets.
- [Instructor] Now that we have our clean dataset, this lesson should be quite simple. We're just going to split up our full dataset so that we have 60% of our examples in the training set, 20% in our validation set, and 20% in the test set. So import the packages that we'll need and read in our data. Again, we're using this train_test_split function that we're importing from scikit-learn, and this is going to make our job very easy. I also want to call out that we're reading in this titanic cleaned dataset that we created in the last lesson rather than the original titanic.csv. So go ahead and run that.
Now, as we saw previously, we'll start by splitting our data into our features by just dropping the survived column. That will leave just the fields that are used to make our prediction. And then we'll grab our labels here by just grabbing this survived column. I'm just going to highlight again before we jump in that this is going to be a two-step process. …
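The transcript excerpt ends before the two steps are shown, so here is a minimal sketch of one common way to get a 60/20/20 split with `train_test_split`: first hold out 40% of the rows, then split that held-out portion in half. The column name `Survived`, the other column names, and the synthetic values are assumptions standing in for the cleaned Titanic file from the previous lesson; the `random_state` values are arbitrary and only there to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the cleaned Titanic data (hypothetical columns and values).
# In the lesson this frame would come from pd.read_csv on the cleaned file.
rng = np.random.default_rng(0)
titanic = pd.DataFrame({
    "Survived": rng.integers(0, 2, size=100),
    "Pclass": rng.integers(1, 4, size=100),
    "Age": rng.uniform(1, 80, size=100),
    "Fare": rng.uniform(5, 250, size=100),
})

# Features are every column except the label; labels are the label column.
features = titanic.drop("Survived", axis=1)
labels = titanic["Survived"]

# Step 1: keep 60% for training and hold out the remaining 40%.
X_train, X_rest, y_train, y_rest = train_test_split(
    features, labels, test_size=0.4, random_state=42
)

# Step 2: split the held-out 40% in half, giving 20% validation and 20% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)

# Report each subset's share of the full dataset.
for name, part in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    print(name, round(len(part) / len(labels), 2))
```

Two calls are needed because `train_test_split` produces only two subsets per call. The second call uses `test_size=0.5` because half of the held-out 40% is 20% of the full dataset.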
Released 5/10/2019
- What is machine learning (ML)?
- ML vs. deep learning vs. AI
- Handling common challenges in ML
- Plotting continuous features
- Continuous and categorical data cleaning
- Measuring success
- Overfitting and underfitting
- Tuning hyperparameters
- Evaluating a model
Skill Level Beginner
Video: Split data into train/validation/test set