From the course: Mistakes to Avoid in Machine Learning
Not standardizing your data
- [Instructor] During the data preprocessing stage in machine learning, an important consideration is scaling your data. Neglecting to do this can have unforeseen consequences in the feature selection and modeling phases of the machine learning pipeline. Why do we need to scale features? Well, many machine learning techniques will effectively give more weight to features with larger magnitudes, simply because of the units they are measured in. One example of this is KNN, or k-nearest neighbors, which relies on Euclidean distance in its computations. Euclidean distance can be thought of as the straight-line distance between two points, so a feature measured in the thousands will dominate one measured in single digits. It's worth noting that tree-based algorithms generally do not require scaling. So let's see an example. First, we'll import pandas as pd. Now we're going to import a sample dataset from scikit-learn's datasets module. Specifically, we're going to load the breast cancer dataset. This returns an object which we'll want to convert into a pandas DataFrame…
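To make the point about Euclidean distance concrete, here is a minimal sketch that uses only the Python standard library. It is not the instructor's example: the (age, income) samples and the helper names `euclidean`, `standardize`, and `nearest_to_first` are made up for illustration. It standardizes each feature to zero mean and unit variance (a z-score) and checks which sample is nearest to the first one before and after scaling.

```python
import math
import statistics


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def nearest_to_first(rows):
    """Index of the row closest to rows[0], excluding rows[0] itself."""
    query = rows[0]
    return min(range(1, len(rows)), key=lambda i: euclidean(query, rows[i]))


# Hypothetical samples: (age in years, income in dollars).
samples = [
    (25, 50_000),  # query point
    (60, 51_000),  # very different age, similar income
    (27, 54_000),  # similar age, somewhat different income
    (45, 90_000),
    (35, 30_000),
]

raw_nearest = nearest_to_first(samples)
scaled_nearest = nearest_to_first(standardize(samples))

print("nearest on raw data:   ", samples[raw_nearest])
print("nearest on scaled data:", samples[scaled_nearest])
```

On the raw values, income differences swamp age differences, so the 60-year-old looks like the closest match to the 25-year-old. After standardization both features contribute on a comparable scale and the 27-year-old becomes the nearest neighbor. In a scikit-learn workflow, `StandardScaler` applies the same kind of z-score transformation.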
Contents
- Assuming data is good to go (2m 2s)
- Neglecting to consult subject matter experts (1m 48s)
- Overfitting your models (3m 25s)
- Not standardizing your data (2m 57s)
- Focusing on the wrong factors (2m 11s)
- Data leakage (2m 40s)
- Forgetting traditional statistics tools (1m 57s)
- Assuming deployment is a breeze (1m 47s)
- Assuming machine learning is the answer (1m 35s)
- Developing in a silo (2m 16s)
- Not treating for imbalanced sampling (3m 29s)
- Interpreting your coefficients without properly treating for multicollinearity (3m 19s)
- Evaluating by accuracy alone (6m 8s)
- Giving overly technical presentations (1m 56s)