From the course: Mistakes to Avoid in Machine Learning

Not standardizing your data

- [Instructor] During the data preprocessing stage of machine learning, an important consideration is scaling your data. Neglecting this step can have unexpected consequences in the feature selection and modeling phases of the machine learning pipeline. Why do we need to scale features? Well, many machine learning techniques, particularly distance-based ones, implicitly give more weight to features with larger magnitudes. One example is KNN, or k-nearest neighbors, which relies on Euclidean distance, the straight-line distance between two points. A feature measured in the hundreds or thousands will dominate that distance over a feature measured in fractions, regardless of which one is more informative. It's worth noting that tree-based algorithms do not require scaling, because each split compares a single feature against a threshold. So let's see an example. First, we'll import pandas as pd. Next, we'll load a sample dataset from scikit-learn's datasets module, specifically the breast cancer dataset. This returns an object that we'll want to convert into a pandas DataFrame…
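Here is a minimal sketch of how this example could continue. It is not the instructor's code. It assumes scikit-learn's `StandardScaler` for standardization, a `KNeighborsClassifier` with `n_neighbors=5`, and a stratified 75/25 train/test split with `random_state=42`; all of these choices are illustrative. The sketch loads the breast cancer data into a pandas DataFrame, shows how one large-magnitude feature dominates the raw Euclidean distance, and compares KNN accuracy with and without standardization.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the breast cancer dataset; the returned Bunch object holds a NumPy
# feature array, the feature names, and the target labels.
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Feature magnitudes differ by orders of magnitude.
print(X[["mean area", "mean smoothness"]].agg(["mean", "std"]))

# Share of the squared Euclidean distance between the first two samples
# contributed by each feature, on the raw (unscaled) data.
diff_sq = (X.iloc[0] - X.iloc[1]) ** 2
print((diff_sq / diff_sq.sum()).sort_values(ascending=False).head(3))

# Hold out a test set (split parameters are illustrative assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# KNN on raw features: distances are dominated by large-magnitude columns.
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_raw.fit(X_train, y_train)
raw_acc = knn_raw.score(X_test, y_test)

# KNN on standardized features: the pipeline fits StandardScaler on the
# training split only, then rescales each feature to mean 0 and std 1.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)
scaled_acc = knn_scaled.score(X_test, y_test)

print(f"KNN accuracy, raw features:          {raw_acc:.3f}")
print(f"KNN accuracy, standardized features: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means its mean and standard deviation are learned from the training split only, so no test-set statistics leak into training. The exact accuracies depend on the split, so compare the two printed values rather than expecting particular numbers.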