From the course: Machine Learning with Scikit-Learn

How to format data for scikit-learn

- Scikit-learn is a great library for creating machine learning models from data. Before you fit a model using scikit-learn, your data has to be in a format the library recognizes. Scikit-learn works well with numeric data that's stored in NumPy arrays. Additionally, you can convert your data from objects like pandas DataFrames to NumPy arrays. In this video, I'll show you how to put your data into a form that scikit-learn accepts. The first thing you have to understand is what scikit-learn expects for a features matrix and a target vector. In scikit-learn, a features matrix is a two-dimensional grid of data where rows represent samples and columns represent features. A target vector is usually one-dimensional and, in the case of supervised learning, holds what you want to predict from the data. Let's now see an example of this. The image is a pandas DataFrame of the first five rows of the iris dataset. A single flower…
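The layout described above can be sketched in a few lines. This is a minimal illustration, not the code used in the video: it assumes the copy of the iris dataset bundled with scikit-learn (`load_iris(as_frame=True)`, available in scikit-learn 0.23 and later), and the variable names `X` and `y` are just the usual convention for the features matrix and target vector.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled iris data as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a 'target' column

# Features matrix: 2-D, rows are samples (flowers), columns are features.
X = df[iris.feature_names].to_numpy()

# Target vector: 1-D, one label per sample (what we want to predict).
y = df["target"].to_numpy()

print(df.head())  # the first five rows, as a pandas DataFrame
print(X.shape)    # (150, 4)
print(y.shape)    # (150,)
```

Selecting the feature columns with a list keeps `X` two-dimensional, while selecting the single `target` column yields a one-dimensional array, which matches the shapes scikit-learn estimators typically expect.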
