From the course: Python: Working with Predictive Analytics


Divide the data into test and train


- [Instructor] When we look at our roadmap, we are still in the data preparation step. We need to divide the data into what are known as the training and test sets. The training set contains known outputs, and the model learns from this data. We hold out the test set to check the model's predictions on data it has not seen.

At this point, all the data is numerical. Imagine our data as separate wooden blocks stacked up as columns, like individual data frames. Stacking multiple data frames together gives us a final data frame. In some cases, we may want to reduce the number of dimensions in the data to cut processing time and make the model more efficient, which is called dimensionality reduction. We will not cover this concept in this course, but it's good to know it is available, especially if your data has many features and comparatively few training samples.

Now that we've put together the independent variables and assigned the response, which is the dependent variable, it's time to divide the…
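The transcript cuts off before showing the split itself, so here is a minimal sketch of the idea using only the Python standard library. The helper name `split_train_test`, the 80/20 ratio, the seed, and the toy data are all assumptions made for illustration; they are not taken from the course. In practice, a library routine such as scikit-learn's `train_test_split` is commonly used for this step.

```python
import random


def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle the row indices, then hold out a
    fraction of the rows as the test set. Rows of X stay paired with
    their matching entries in y."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")

    indices = list(range(len(X)))
    # A seeded generator makes the shuffle reproducible between runs.
    random.Random(seed).shuffle(indices)

    n_test = int(round(len(X) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up toy data: 10 rows of two independent variables and one response.
X = [[i, i * 2] for i in range(10)]
y = [i * 3 for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(
    X, y, test_fraction=0.2, seed=42
)
print(len(X_train), len(X_test))
```

The model would be fit on `X_train` and `y_train` only, and its predictions for `X_test` would then be compared against `y_test`. Shuffling before the split guards against any ordering in the original rows leaking into one of the two sets.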
