From the course: Machine Learning and AI Foundations: Value Estimations


Training vs. testing data

- [Narrator] Let's look at train model part 2.py. When training a machine learning model, we always need to do two things with our data set: first, shuffle the data so it's in a random order, and second, split the data into a training data set and a test data set. Because shuffling the data and splitting it into train and test groups is such a common operation, scikit-learn provides a built-in function, train_test_split, that does both in one line of code. This command shuffles all of our data into a random order and then splits it into two groups. The test_size=0.3 parameter tells it to keep 70 percent of the data for training and pull out 30 percent for testing. A 70/30 split is pretty typical. Splitting the data into training and testing groups lets us keep the test data hidden from the machine learning system until we're ready to verify its accuracy. If we verified its accuracy with training data it had already seen, it wouldn't be much of a test. By using data the…
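The shuffle-then-split step described above can be sketched as follows. This is a minimal illustration, not the course's own script: the toy data set (100 rows where the target is twice the feature), the random_state=0 seed, and the shuffle_and_split helper are all hypothetical additions. The helper performs the same two steps by hand with NumPy so it's clear what the one-line call is doing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data set: 100 rows, one feature, target = 2 * feature.
X = np.arange(100).reshape(-1, 1)
y = X.ravel() * 2

# One line: shuffle the rows, then hold out 30% of them for testing.
# random_state is an assumption added here so the shuffle is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def shuffle_and_split(X, y, test_fraction=0.3, seed=0):
    """Hypothetical helper doing the same two steps by hand with NumPy."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # step 1: random order of row indices
    n_test = int(np.ceil(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]  # step 2: split
    # Indexing X and y with the same indices keeps each row paired with its target.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


Xa_train, Xa_test, ya_train, ya_test = shuffle_and_split(X, y)

print(len(X_train), len(X_test))    # 70 30
print(len(Xa_train), len(Xa_test))  # 70 30
```

Either way, every row ends up in exactly one of the two groups, and each feature row stays aligned with its target value; only the order and the group assignment are randomized. The model would then be fit on X_train and y_train only, with X_test and y_test held back for measuring accuracy.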