In this video, learn how to re-fit the model on the full training set and evaluate on the validation set.
- [Instructor] Now that we've done some hyperparameter tuning, and we have a good idea of what the best hyperparameter combinations are, let's evaluate these models on some unseen data in the validation set. The performance shouldn't deviate too much from what we saw with cross-validation, because the performance metrics there were also computed on unseen data. But this will give us the opportunity to look at a couple of additional performance metrics beyond just accuracy. I do want to mention at this point that typically you'll test different algorithms as well. In this case, we're only using the random forest classifier just to keep things simpler, but usually you would have many different candidate models to choose from.

In this lesson, we're going to take the three best-performing models from the last lesson and really dig in to understand how they're performing. You'll notice that we're importing a couple of additional metrics that we did not import last time. Lastly, we're going to read in our data from the training and validation sets for this lesson, and then we'll read in our test set for the next lesson. So let's go ahead and run that cell.

Here you'll see all the results we generated using GridSearchCV in the last section. We're going to take the three best hyperparameter combinations and refit those models on the full training set. Why do we need to refit on the full training set? Remember that these models were originally fit on only 80% of the training data: when we're doing five-fold cross-validation, each loop uses only 80% of the data for training so that the remaining 20% can be held out for testing. Now we want to evaluate on the validation set, so let's allow our models to learn from the full training set instead of limiting them to only 80%.

We'll start with the best hyperparameter combination we found in the last lesson, which was number of estimators equal to five and max depth equal to 10. We'll store that as rf1, and then all we have to do is call rf1.fit and pass in our features and our labels. Remember, we have to convert these labels from a column vector into an array by calling .values.ravel(). These hyperparameter settings will guide the way this random forest classifier fits to the full training set. We want to do this for our top three models, so we'll just copy that code down. Our second best model had 100 estimators, also with a max depth of 10, so let's assign that to rf2 and change it down here in our .fit statement. Our third best model had 100 estimators as well, with a max depth of None; we'll assign that to rf3 and call rf3.fit. Once this cell is run, rf1, rf2, and rf3 will be fit models. In other words, we can then take those and use them to make predictions on unseen data. So let's go ahead and run that.

Now let's evaluate these fit models on the validation set. The only examples these models have seen at this point are those in the training set, so this is the true test: a test of each model's ability to generalize to unseen data. If they're overfit or underfit, they'll fail here. Remember from previous lessons that we'll be using accuracy, precision, and recall to evaluate these models and select the one that generalizes best to the validation set.
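Here is a minimal sketch of that refitting step, assuming the cleaned features and labels were written to CSV files in an earlier lesson (the file names below are placeholders for whatever you actually saved):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Placeholder file names -- substitute whatever you saved the cleaned data as.
tr_features = pd.read_csv('train_features.csv')
tr_labels = pd.read_csv('train_labels.csv')

# Refit the three best hyperparameter combinations on the FULL training set.
rf1 = RandomForestClassifier(n_estimators=5, max_depth=10)
rf1.fit(tr_features, tr_labels.values.ravel())  # .values.ravel() flattens the label column vector

rf2 = RandomForestClassifier(n_estimators=100, max_depth=10)
rf2.fit(tr_features, tr_labels.values.ravel())

rf3 = RandomForestClassifier(n_estimators=100, max_depth=None)
rf3.fit(tr_features, tr_labels.values.ravel())
```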
In order to evaluate these, we're going to use a for loop where we cycle through our three models, rf1, rf2, and rf3. For each model, we'll call the .predict method and pass in our validation features, which are stored in val_features. That will make predictions and output an array of predicted labels, and we'll store those as y_pred. Now that we have these predictions, we want to generate some results metrics. First, let's call accuracy_score. What you have to pass into accuracy_score is, first, the actual labels, so that's val_labels (that's what we read in and stored them as), and then the predictions, so y_pred. We'll close that out, and we're also rounding this to three decimal places just to make it a little bit cleaner. Next, we want to generate the precision. We can just copy this down because it uses the same syntax, and change it from accuracy_score to precision_score. Lastly, we want to generate recall, so again we copy it right down, change that to recall_score, and store it as recall. All this last step is doing is printing out the results: for the model with this max depth and this number of estimators, here is the accuracy, the precision, and the recall. So we can go ahead and run that now.

You can see that the model that performed best in cross-validation actually didn't perform best on the validation set. Based on the hyperparameter settings, this first model is actually the simplest: it has the fewest estimators and a lower max depth. So it's possible that when we gave it the extra 20% of the training data, it just wasn't quite complex enough to learn the true patterns in the data, so it ended up underfitting. Again, we don't have quite enough information to say that concretely. But we can see that the best model here is the second model. The accuracy is highest by a decent amount, the precision is much higher than the others, and the recall is second highest. We're giving the edge to the second model here because the gain in precision is so large that it's worth the marginal loss in recall. But in many scenarios, it's much closer than that, and you'll have to use the context in which you're using this model to determine whether precision or recall is more important to you.

Now that we've evaluated all the models and selected the best model as the one with a max depth of 10 and 100 estimators, in the next lesson we'll get one final, unbiased view of how this model will perform on unseen data by evaluating it on the test set.
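As a companion sketch to the block above (same placeholder file names, and a print format that only approximates the course's output), the evaluation loop might look like this:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

val_features = pd.read_csv('val_features.csv')  # placeholder file names again
y_val = pd.read_csv('val_labels.csv').values.ravel()  # flatten labels to a 1-D array

for mdl in [rf1, rf2, rf3]:
    y_pred = mdl.predict(val_features)  # array of predicted labels
    accuracy = round(accuracy_score(y_val, y_pred), 3)
    precision = round(precision_score(y_val, y_pred), 3)
    recall = round(recall_score(y_val, y_pred), 3)
    print('MAX DEPTH: {} / # OF EST: {} -- A: {} / P: {} / R: {}'.format(
        mdl.max_depth, mdl.n_estimators, accuracy, precision, recall))
```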