From the course: NLP with Python for Machine Learning Essential Training

Model selection: Data prep

- [Instructor] We've now gone through pretty much the entire machine learning process. We've read in raw text, cleaned that text, created and transformed features in feature engineering, fit a simple model and evaluated it on a holdout test set, and tuned hyperparameters, evaluating each setting using GridSearchCV. Now we're going to cap it all off by comparing our best-performing models to select the very best one. But before we do that, I have to mention that we've been bending the rules just a little bit with regard to our vectorizers. Vectorizers are like models: they need to be fit on a training set and then stored in order to transform the test set. In the context of a vectorizer, fitting on the training set basically just means storing all of the words in the training set. Then, when it transforms the test set, it will only create columns for the words that were in the training set. Any words that appear in the test set but not in the training set…
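The fit-on-train, transform-on-test pattern described above can be sketched as follows. This is a minimal illustration, not the course's own code: it assumes scikit-learn's CountVectorizer (the excerpt does not say which vectorizer is used), and the example sentences and variable names are made up for the demonstration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real training and test text.
train_texts = ["free prize waiting", "call me later"]
test_texts = ["free pizza later"]

# Fit on the training set only: this is where the vocabulary is stored.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Reuse the fitted vectorizer to transform the test set.
# Columns are created only for words seen during fitting.
X_test = vectorizer.transform(test_texts)

train_vocab = sorted(vectorizer.vocabulary_)
print(train_vocab)    # words learned from the training set
print(X_train.shape)  # (documents in train, words in train vocabulary)
print(X_test.shape)   # same number of columns as X_train
print("pizza" in vectorizer.vocabulary_)  # word seen only in the test set
```

Calling `fit_transform` on the test set instead would build a second, different vocabulary, so the test columns would no longer line up with the columns the model was trained on.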
