From the course: NLP with Python for Machine Learning Essential Training

Building a random forest model

- [Instructor] Now we're going to learn how to implement a random forest model in Python. In this lesson, we'll cover some of the basics of the random forest classifier in scikit-learn, and then we'll learn how to fit and evaluate it using cross-validation. First, we need to read in our data, create our new features, clean the text, and then vectorize it. This is the same as before, so if you need a refresher, feel free to review the exercise notebooks. The one thing I will call out is that I made the decision to vectorize using the TfidfVectorizer. As a refresher, that produces a document-term matrix where each cell is a weight for how important a word is to a text message, measured by how frequently the word occurs within that message relative to how frequently it occurs across all other text messages. One other thing I will call out is that we're creating a data frame called x_features that does not include the label, and you'll learn why that label was kept…
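
The steps described so far can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the course's notebook: the toy messages, the column names body_text, body_len, and punct_pct, the helper count_punct, and the settings n_estimators=50 and five folds are all hypothetical, and the vectorizer uses its default tokenizer rather than any custom cleaning function from the exercise files.

```python
import string

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical toy data standing in for the text-message dataset.
data = pd.DataFrame({
    "label": ["ham", "spam"] * 5,
    "body_text": [
        "Are we still meeting for lunch today?",
        "WINNER!! Claim your free prize now!!!",
        "Can you pick up milk on the way home",
        "URGENT! Your account has won a cash reward, call now!",
        "I will call you after the meeting",
        "Free entry!!! Text WIN to claim your prize today",
        "See you at the gym tomorrow morning",
        "Congratulations! You have been selected for a free offer!!",
        "Thanks for dinner last night, it was great",
        "Claim your free cash prize now, limited time offer!!!",
    ],
})


def count_punct(text):
    """Return the percentage of non-space characters that are punctuation."""
    n_chars = len(text) - text.count(" ")
    n_punct = sum(1 for ch in text if ch in string.punctuation)
    return round(100 * n_punct / n_chars, 1) if n_chars else 0.0


# Engineered features (assumed here: message length and punctuation share).
data["body_len"] = data["body_text"].apply(lambda t: len(t) - t.count(" "))
data["punct_pct"] = data["body_text"].apply(count_punct)

# TF-IDF document-term matrix: one row per message, one column per term.
tfidf_vect = TfidfVectorizer()
x_tfidf = tfidf_vect.fit_transform(data["body_text"])

# Feature frame without the label column.
x_features = pd.concat(
    [data[["body_len", "punct_pct"]], pd.DataFrame(x_tfidf.toarray())],
    axis=1,
)
# Recent scikit-learn versions require all column names to be strings.
x_features.columns = x_features.columns.astype(str)

# Fit and evaluate a random forest with 5-fold cross-validation.
rf = RandomForestClassifier(n_estimators=50, random_state=42)
k_fold = KFold(n_splits=5)
scores = cross_val_score(
    rf, x_features, data["label"], cv=k_fold, scoring="accuracy"
)
print(scores)
```

In this sketch the label is passed to cross_val_score as its own argument, since scikit-learn takes the features and the target separately. With only ten toy messages the accuracy values are not meaningful; the point is the shape of the workflow.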
