From the course: Advanced NLP with Python for Machine Learning

Building a model on top of vectorized text

- [Instructor] Now that we have our text messages cleaned and converted to a numeric representation, we're ready to implement a random forest model on top of this document-term matrix. First, we're going to take care of all the steps that we covered previously. So we'll read in our data. We'll clean up our data. And then we'll use a TfidfVectorizer to convert our text messages to a numeric representation in the form of a document-term matrix. One note I will make is that we're calling toarray on this X_tfidf object and then passing the result to the pandas DataFrame constructor. That just converts our TF-IDF output from a sparse matrix to a DataFrame. So let's take a look at that by calling X_features.head. And we'll run this cell. And notice, the column names are just integers starting at zero. So you can see that there are 9,395 columns, just like we saw as the dimensionality of our sparse matrix. Now let's move on to the modeling.…
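
The steps described above can be sketched roughly as follows. This is a minimal illustration, not the course's notebook: the toy messages, the label and text column names, the skipped cleaning step, and the random forest settings (n_estimators, random_state) are all assumptions made so the example runs on its own. Only TfidfVectorizer, toarray, X_tfidf, and X_features come from the transcript.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the course's text messages.
messages = pd.DataFrame({
    "label": ["spam", "ham", "spam", "ham", "ham", "spam"],
    "text": [
        "win a free prize now",
        "are we still meeting for lunch",
        "free entry claim your prize",
        "see you at home tonight",
        "call me when you get home",
        "claim your free cash now",
    ],
})

# Vectorize the text. The result is a SciPy sparse matrix with
# one row per message and one column per vocabulary term.
# (Assumption: default tokenization instead of a custom cleaning function.)
tfidf_vect = TfidfVectorizer()
X_tfidf = tfidf_vect.fit_transform(messages["text"])

# Convert the sparse matrix to a dense array and pass it to the
# DataFrame constructor. Columns get integer labels starting at 0.
X_features = pd.DataFrame(X_tfidf.toarray())
print(X_features.head())
print(X_features.shape)

# Fit a random forest on top of the document-term matrix.
# (Assumption: these hyperparameters are illustrative only.)
rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_features, messages["label"])
predictions = rf.predict(X_features)
```

Calling toarray materializes every zero in the matrix, so for a large vocabulary it can use far more memory than the sparse form. It is convenient for inspecting the data with head, which is the purpose it serves here.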
