From the course: Advanced NLP with Python for Machine Learning

Vectorize text using TF-IDF

- [Instructor] Now that we've covered how to read in our text data and clean it, we'll learn how to convert that text into a numeric representation that can be passed into a machine learning model. So what is term frequency-inverse document frequency, or TF-IDF for short? TF-IDF creates a document-term matrix with one row per document, or example, and one column per word in the corpus. Each cell in that matrix contains a weighting intended to reflect how important a given word is to the document, given how frequently that word appears in the larger corpus. In our problem, that means there's still one row per text message, just as in our original data. But instead of one column for the text message, we'll have one column per unique term in the entire dataset. And the individual cells will represent a weighting meant to identify how important a word is to an individual…
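
The document-term matrix described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own code: the `messages` list and the `tfidf_matrix` helper are hypothetical, tokenization is a simple whitespace split, and the weighting is the textbook form (term frequency within the message multiplied by the log of the number of messages divided by the number of messages containing the term). Libraries often apply smoothed or normalized variants, so their numbers will likely differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the text messages.
messages = [
    "free prize call now",
    "call me later",
    "are you free later",
]


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per unique term."""
    tokenized = [doc.split() for doc in docs]
    # One column per unique term across the whole corpus.
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty document
        row = [
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


vocab, matrix = tfidf_matrix(messages)

# Show each message's weights; terms absent from a message get 0.0.
for message, row in zip(messages, matrix):
    print(message)
    print({term: round(weight, 3) for term, weight in zip(vocab, row)})
```

In this toy corpus, "prize" appears in only one message while "free" appears in two, so "prize" receives the larger weight in the first message. That is the sense in which the weighting reflects a word's importance to a document relative to the rest of the corpus.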
