From the course: Advanced NLP with Python for Machine Learning
Vectorize text using TF-IDF - Python Tutorial
- [Instructor] Now that we've covered how to read in our text data and clean that text, we'll learn how to convert that text into a numeric representation that can be passed into a machine learning model. So what is term frequency-inverse document frequency, or TF-IDF for short? Well, TF-IDF creates a document-term matrix where there's one row per document, or example, and one column per word in the corpus. Each cell in that document-term matrix contains a weighting intended to reflect how important a given word is to the document, given how frequently that word appears in the larger corpus. So in our problem, that means there's still one row per text message, just like we have in our original data. But now, instead of one column for the text message, we'll have one column per unique term in the entire dataset. And the individual cells will represent a weighting meant to identify how important a word is to an individual…
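The document-term matrix described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the instructor's code: the three `messages` are made-up stand-ins for the text-message data, the helper name `tfidf_matrix` is hypothetical, and the weighting used here (relative term frequency times `log(N / document frequency)`) is one common textbook variant assumed for the example.

```python
import math
from collections import Counter

# Hypothetical toy messages standing in for the text-message dataset.
messages = [
    "free prize call now",
    "call me later",
    "are we meeting later",
]


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per unique term."""
    # Assumption: simple lowercase + whitespace tokenization, no further cleaning.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)

    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty document
        row = [
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


vocab, matrix = tfidf_matrix(messages)

# Show the non-zero weights for each message.
for message, row in zip(messages, matrix):
    weights = {t: round(w, 3) for t, w in zip(vocab, row) if w > 0}
    print(message, "->", weights)
```

In the first message, "free" appears in only one of the three documents while "call" appears in two, so "free" receives the larger weight even though both occur once in that message. Library implementations such as scikit-learn's `TfidfVectorizer` build the same kind of matrix but, by default, smooth the inverse document frequency and normalize each row, so their numbers will differ from this sketch.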