From the course: NLP with Python for Machine Learning Essential Training

Inverse document frequency weighting

- [Instructor] So we've gone over the count vectorizer and n-grams. Now we're going to touch on the last vectorization method that we'll be covering, Term Frequency-Inverse Document Frequency, often referred to as TF-IDF. These aren't the only three methods of vectorizing, but they're the only three that we're going to cover here because they are the most popular. So what is TF-IDF? TF-IDF creates a document-term matrix, where there's still one row per text message and the columns still represent single unique terms. But instead of the cells representing the count, the cells represent a weighting that's meant to identify how important a word is to an individual text message. This formula lays out how this weighting is determined. It may look a little bit intimidating, but it's actually quite simple. You start with the TF term, which is just the number of times that term i occurs in text message j, divided by the total number of terms in text message j. In other words, it's just the percentage of terms in text message j that are term i.
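For reference, the weighting the instructor is describing is the standard TF-IDF formula; this is a sketch, since the exact smoothing and normalization vary between implementations:

w_{i,j} = \mathrm{tf}_{i,j} \cdot \log\!\left( \frac{N}{\mathrm{df}_i} \right)

where \mathrm{tf}_{i,j} is the number of times term i occurs in text message j divided by the total number of terms in text message j, N is the total number of text messages, and \mathrm{df}_i is the number of text messages containing term i.

Since this is a Python course, here is a minimal sketch of applying this kind of vectorizer with scikit-learn's TfidfVectorizer. The messages list is hypothetical and purely illustrative, and scikit-learn's defaults add IDF smoothing and L2 row normalization on top of the basic formula above:

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus of short text messages, for illustration only
messages = [
    "free entry to win a prize now",
    "are we still meeting for lunch today",
    "win a free prize call now",
]

# Build the document-term matrix: one row per text message,
# one column per unique term, each cell holding a TF-IDF weight
# instead of a raw count
tfidf_vect = TfidfVectorizer()
X = tfidf_vect.fit_transform(messages)

# Unique terms (columns); older scikit-learn versions use get_feature_names()
print(tfidf_vect.get_feature_names_out())
# The weighted document-term matrix as a dense array
print(X.toarray())

In this toy corpus, terms that show up in several messages (like "free" or "win") receive lower weights than a term concentrated in a single message (like "lunch"), which is exactly the notion of importance to an individual text message that the weighting is meant to capture.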
