From the course: NLP with Python for Machine Learning Essential Training

N-gram vectorizing

- [Male Instructor] This lesson is going to look very similar to the last lesson, as we're going to follow almost exactly the same code template. This time we're going to do it with n-grams instead of the count vectorizer. The first question: what are n-grams? The n-gram process creates a document-term matrix like we saw before. We still have one row per text message, and we still have counts that occupy the individual cells, but instead of the columns representing single terms like we saw in the last lesson, they now represent all combinations of adjacent words of length n in your text. As an example, let's use the string "NLP is an interesting topic." Hopefully you agree. This table shows how that would break down. With n-grams, if n equals two, that's called a bigram, and it will pull all combinations of two adjacent words in our string. From "NLP is an interesting topic," it will pull out four tokens: "NLP is," "is an," "an interesting," "interesting topic." When n equals three, that's…
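A minimal sketch of what the bigram breakdown looks like in code, assuming scikit-learn's CountVectorizer with ngram_range=(2, 2); the corpus and variable names here are illustrative and not taken from the course files.

```python
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

# Illustrative one-document corpus matching the example string above
corpus = ["NLP is an interesting topic"]

# ngram_range=(2, 2) builds features from all pairs of adjacent words
# (bigrams) instead of single terms
bigram_vect = CountVectorizer(ngram_range=(2, 2))
X_counts = bigram_vect.fit_transform(corpus)

# One row per document, one column per bigram, counts in the cells
df = pd.DataFrame(X_counts.toarray(),
                  columns=bigram_vect.get_feature_names_out())
print(df)
# Columns come out lowercased: 'an interesting', 'interesting topic', 'is an', 'nlp is'
```

Changing ngram_range to (3, 3) would produce trigrams instead, and (1, 2) would keep both single terms and bigrams in the same document-term matrix.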
