In this video, discover the steps to build an n-grams database.
[Instructor] We will now look at some advanced text … processing and transformation techniques in this module. … First, we look at N-Grams. What is N-Grams? … N-Grams is a sequence of n items in a sample of text. … The sequence can be any number. … Depending on N, it's called bigrams, trigrams, … four-grams, and et cetera. … For example, let's take a sentence: … "Dogs are favorite pets". … If we do bigrams conversion of these, … we will end up with three bigrams: … "dogs" and "are", "are" and "favorite", … "favorite" and "pets". … If we do trigrams we end up with: … "dogs", "are", "favorite" and "are", "favorite", "pets". … N-grams are used for building predictive text systems … that predict the next sequence of words. … Let's now look at the code. … The code for this chapter is available in the file, … called "04_xx advanced text processing". … The first part of this code we see here is a repeat … of all the processing we did in the previous chapter. … This takes raw text from Spark course description …
- Text mining today
- Reading text files using Python
- Cleansing text data
- Build n-grams databases for text predictions
- Preparing TF-IDF matrices for machine learning
- Scaling text processing for performance