In this video, discover the steps to build an n-grams database.
[Instructor] We will now look at some advanced text … processing and transformation techniques in this module. … First, we look at N-Grams. What is N-Grams? … N-Grams is a sequence of n items in a sample of text. … The sequence can be any number. … Depending on N, it's called bigrams, trigrams, … four-grams, and et cetera. … For example, let's take a sentence: … "Dogs are favorite pets". … If we do bigrams conversion of these, … we will end up with three bigrams: … "dogs" and "are", "are" and "favorite", … "favorite" and "pets". … If we do trigrams we end up with: … "dogs", "are", "favorite" and "are", "favorite", "pets". … N-grams are used for building predictive text systems … that predict the next sequence of words. … Let's now look at the code. … The code for this chapter is available in the file, … called "04_xx advanced text processing". … The first part of this code we see here is a repeat … of all the processing we did in the previous chapter. … This takes raw text from Spark course description …
- Text mining today
- Reading text files using Python
- Cleansing text data
- Build n-grams databases for text predictions
- Preparing TF-IDF matrices for machine learning
- Scaling text processing for performance
Skill Level Intermediate
Processing Text with R Essential Trainingwith Kumaran Ponnambalam55m 57s Intermediate
1. Text Mining
2. Reading Text
3. Text Cleansing and Extraction
4. Advanced Text Processing
5. Best Practices
- Mark as unwatched
- Mark all as unwatched
Are you sure you want to mark all the videos in this course as unwatched?
This will not affect your course history, your reports, or your certificates of completion for this course.Cancel
Take notes with your new membership!
Type in the entry box, then click Enter to save your note.
1:30Press on any video thumbnail to jump immediately to the timecode shown.
Notes are saved with you account but can also be exported as plain text, MS Word, PDF, Google Doc, or Evernote.