From the course: Text Analytics and Predictions with Python Essential Training


Building n-grams database


- [Instructor] In this video, we will build the n-grams database with bigrams generated from the course description data set. A bigram has a first word and a second word that occurs immediately after it. For this example, let's use an in-memory SQLite database. In real-world applications, we should use a high-performance, persistent database, possibly part of the client that executes the predictive text. We create a table called ngrams with the fields FIRST, SECOND, and COUNTS. The FIRST column stores the first word in the bigram. The SECOND column stores the second word in the bigram. The COUNTS column stores the total number of times this first word, second word combination occurs in the entire corpus. We build our bigrams list using the ngrams function in NLTK. We then insert the data into the SQLite database. If the bigram already exists in the database, we increment its count. If not, we insert a new record. We use the UPSERT capability in SQLite for this purpose. Once the database is built, we sample the records…
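
The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: the two-line `corpus`, the whitespace tokenization, the primary key on the word pair, and the fallback `ngrams` helper (used only when NLTK is not installed) are all assumptions made for the example. The UPSERT clause requires SQLite 3.24 or later.

```python
import sqlite3

try:
    from nltk.util import ngrams  # NLTK's n-gram helper
except ImportError:
    # Assumed fallback so the sketch runs without NLTK installed;
    # it yields the same consecutive-token tuples.
    def ngrams(tokens, n):
        return zip(*(tokens[i:] for i in range(n)))

# Hypothetical stand-in for the course description corpus.
corpus = [
    "learn python for data science",
    "learn python for text analytics",
]

# In-memory database, as in the transcript; data is lost when the connection closes.
conn = sqlite3.connect(":memory:")

# The column names follow the transcript. The primary key on the word pair
# is an assumption: UPSERT needs a uniqueness constraint to detect conflicts.
conn.execute(
    '''CREATE TABLE ngrams (
           "first"  TEXT NOT NULL,
           "second" TEXT NOT NULL,
           counts   INTEGER NOT NULL,
           PRIMARY KEY ("first", "second")
       )'''
)

# Insert a new bigram with a count of 1, or increment the count if it exists.
upsert = '''
    INSERT INTO ngrams ("first", "second", counts) VALUES (?, ?, 1)
    ON CONFLICT ("first", "second") DO UPDATE SET counts = counts + 1
'''

for line in corpus:
    tokens = line.lower().split()  # simple whitespace tokenization (assumption)
    conn.executemany(upsert, ngrams(tokens, 2))
conn.commit()

# Sample the most frequent bigrams.
sample = conn.execute(
    'SELECT "first", "second", counts FROM ngrams '
    'ORDER BY counts DESC, "first", "second" LIMIT 5'
).fetchall()
for row in sample:
    print(row)
```

With a persistent database, as the transcript recommends for real-world use, the same statements would apply with a file path passed to `sqlite3.connect` in place of `":memory:"`.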
