From the course: Text Analytics and Predictions with Python Essential Training

Preparing data for predictive text

- [Instructor] To build predictive text, we will again use the Course-Descriptions.txt file from the previous exercises. The code is available in the file code_05_XX Predictive Text. We first load the data file into a raw text variable, then preprocess the dataset. First, we use the nltk.word_tokenize method to convert the descriptions into a list of tokens. We then remove special characters from the list. Next, we remove punctuation from the token_list by using the compute library in nltk. Finally, we convert the tokens to lowercase. We print a sample of this token_list and its count. Let's run the code now. From the sample list, we see that the words have been preprocessed as expected, and we have a total of 579 tokens in this token_list.
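
The preprocessing steps described above can be sketched roughly as follows. This is not the course's exercise code: to stay runnable without NLTK or its data downloads, it swaps in a regular expression as a stand-in for nltk.word_tokenize and uses string.punctuation for the punctuation filter. The preprocess function, the regular expression, and sample_text are all assumptions made for illustration; in the exercise, the raw text would be read from Course-Descriptions.txt instead.

```python
import re
import string


def preprocess(raw_text):
    """Tokenize, strip special characters, drop punctuation, and lowercase."""
    # Stand-in for nltk.word_tokenize (an assumption, not the course code):
    # match alphanumeric words with an optional apostrophe suffix,
    # or any single non-space symbol.
    tokens = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", raw_text)

    # Remove special characters inside tokens (here, only apostrophes).
    tokens = [t.replace("'", "") for t in tokens]

    # Drop empty tokens and tokens made up entirely of ASCII punctuation.
    tokens = [
        t for t in tokens
        if t and not all(ch in string.punctuation for ch in t)
    ]

    # Convert the remaining tokens to lowercase.
    return [t.lower() for t in tokens]


# Hypothetical sample; the exercise loads Course-Descriptions.txt instead.
sample_text = "Learn Python's basics: data, text & predictions!"
token_list = preprocess(sample_text)

print("Sample tokens:", token_list[:10])
print("Total tokens:", len(token_list))
```

For this sample sentence, the sketch yields six lowercase tokens with the punctuation and the apostrophe removed. The real nltk.word_tokenize splits contractions and possessives differently, so token counts from this stand-in will not match the 579 reported in the exercise.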