From the course: Text Analytics and Predictions with Python Essential Training
Preparing data for predictive text
- [Instructor] To build predictive text, we will again use the Course-Descriptions.txt file from the previous exercises. The code for this is available in the file code_05_XX Predictive Text. We first load the data file into a raw text variable. Then we preprocess the dataset. First, we use the nltk.word_tokenize method to convert the descriptions into a list of tokens. We then remove special characters: punctuation is filtered out of token_list using the punkt module in nltk. Finally, we convert these tokens to lowercase. We print a sample of token_list along with its count. Let's run the code now. From the sample, we see that the words have been preprocessed as expected, and token_list contains 579 tokens in total.
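The preprocessing steps described above (load, tokenize, drop punctuation, lowercase, then inspect a sample and the count) can be sketched as follows. This is a minimal, standard-library-only sketch, not the course's exercise file: it swaps nltk.word_tokenize for a regular-expression tokenizer (the same `\w+|[^\w\s]+` pattern used by nltk's WordPunctTokenizer) and replaces the punkt-based filter with a simple "keep tokens containing a letter or digit" rule, so it runs without downloading any nltk data. The helper names `load_raw_text`, `is_word_token`, and `preprocess`, as well as the sample sentence, are hypothetical.

```python
import re
from pathlib import Path

# Stand-in for nltk.word_tokenize (assumption): match runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
# This is the same pattern nltk's WordPunctTokenizer uses.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]+")


def load_raw_text(path):
    """Read the whole corpus file (e.g. Course-Descriptions.txt) into one string."""
    return Path(path).read_text(encoding="utf-8")


def is_word_token(token):
    """Keep tokens with at least one letter or digit; drop punctuation-only tokens."""
    return any(ch.isalnum() for ch in token)


def preprocess(raw_text):
    """Tokenize, remove punctuation-only tokens, and convert to lowercase."""
    tokens = TOKEN_PATTERN.findall(raw_text)
    word_tokens = [tok for tok in tokens if is_word_token(tok)]
    return [tok.lower() for tok in word_tokens]


if __name__ == "__main__":
    # Hypothetical sample text; with the course data you would instead call
    # raw_text = load_raw_text("Course-Descriptions.txt")
    raw_text = "Learn Python, step by step. Build predictive models!"
    token_list = preprocess(raw_text)
    print(token_list[:10])
    print("Total tokens:", len(token_list))
```

Because the tokenizer differs from nltk.word_tokenize (for example, contractions such as "don't" split differently), running this sketch on Course-Descriptions.txt should not be expected to reproduce the 579-token count reported in the course.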