From the course: Text Analytics and Predictions with Python Essential Training

Preparing data for classification

- [Instructor] For this exercise, we use two source text files. First, we have the Course-Descriptions.txt file, which contains a list of course descriptions for various technology courses. This is our feature variable set. In order to train the model, we also need to tag these descriptions with specific classes. For this, we have another document, Course-Classification.txt, that lists the class for each of the course descriptions seen in the earlier file. The line numbers of the descriptions and the classes match one to one. The classes used are Data-Science, Programming, and Cloud-Computing. In order to prepare the data for classification, we need to build a TF-IDF matrix. We first load Course-Descriptions.txt into a list of lines. Then we use the stopwords list in the nltk library for stopword removal. We will also use the word net…
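
The steps described so far (load the lines, remove stopwords, build a TF-IDF matrix) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the course's own code: the small STOPWORDS set stands in for the nltk stopwords list, the sample descriptions are made up and are not taken from the actual files, and the helper names (load_lines, tokenize, tfidf_matrix) are hypothetical. The weighting is the plain tf × log(N/df) form; a library such as scikit-learn's TfidfVectorizer applies smoothing and normalization, so its numbers would differ.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stand-in for the nltk stopwords list used in the course.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def load_lines(path):
    """Read a text file into a list of stripped, non-empty lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] weights vocabulary[j] in documents[i]."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        rows.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, rows


# Made-up sample data; with the real files one would instead call
# load_lines("Course-Descriptions.txt") and load_lines("Course-Classification.txt").
descriptions = [
    "Learn the basics of machine learning with Python",
    "An introduction to programming in Java",
    "Deploying applications to the cloud",
]
classes = ["Data-Science", "Programming", "Cloud-Computing"]

# Descriptions and classes are matched by line position, so lengths must agree.
assert len(descriptions) == len(classes)

vocabulary, matrix = tfidf_matrix(descriptions)
```

Each row of matrix lines up with the class at the same index in classes, which is the pairing a classifier would be trained on.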
