From the course: Text Analytics and Predictions with Python Essential Training


Preparing data for clustering


- [Instructor] In this example, we will use data about courses and their hashtags, available in the Course-Hashtags.csv file. Please review the CSV file. It contains a list of course titles and the hashtags used in their course descriptions. Let us assume that we did some prior preprocessing to extract these hashtags from the text. We will now use these hashtags to group similar courses into clusters. The code for this example is available in the code_03_XX Clustering Text notebook. We first load the CSV file into a pandas DataFrame. We print the first two rows of the DataFrame to check its contents. We then separate the hashtags and the course titles from the DataFrame into two lists. Next, we use TfidfVectorizer from the scikit-learn package for feature extraction. Using this vectorizer, we convert the hashtag data into a TF-IDF matrix. We will also print the feature names, which are the list of all the hashtags in the source file. Let us run this code and see the output. The sample…
