From the course: Text Analytics and Predictions with Python Essential Training

Purpose

- [Instructor] There may be times when you run into a really large dataset with many different attributes and you need to find similarities among its elements. In this situation, you can use clustering, a machine learning technique that groups similar elements based on their attributes. Clustering is a classic use case for unsupervised learning. In unsupervised learning, there is no training dataset with prior classifications. Rather, the features of the elements are used to group similar elements into clusters organically. There are a number of techniques available, like k-means clustering and hierarchical clustering. With respect to text mining, how do you find features? The words in a document become the features. Documents with similar words get grouped together. Clustering algorithms work only on numeric data, so text data needs to be converted to a numeric representation. Term frequency-inverse document frequency, or TF-IDF, is the most popular technique used for this purpose.…
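
To make the idea concrete, here is a minimal sketch in plain Python of turning documents into TF-IDF vectors and grouping them with k-means. Everything in it is an assumption made for illustration, not the course's own code: the four sample documents, the helper names `tfidf_vectors` and `kmeans`, the smoothed IDF formula, whitespace tokenization, and the fixed starting centroids. In practice a library such as scikit-learn (`TfidfVectorizer`, `KMeans`) would normally handle these steps.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one L2-normalized TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {word: math.log((1 + n_docs) / (1 + df[word])) + 1 for word in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[word] / len(tokens) * idf[word] for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=10):
    """Tiny k-means; starting centroids are the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(vec, centroids[c]))
            for vec in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample documents: two about cats, two about markets.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten sat together",
    "stock prices fell on the market",
    "the market rallied as stock prices rose",
]

vocab, vectors = tfidf_vectors(docs)
# Fixed starting points (documents 0 and 2) keep this sketch deterministic.
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)  # the two cat documents share one label, the two market documents the other
```

Note that no labels are supplied anywhere: the grouping comes only from which words the documents share, which is what makes this unsupervised. Words that appear in most documents, such as "the", get a lower IDF weight and so contribute less to the grouping than distinctive words like "cat" or "stock".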
