From the course: Spark for Machine Learning & AI

Introduction to clustering

- [Instructor] Often when working with new data sets, it helps to explore the data and look for macro-level structures, such as broad clusters of data. Clustering algorithms group data into clusters that let us see how large data sets break down into distinct subgroups. K-means is widely used and works well for finding clusters in small and mid-sized data sets. For large data sets, the bisecting k-means algorithm can be faster. We'll look at both of these algorithms and how to use them in Spark MLlib.
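To make the two approaches concrete, here is a minimal sketch in plain Python of the idea behind each one. It is not Spark's implementation; in Spark MLlib the corresponding estimators are `KMeans` and `BisectingKMeans` in `pyspark.ml.clustering`. The helper names (`assign`, `kmeans`, `bisecting_kmeans`), the sample points, and the deterministic choice of starting centers are all assumptions made for illustration.

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate between assigning points and moving centers."""
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to end up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(cluster)
    return sum(dist(p, center) ** 2 for p in cluster)


def bisecting_kmeans(points, k):
    """Start with one cluster and repeatedly split the worst one in two."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        # Assumed simplification: seed the 2-way split with the first point
        # and the point farthest from it, instead of random seeds.
        seed_a = worst[0]
        seed_b = max(worst, key=lambda p: dist(p, seed_a))
        _, halves = kmeans(worst, [seed_a, seed_b])
        halves = [h for h in halves if h]
        clusters.extend(halves)
        if len(halves) < 2:
            break  # the cluster could not be split any further
    return clusters


# Hypothetical sample data: three visually separate groups of 2-D points.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.5),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.0),
]

# k-means needs k starting centers; here one point from each group is assumed.
km_centers, km_clusters = kmeans(points, [points[0], points[3], points[6]])
bk_clusters = bisecting_kmeans(points, 3)

print(km_clusters)
print(bk_clusters)
```

K-means compares every point against all k centers on each pass, while bisecting k-means only ever runs two-center splits on one cluster at a time, which is one intuition for why it can be cheaper on large data sets.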
