From the course: Machine Learning and AI Foundations: Clustering and Association

How does k-means work?

From the course: Machine Learning and AI Foundations: Clustering and Association

Start my 1-month free trial

How does k-means work?

- [Instructor] I'm looking at the same 3D scatterplot that we use to understand hierarchical cluster analysis. I've done so purposefully because k-means builds upon the hierarchical algorithm, but does it in such a way that it's faster. In fact, in the SPSS coding language, k-means is called quick cluster and I believe in the SAS programming language, it's called fast cluster. One of the motivations for k-means right from the very beginning was that it was more efficient. Hopefully you recall that in hierarchical, we have to measure every distance to every other distance and then we have to iterate through as many times as we have cases. Folks, that's n-cubed for the number of calculations. What k-means does is I have to tell it what the value of k is, so let's say I say it's three. k-means is going to find three well spaced points. Let me actually get rid of this one and choose that one. I think I've done a reasonable job choosing three points that are spread out from each other. In this three dimensional scatterplot, you may look all the way in the back corner and think that that one's more spread out, but the algorithm is going to do this, it's going to do all the math. Once it identifies the three well spaced points, now again, that's when k is equal to three, it's going to simply measure all the distances and there's going to be team one, team seven, team 33. Once that's done, it's going to calculate the centroid of those clusters and refine the solution a little bit. It doesn't make one pass of the data, but it only makes a handful of passes of the data. Obviously the nature of this algorithm is that it's doing fewer measurements. The same basic foundational concepts is hierarchical, but with a little bit of a computer science spin on it to make it a lot faster.

Contents