From the course: Machine Learning and AI Foundations: Clustering and Association
How does k-means work?
- [Instructor] I'm looking at the same 3D scatterplot that we used to understand hierarchical cluster analysis. I've done so purposefully, because k-means builds upon the hierarchical algorithm, but does it in such a way that it's faster. In fact, in the SPSS coding language, k-means is called quick cluster, and I believe in the SAS programming language, it's called fast cluster. One of the motivations for k-means, right from the very beginning, was that it was more efficient. Hopefully you recall that in hierarchical, we have to measure the distance from every case to every other case, and then we have to iterate through as many times as we have cases. Folks, that's n-cubed for the number of calculations.

What k-means does is this: I have to tell it what the value of k is, so let's say I say it's three. k-means is going to find three well-spaced points. Let me actually get rid of this one and choose that one. I think I've done a reasonable job choosing three points that are spread out from each other. In this three-dimensional scatterplot, you may look all the way in the back corner and think that that one's more spread out, but the algorithm is going to do this, it's going to do all the math. Once it identifies the three well-spaced points (now again, that's when k is equal to three), it's going to simply measure the distance from every case to each of those three points, and each case joins the nearest one, so there's going to be team one, team seven, team 33.

Once that's done, it's going to calculate the centroid of each of those clusters and refine the solution a little bit. It doesn't make just one pass through the data, but it makes only a handful of passes. Obviously, the nature of this algorithm is that it's doing fewer measurements. It has the same basic foundational concepts as hierarchical, but with a little bit of a computer science spin on it to make it a lot faster.
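The steps described above (pick k well-spaced starting points, assign every case to the nearest one, recompute the centroids, repeat for a handful of passes) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS or SAS implementation: the function names, the nine made-up 3D points, the greedy "farthest point" seeding rule, and the `max_passes` limit are all assumptions chosen for the example.

```python
import math

def pick_spread_seeds(points, k):
    """Assumed seeding rule: start with the first case, then repeatedly
    add the case that is farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(farthest)
    return seeds

def assign(points, centers):
    """Give each case the index of its nearest center (n * k distances)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def centroids(points, labels, centers):
    """Recompute each center as the mean of its members.
    An empty cluster keeps its previous center."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def k_means(points, k, max_passes=10):
    """Seed, assign, then refine for at most max_passes passes."""
    centers = pick_spread_seeds(points, k)
    labels = assign(points, centers)
    for _ in range(max_passes):
        centers = centroids(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # nothing moved, so the solution is stable
            break
        labels = new_labels
    return labels, centers

# Made-up 3D data: three loose groups of three cases each.
points = [
    (1.0, 1.0, 1.0), (1.2, 0.8, 1.1), (0.9, 1.1, 0.9),
    (8.0, 8.0, 8.0), (8.2, 7.9, 8.1), (7.8, 8.1, 7.9),
    (1.0, 8.0, 4.0), (1.1, 8.2, 4.1), (0.9, 7.9, 3.9),
]
labels, centers = k_means(points, k=3)
print(labels)
print(centers)
```

Each pass measures only n times k distances (cases to centers), rather than every case to every other case, which is where the speed advantage over hierarchical clustering comes from.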