From the course: Machine Learning and AI Foundations: Clustering and Association

Which variables should be used with k-means?

- [Instructor] We have to talk about a very important topic. It's a question that I often get: what variables should I be using when I perform my cluster analysis? So let's take a look at these initial variables. They're really at the heart of the matter. These are total spend variables that have been built from a lot of transactional data. So I have one row per customer, and then I have several variables that represent how much each customer spent in seven different product families. What if we go ahead and analyze these variables right now? What's going to happen? Well, the big-screen TVs sold in the entertainment department cost a lot more money than the software or the video games. So if you proceed right now, entertainment sales are going to dominate the solution. So just as we've seen in hierarchical clustering, for instance, where it will automatically transform the variables for you, we have to somehow get those variables in a form where they all have the…
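The dominance effect described in the transcript can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data is synthetic, only three hypothetical product families are used rather than the seven mentioned, and z-score standardization stands in as one common rescaling, which may not be the transformation the instructor goes on to use. Because k-means works with squared Euclidean distances that add up across columns, each column's share of the total variance is a rough indicator of how much it drives the solution.

```python
import random
import statistics

random.seed(0)

# Hypothetical product families and typical spend levels (illustrative only,
# not taken from the course data).
TYPICAL_SPEND = {"entertainment": 1500.0, "software": 80.0, "video_games": 60.0}

# One row per customer, one total-spend column per product family.
customers = [
    {family: random.uniform(0.0, 2.0 * level) for family, level in TYPICAL_SPEND.items()}
    for _ in range(200)
]


def variance_share(rows):
    """Return the fraction of total variance contributed by each column."""
    variances = {
        family: statistics.pvariance([row[family] for row in rows])
        for family in rows[0]
    }
    total = sum(variances.values())
    return {family: var / total for family, var in variances.items()}


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    stats = {
        family: (
            statistics.fmean([row[family] for row in rows]),
            statistics.pstdev([row[family] for row in rows]),
        )
        for family in rows[0]
    }
    return [
        {family: (row[family] - stats[family][0]) / stats[family][1] for family in row}
        for row in rows
    ]


raw_share = variance_share(customers)
scaled_share = variance_share(standardize(customers))

print("raw:   ", {k: round(v, 3) for k, v in raw_share.items()})
print("scaled:", {k: round(v, 3) for k, v in scaled_share.items()})
```

On the raw values, the entertainment column accounts for nearly all of the variance, so distances between customers are almost entirely distances in entertainment spend. After standardizing, each column contributes an equal share, so no single product family decides the clusters by price level alone.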
