Join Curt Frye for an in-depth discussion in this video Analyzing data in your Excel table, part of Learning Excel Cluster Analysis.
Cluster analysis helps you identify potentially related items within a dataset. For example, you might be able to identify customers that have something in common or perhaps products that people tend to buy together. In this movie, I will show you how to do some analysis based on the raw data, so that you can use your domain knowledge to get more insights into the clusters you create. As my sample file, I'm using the analysis workbook. And you can find that in the chapter three folder of your exercise files archive.
This workbook contains a worksheet with a table that contains different data than what we've worked with elsewhere. That other data was meant to help you create your cluster analysis workbook and be able to understand what was going on, but in this case we're looking at a little bit more realistic set of customer data. This data set has a number of customer records, indicating the customer's city, state, age, the date of their first order, the amount of the average order, and the cluster to which they belong.
This data set has three clusters and those are indicated here. 1, 2 and 3. Because I've stored my data in an Excel table, I can display the table's total row. To do that, I make sure that any cell inside of the table is selected. Then I go to the Design tab. And in the Table Style options group ,check the Total Row box. Doing so adds a total row to the bottom of the table, and I can now click inside of any of the cells, for example, underneath the age column.
And I can click the down arrow that appears, to select a summary operation. So let's say that I want to calculate the average age of my customers, at least the ones that appear in this collection of data. For that, I would select the Average operation, and I see the average is a little bit more than 39 years. If I want to find the first order date, then I can click the first order date, and then click the down arrow there, and click Min.
So that finds the minimum value, and that's January 19, 2011. I can also find the maximum value. So for example, I can look for the customers with the highest maximum order. So for that, I would click the average order cell in the total row. Click Max, and the value is 940.75. And it just so happens that that customer is from McCall Idaho, and it's easy to point out because the record appears just above the total row. So those are the calculations for all of the data in the table.
Now instead of summarizing all the data in the table, let's filter the data so we're looking at one cluster at a time. To do that, click the filter arrow to the right of the Cluster field. And then clear Select All. And now check the box next to any value, or cluster that you want to display. Let's say I want to display the values for cluster three only. So I will check the three box and click OK. And when I do, and I'll scroll up with my mouse wheel, I can see that the table only displays rows for cluster number three.
And if I look at that data, I can see that the states of those orders appear to be mostly on the west coast. McCall, Idaho is almost west coast. It looks like my customers, a fairly good range, but mostly in 2011 and 2012, actually mostly 2012, because that's almost 2012 there, December 30th, and the average orders look pretty good, except for this customer in McCall, Idaho. So it's possible that customer has moved to McCall after doing well in business, at age 40, that seems like a likely story.
Now let's switch the analysis to cluster number one. So I'll click the cluster column's filter arrow. Clear the check next to 3, check the box next to 1, and click OK. These orders appear to be from the east coast, and mostly the New York City region. Princeton, New Jersey, Cornwall-on-Hudson, New York itself, Manhattan. And New Haven, Connecticut which isn't that far away. I can see that the average age is a bit higher. But that I've had customers since 2011, since early in 2011 and it appears that I have more of them from early in my business.
And take a look at cluster number 2. Clear the box next to 1, click 2, click OK, and I can see that I have a few, fewer customers in this group. The age is sort of in between my other two clusters. And I have one older customer. And for very new customers. Ones that came in very recently, in the last half of what is now the previous year. And also the average order for these customers tends to be a little bit lower. After you have used the cluster analysis identification techniques that I've shown you elsewhere in this course, you can look at the data that you're analyzing, using cluster analysis, to get more insights once those clusters have been identified.
- What is clustering?
- Setting up your Excel table
- Creating a list of centroids
- Identifying the closest centroid
- Updating centroids using averages
- Automating updates with macros
- Analyzing your data in Excel
- Visualizing data in a chart