Use the principal components to derive clusters of retailers
- [Instructor] Here is the combination of factors and clusters we developed in the prior lesson. No doubt there are a lot of sophisticated things we could do with this data, but the one that might occur to most people, a multivariate ANOVA, is not among them. Bear in mind that these are factor scores, principal components, really, and therefore they are uncorrelated with one another. Furthermore, the rotation method we selected was varimax, and therefore the rotation was orthogonal, which keeps the components uncorrelated. A multivariate ANOVA earns its keep by accounting for correlations among the outcome variables, and here there are none. So, a multivariate ANOVA gets us nowhere. In fact, since the clusters were established using methods that are not fully under our control, I'd be leery of any approach using statistical inference at this point.
I'm much more comfortable with a garden variety pivot table. Let's see what one has to tell us about our clusters and our principal components, or factors. Click in any cell in columns A through E, choose the Insert tab and click Pivot Table. In the dialog box, accept the default address for the table or range as the location of the data. Choose existing worksheet and specify the cell where the pivot table should start. Click OK.
Click Factor 1, Factor 2, and Factor 3 in the pivot table fields pane, and, if necessary, drag them down into the sum values area. Adjust each one to show the average rather than the sum, and choose to show, say, three decimal places. Drag the cluster variable into the rows area.
You will now have all three factors broken down by the two clusters. Let's also get a count of each cluster. Establish a new pivot table. In the pivot table fields pane, drag cluster into the rows area, and also into the sum values area. Choose Count as the summary statistic.
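For anyone who wants to check the pivot table outside Excel, here is a minimal sketch of the same breakdown in Python: the average of each factor score by cluster, plus a count of each cluster. The rows and the function name `pivot_by_cluster` are made up for illustration; they are not the course data set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (cluster, factor 1, factor 2, factor 3).
# These values are invented for illustration only.
rows = [
    (1, -0.25, -0.50, 0.25),
    (1, 0.75, 0.00, -0.25),
    (1, -0.50, -0.25, 0.50),
    (2, 0.25, 2.00, -0.75),
]


def pivot_by_cluster(rows):
    """Return {cluster: {"count": n, "means": [mean F1, mean F2, mean F3]}}."""
    grouped = defaultdict(list)
    for cluster, *scores in rows:
        grouped[cluster].append(scores)

    summary = {}
    for cluster, score_rows in sorted(grouped.items()):
        # zip(*score_rows) turns the rows into one column per factor.
        means = [round(mean(column), 3) for column in zip(*score_rows)]
        summary[cluster] = {"count": len(score_rows), "means": means}
    return summary


summary = pivot_by_cluster(rows)
for cluster, stats in summary.items():
    print(cluster, stats["count"], stats["means"])
```

The rounding to three places mirrors the number format chosen in the pivot table; it affects only the display, not the underlying scores.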
Because I've worked with this data set before, I know that factor one represents antibiotics, factor two represents statins, and factor three represents diabetes medications. The data underlying the principal components analysis were dollar revenues, and the higher the factor score, the more money was taken in. Notice that the largest mean factor score is for factor two in cluster two. Cluster two also has only about five percent of the observations.
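Reading the pivot table this way amounts to two small calculations: find the cell with the largest mean factor score, then see what share of the observations its cluster holds. A rough sketch in Python follows. The counts and means in `pivot` are hypothetical stand-ins, not the values on screen, and the helper names are assumptions made for this example.

```python
# Hypothetical pivot-table output: cluster -> (count, [mean F1, mean F2, mean F3]).
# The numbers are invented for illustration only.
pivot = {
    1: (95, [0.010, -0.120, 0.020]),
    2: (5, [-0.190, 2.280, -0.380]),
}
factor_names = ["Factor 1", "Factor 2", "Factor 3"]


def largest_mean(pivot, factor_names):
    """Return (mean, cluster, factor name) for the largest mean factor score."""
    cells = [
        (value, cluster, factor_names[i])
        for cluster, (_, means) in pivot.items()
        for i, value in enumerate(means)
    ]
    # Tuples compare on their first element, the mean, before anything else.
    return max(cells)


def cluster_share(pivot, cluster):
    """Return the fraction of all observations that fall in the given cluster."""
    total = sum(count for count, _ in pivot.values())
    return pivot[cluster][0] / total


value, cluster, factor = largest_mean(pivot, factor_names)
print(factor, "in cluster", cluster, "has the largest mean:", value)
print("Share of observations:", cluster_share(pivot, cluster))
```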
So, if I were after low-hanging fruit, I'd want to look carefully at marketing statins. The greatest amount of revenue was apparently due to the selling of statins to the smaller cluster. So, this analysis is telling us where to look for the highest sales efficiency. Relatively few purchasers are buying them, but the average sale is relatively large.
In this course, Conrad Carlberg explains how to carry out cluster analysis and principal components analysis using Microsoft Excel, which tends to show more clearly what's going on in the analysis. Then he explains how to carry out the same analysis using R, the open-source statistical computing software, which is faster and richer in analysis options than Excel. Plus, he walks through how to merge the results of cluster analysis and factor analysis to help you break down a few underlying factors according to individuals' membership in just a few clusters.
- Reviewing the problems created by an overabundance of data
- Understanding the rationale for clustering and principal components analysis
- Using Excel to extract principal components
- Using R to extract principal components
- Using R for cluster analysis
- Using Excel for cluster analysis
- Setting up confusion tables in Excel
- Using cluster analysis and factor analysis in concert