Understand how values express unique variance and guide the number of components to retain.
- [Instructor] After the factors have initially been extracted, it's often useful to perform what's called a factor rotation. You can consider the factors as though they were axes in a chart. With three factors, you would have three axes: vertical, horizontal, and depth. Because the factors are orthogonal, those axes are at right angles to one another. Suppose, as here, that we have two factors or principal components. Ideally, we would like to have one subset of the variables that are strongly related to one of the principal components, then another subset that are strongly related to the other principal component.
This ideal situation seldom comes up immediately after the factors have been extracted, but we can come close to it by rotating the axes: keeping the data points where they are, but turning the axes so that each variable is close to the center of the chart on one or more axes and far from the center of the chart on one axis only. When you have rotated the axes to that alignment, variables that are strongly related to a component fall far from the center of the chart along that component's axis, and variables that are weakly related to the component fall close to the origin of that axis.
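The rotation described above can be sketched for the two-factor case. This is only an illustration: the loadings are made-up values, not the course workbook's, and the 45-degree angle is chosen by hand rather than by whatever rotation criterion the Excel add-in applies. The function name `rotate_loadings` is hypothetical.

```python
import numpy as np

def rotate_loadings(loadings, degrees):
    """Re-express two-factor loadings on axes turned by `degrees`.

    The rotation matrix is orthogonal, so the axes stay at right angles
    and each variable keeps its distance from the origin.
    """
    theta = np.radians(degrees)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return loadings @ rotation

# Hypothetical unrotated loadings: every variable sits between the two axes.
unrotated = np.array([
    [0.60,  0.60],
    [0.55,  0.50],
    [0.60, -0.60],
    [0.50, -0.55],
])

# Angle picked by eye for this toy data (an assumption, not a fitted value).
rotated = rotate_loadings(unrotated, 45)

# Squared distance from the origin for each variable, before and after.
before = (unrotated ** 2).sum(axis=1)
after = (rotated ** 2).sum(axis=1)

print(np.round(rotated, 2))
```

After the rotation, the first two variables lie almost entirely along the first axis and the last two along the second, while `before` and `after` are equal: the points have not moved, only the axes have.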
Let's see how this looks in the principal components output. This information is on the Rotated Loadings worksheet in the workbook that contains the input data. There are only three factors, shown in columns B, C, and D. Recall that when I filled out the dialog box for principal components, I asked to retain three factors. The original variables are shown in column A. The numbers in cells B2 through D22 are called factor loadings. You can interpret them as though they were correlation coefficients ranging from a minimum of -1.0, showing a perfect negative relationship, through 0.0, showing no relationship, to 1.0, showing a perfect positive relationship.
I have highlighted in yellow the factor loadings that exceed 0.5. Those are stronger loadings, so you can see that products C, D, J, and U all load fairly strongly on the first factor. Similarly, products H, I, and L load fairly strongly on the second factor. Four additional products, O, P, R, and S, load moderately on the third factor. Notice that several products, starting with A, B, E, F, and G, do not load strongly on any of the three factors.
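The highlighting step above amounts to a threshold test on each loading. Here is a minimal sketch, assuming the comparison is made on the absolute value of the loading so that strong negative loadings also count (the transcript mentions only loadings that exceed 0.5). The loadings and the function name `group_by_strong_loading` are hypothetical, not taken from the workbook.

```python
def group_by_strong_loading(loadings, threshold=0.5):
    """Split variables into those loading strongly on a factor and the rest.

    Returns (strong, unassigned): `strong` maps each factor to the variables
    whose absolute loading exceeds `threshold`; `unassigned` lists variables
    that exceed the threshold on no factor.
    """
    strong = {}
    unassigned = []
    for variable, row in loadings.items():
        hits = [factor for factor, value in row.items() if abs(value) > threshold]
        for factor in hits:
            strong.setdefault(factor, []).append(variable)
        if not hits:
            unassigned.append(variable)
    return strong, unassigned

# Made-up rotated loadings for four products (illustrative values only).
loadings = {
    "Product A": {"Factor 1": 0.21, "Factor 2": 0.10, "Factor 3": -0.05},
    "Product C": {"Factor 1": 0.72, "Factor 2": 0.08, "Factor 3": 0.12},
    "Product H": {"Factor 1": 0.05, "Factor 2": 0.66, "Factor 3": 0.19},
    "Product O": {"Factor 1": -0.11, "Factor 2": 0.14, "Factor 3": 0.55},
}

strong, unassigned = group_by_strong_loading(loadings)
print(strong)
print(unassigned)
```

Variables that end up in `unassigned` correspond to the products that do not load strongly on any retained factor.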
Those products tend to exhibit unique variance, variance that they do not share with other products. You may well find that the variables that load strongly on the same factor share something in common. For example, when the components are rotated, products C, D, J, and U all load strongly on the first factor. All four products might be, for example, antibiotics. Products H, I, and L, which load strongly on the second factor, might all be statins. At this point, if you feel comfortable with the principal components analysis so far, you can start analyzing the individual records using factor scores instead of variable scores.
That is, you can act as though the factors were observed variables and analyze different groups or clusters, from a cluster analysis, according to their factor scores. You can get those factor scores from the principal components worksheet. Just scroll down until you reach the portion of that worksheet labeled Factor Scores. You'll find each individual's factor score listed there. You can also use the coefficients for the rotated factors found on the worksheet named Rotated Factor Coefficients.
That's the reason that after some initial exploration by a principal components analysis, you might want to run the analysis again using the record IDs in the dialogue box. That way, you can tell which observation has which factor score for some analyses.
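To make the link between coefficients, factor scores, and record IDs concrete, here is a small sketch. It assumes factor scores are formed in the usual way, by multiplying each record's standardized variable values by the factor score coefficients; whether the Excel add-in standardizes with the sample or the population standard deviation is not stated, so the sample version used here is an assumption. All data, coefficients, record IDs, and the function name `factor_scores` are hypothetical.

```python
import numpy as np

def factor_scores(data, coefficients):
    """Return one row of factor scores per record.

    data:         records x variables array of raw values
    coefficients: variables x factors array of factor score coefficients
    """
    data = np.asarray(data, dtype=float)
    # Standardize each variable (sample standard deviation: an assumption).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return z @ np.asarray(coefficients, dtype=float)

# Made-up example: four records, three variables, two retained factors.
record_ids = ["R001", "R002", "R003", "R004"]
data = np.array([
    [12.0, 30.0, 5.0],
    [15.0, 28.0, 7.0],
    [ 9.0, 35.0, 4.0],
    [14.0, 31.0, 8.0],
])
coefficients = np.array([
    [ 0.50, -0.10],
    [-0.05,  0.60],
    [ 0.45,  0.05],
])

scores = factor_scores(data, coefficients)

# Keeping the record IDs alongside the scores shows which observation
# has which factor score.
scores_by_record = dict(zip(record_ids, scores))
for record_id, row in scores_by_record.items():
    print(record_id, np.round(row, 3))
```

Each row of `scores` can then be treated like a set of observed variables, for example when comparing the clusters produced by a cluster analysis.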
In this course, Conrad Carlberg explains how to carry out cluster analysis and principal components analysis using Microsoft Excel, which tends to show more clearly what's going on in the analysis. Then he explains how to carry out the same analysis using R, the open-source statistical computing software, which is faster and richer in analysis options than Excel. Plus, he walks through how to merge the results of cluster analysis and factor analysis to help you break down a few underlying factors according to individuals' membership in just a few clusters.
- Reviewing the problems created by an overabundance of data
- Understanding the rationale for clustering and principal components analysis
- Using Excel to extract principal components
- Using R to extract principal components
- Using R for cluster analysis
- Using Excel for cluster analysis
- Setting up confusion tables in Excel
- Using cluster analysis and factor analysis in concert