Review how variances within and between groups enable inferences about populations in ANOVA and MANOVA.
- [Instructor] Let's take a look at the Analysis of Variance, or ANOVA. You use the Analysis of Variance to decide whether the means of two or more different groups are the same. For example, you might measure the weight of 20 men and 20 women, and then calculate the average weight of each of those two groups. The average weights of the two groups are likely to be different. But what about the average weights of the populations of men and women that you took your two samples from? That's the difference you're probably interested in, not just the difference between the two samples.
Probability can help you decide whether the mean weights differ in the populations of men and women, and one way of going about that is to use the Analysis of Variance. We can do that in Excel by way of the Data Analysis tool, which the Analysis ToolPak add-in places in the Analysis group on the Data tab of the Ribbon. So just click it and choose Anova: Single Factor from the list box. And here's our input range. Notice that I'm including the names of the two groups in the input data.
They're grouped by columns rather than rows. In other words, all the men are in column A and all the women are in column B. We do have labels in the first row. We'll have the output start in cell D1, and then we just click OK. That's all there is to it. We get this Analysis of Variance table. It looks at two measures of the variability in weight. In this example, one measure is the amount of variability among individual people: how much the men vary from the mean of their group, and how much the women vary from the mean of their group.
This kind of variability is called Variability Within Groups. It's quantified by the mean square within, shown in cell G12. By the way, a mean square is a sum of squared deviations divided by its degrees of freedom, which makes it just another name for a variance estimate. The second sort of variability in weight is the variability of the two group means: how much the mean weight of men and the mean weight of women vary from the overall mean of both men and women. This kind of variability is called Variability Between Groups. It's quantified in cell G11 as the mean square between.
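To make the two mean squares concrete, here is a minimal Python sketch of a single-factor ANOVA, assuming each group is a plain list of numbers, like one column of the worksheet. The function name `one_way_anova` and the weights in the usage lines are hypothetical, made up for illustration rather than taken from the worksheet in the video.

```python
from statistics import mean


def one_way_anova(groups):
    """Single-factor ANOVA on a list of groups (each group is a list of numbers).

    Returns (ms_between, ms_within, f_ratio, df_between, df_within).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between groups: squared distance of each group mean from the grand mean,
    # weighted by the number of observations in that group.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within groups: squared distance of each observation from its own group mean.
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # A mean square is a sum of squares divided by its degrees of freedom.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within, df_between, df_within


# Hypothetical weights in pounds, three per group, for illustration only.
men = [180, 170, 190]
women = [140, 150, 130]
ms_b, ms_w, f, df_b, df_w = one_way_anova([men, women])
print(ms_b, ms_w, f, df_b, df_w)  # 2400.0 100.0 24.0 1 4
```

With equal group sizes, the mean square within equals the average of the two sample variances (each is 100 in this made-up data), which is why a mean square can be read as a variance.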
If the variability between the group means is large relative to the variability within the groups, we might conclude that the mean weights for men and women differ in the populations, not just in the 20-person samples we took. We can make probability statements about that difference. The mean square between divided by the mean square within is the F ratio, and the P-value reported alongside it is less than one percent here. That means that if the populations of males and females had, on average, the same weight, we would get an F ratio this large less than one percent of the time. That's the Analysis of Variance from 30,000 feet.
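The P-value is the right-tail area of the F distribution, with the between-groups and within-groups degrees of freedom, beyond the observed F ratio; in a worksheet, Excel's `F.DIST.RT` function returns that same quantity. The sketch below computes it in plain Python through the identity P(F > f) = I_x(df_within/2, df_between/2) with x = df_within / (df_within + df_between * f), evaluating the regularized incomplete beta function I_x with a standard continued fraction (modified Lentz method). The function names are hypothetical, and a statistics library would normally do this for you.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The fraction converges fast below this point; above it, use I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_right_tail(f_ratio, df_between, df_within):
    """P(F > f_ratio) for an F distribution with (df_between, df_within) degrees of freedom."""
    if f_ratio <= 0.0:
        return 1.0
    x = df_within / (df_within + df_between * f_ratio)
    return regularized_incomplete_beta(df_within / 2.0, df_between / 2.0, x)


# Hypothetical values: an F ratio of 24 with 1 and 4 degrees of freedom.
p_value = f_right_tail(24.0, 1, 4)
print(f"{p_value:.4f}")  # 0.0080
```

Here the P-value comes out at about 0.008, under one percent; on Excel versions that include it, `=F.DIST.RT(24, 1, 4)` should agree.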
It's more complicated than that. We have to consider things such as the number of people in each group, and whether the variability in one group is equivalent to the variability in the other group. But, at root, we're analyzing variance to make an inference about group means. It's also possible to contrast groups using more than a single variable such as weight. We might want to compare, say, product lines on the basis of unit cost, unit price, units sold, months since product introduction, and so on. This sort of analysis is called Multivariate Analysis of Variance, or MANOVA.
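In a one-way MANOVA, the within-groups and between-groups sums of squares become matrices of sums of squares and cross-products (SSCP) across all the variables. One common test statistic, though not the only one (Pillai's trace is another), is Wilks' lambda: the determinant of the within-groups SSCP matrix divided by the determinant of the total SSCP matrix. Values near 1 mean the groups' mean vectors barely differ; values near 0 mean they differ a lot. The minimal Python sketch below assumes each observation is a list of numbers; the function names and the two-variable product-line data (think unit cost and unit price) are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(rows, center):
    """Sums of squares and cross-products of deviations from `center`."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += dev[i] * dev[j]
    return m


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(T) for a list of groups of observation vectors."""
    all_rows = [r for g in groups for r in g]
    p = len(all_rows[0])
    # Within-groups SSCP: each group centred on its own mean vector, then summed.
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = sscp(g, mean_vector(g))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    # Total SSCP: every observation centred on the grand mean vector.
    total = sscp(all_rows, mean_vector(all_rows))
    return determinant(within) / determinant(total)


# Hypothetical product lines, each observation = [unit cost, unit price].
line_a = [[10, 20], [12, 22], [11, 24]]
line_b = [[11, 20], [11, 24], [10, 22], [12, 22]]  # same mean vector as line_a
line_c = [[21, 20], [21, 24], [20, 22], [22, 22]]  # unit cost shifted up by 10
print(wilks_lambda([line_a, line_b]), wilks_lambda([line_a, line_c]))
```

Line A and line B share a mean vector, so lambda is 1; shifting line C's unit cost pushes lambda down to roughly 0.02. With a single variable, the same ratio reduces to the within-groups sum of squares over the total sum of squares from an ordinary ANOVA.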
In the next lesson, we'll see how the concepts behind ANOVA and MANOVA apply to cluster analysis, which helps us create groups from scratch.
In this course, Conrad Carlberg explains how to carry out cluster analysis and principal components analysis using Microsoft Excel, which tends to show more clearly what's going on in the analysis. Then he explains how to carry out the same analysis using R, the open-source statistical computing software, which is faster and richer in analysis options than Excel. Plus, he walks through how to merge the results of cluster analysis and factor analysis to help you break down a few underlying factors according to individuals' membership in just a few clusters.
- Reviewing the problems created by an overabundance of data
- Understanding the rationale for clustering and principal components analysis
- Using Excel to extract principal components
- Using R to extract principal components
- Using R for cluster analysis
- Using Excel for cluster analysis
- Setting up confusion tables in Excel
- Using cluster analysis and factor analysis in concert