
In the last few movies we've discussed a few different ways to look at the association between pairs of variables. We looked at the correlation coefficient, which is an excellent general-purpose tool, and we looked at bivariate regression, which works really well when your outcome variable is a scale variable. We also looked at cross tabulations for when you have two categorical variables. But another very common situation is when you want to compare the means of two or more groups, or one group at more than one point in time. Although it's possible to do this with correlations and regression, if you code group membership as 0/1 indicator variables, it's often easier to use specialized procedures for comparing group means, for a few reasons.
First, they generally give you the group means along with the inferential tests and maybe even charts for the means. So you can get more done with a single command. Second, these procedures often provide explicit tests for the assumptions behind the tests, such as the groups having equal spread in their scores. Third, the test statistics that they give, often the t-test or an analysis of variance, depending on which procedure you use, are the most common statistics for group comparisons, and so they may be more familiar to more people. Now one of the recent additions to SPSS is the flexible Means procedure.
What's nice about this is that previously you had to choose different tests if you're comparing two groups or if you are comparing the means of more than two groups. And we will in fact cover these procedures in the next few movies. The means procedure on the other hand can handle either situation, and let's see how it works. For this example, I'm going to be using the GSS dataset, General Social Survey that I've used before. And to compare means, I need to come up to Analyze, to Compare Means, then I choose the first one, Means.
And from here I need to choose the variable that I want to look at as the dependent or outcome variable, the thing that I think group membership affects. In this particular case, I'm going to use Family Income. So I can click that and I can drag it up there. Then I need to look at the Independent list. Those are the variables that I think will produce changes in, or simply be associated with, family income. In this particular case, I'm going to choose a cultural variable. I'm going to scroll down here, and I'm going to choose whether a person attended a dance performance in the last year.
I'll click and move that into the Independent list. Then I'm going to come up to Options, and I have the possibility here of getting a huge number of statistics, including some relatively esoteric things like the harmonic mean and the geometric mean. The mean, the number of cases, and the standard deviation, on the other hand, are good defaults, though I'd like to have them in a slightly different order. So what I'm going to do is click to get these out, just double-clicking, and then I'll bring them back in with the number of cases first, then the mean, and then the standard deviation.
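To make those three default statistics concrete, here is a minimal Python sketch of what a per-group report of N, mean, and standard deviation involves. It is not SPSS output or SPSS syntax; the records, the group labels, and the function name `describe_by_group` are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical (group, family income in dollars) records; NOT the GSS data.
records = [
    ("No", 18000), ("No", 25000), ("No", 31000), ("No", 42000),
    ("Yes", 30000), ("Yes", 45000), ("Yes", 66000),
]

def describe_by_group(rows):
    """Return {group: (n, mean, sd)} using the sample standard deviation."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return {g: (len(v), mean(v), stdev(v)) for g, v in groups.items()}

report = describe_by_group(records)

# Print in the order chosen in the dialog: N first, then mean, then SD.
for label, (n, m, sd) in report.items():
    print(f"{label:>3}  N={n}  Mean={m:.1f}  SD={sd:.1f}")
```

The sample standard deviation (dividing by n - 1) is used here because that is the usual convention for descriptive reports of this kind.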
Also I'm going to come down to the bottom here where it says Statistics for First Layer and check the first box for Anova table and eta. Anova is short for Analysis of Variance and it will give me an inferential test about whether the means for the groups differ. And eta is similar to the correlation coefficient except it can be used when there are more than two groups. So I'm going to select that one and I'm going to press Continue and then I'll press OK again. And what I get is several tables that show up.
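For a sense of what the analysis of variance computes, the following sketch works out the F statistic by hand: variation between the group means is compared with variation within the groups. The incomes (in thousands of dollars) are made up and the function name `one_way_anova` is hypothetical. The sketch stops at F, because turning F into a significance level needs the F distribution, which Python's standard library does not provide.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far each score sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical family incomes in thousands of dollars; NOT the GSS data.
not_attended = [18, 25, 31, 42]
attended = [30, 45, 66]

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova([not_attended, attended])
print(f"SS between={ss_b:.2f} (df={df_b}), SS within={ss_w:.2f} (df={df_w})")
print(f"F={f_stat:.3f}")
```

The same function accepts any number of groups, which is the reason the analysis of variance works for two groups as well as for more than two.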
The first table is the Case Processing Summary and it lets me know that I had complete data for all 349 cases in the dataset, so that's good. The second table labeled Report gives me the actual statistics, the descriptive statistics for my two groups on family income. So for instance, we see that there were 273 people who had not attended a dance performance in the previous year and their average family income was about $29,000 with a standard deviation of almost $26,000.
On the other hand, there were 76 people who had attended a dance performance in the last year and their average family income was nearly $47,000, so that's much higher. And they had a standard deviation of about $36,000. So you can see there is a very substantial difference there in the means, although the standard deviations are also rather large. The next table, labeled ANOVA Table, for Analysis of Variance, is the inferential test that lets us know whether these two means differ statistically significantly from each other.
The important number here is in the very last column under Sig. That's the probability level or the significance level of this particular result and it says .000. It's not literally 0. It simply is less than .001. And this tells us that there is a statistically significant difference between these two means. On the other hand, there's also the question of how big is the effect and that's what we get from the fourth table that says Measures of Association. It looks at the association and gives us a statistic called eta.
And that is a version of the correlation, or an analogue of the correlation, that can be used even when there are more than two groups. Now our value here is .252. Eta, like the absolute value of the correlation coefficient, goes from 0 to 1. And here we see that it's not terribly high but it is above zero. The Eta Squared is an indication of how much of the variance in family income can be explained by group membership, by knowing whether a person attended a dance performance in the last year or not.
And here we see it's .064. That can be read as a proportion, or about 6%. So what we see is that there is a statistically significant difference in the means between the two groups. It's not huge because the standard deviations are large, but it does let us know that there is an association, that people who saw dance performances generally had higher family incomes than people who had not attended dance performances, for whatever reason that might be. So the Means procedure is a handy way to compare the means of any number of groups on any number of variables.
Not only does it give the descriptive statistics and an inferential test, it also gives a measure of association. This makes the means procedure a flexible and easy way to get a lot of tests done quickly. In the next two movies, we'll look at the specialized procedures for comparing the means of two groups or two or more groups, each of which may provide some information and options that aren't available in the means procedure. So they may be more useful for you as you explore your own data.
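As a closing illustration of the measure of association, here is a rough sketch of how eta squared can be computed as the between-groups sum of squares divided by the total sum of squares. It also shows that with only two groups, eta matches the absolute value of the ordinary correlation between the scores and a 0/1 indicator of group membership. The incomes are hypothetical (in thousands of dollars), and the helper names `eta_squared` and `pearson_r` are made up for this example. The last lines simply check that the reported eta of .252 squares to the reported .064.

```python
from math import sqrt
from statistics import mean

def eta_squared(groups):
    """Proportion of total variance accounted for by group membership."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical family incomes in thousands of dollars; NOT the GSS data.
not_attended = [18, 25, 31, 42]
attended = [30, 45, 66]

eta_sq = eta_squared([not_attended, attended])
eta = sqrt(eta_sq)

# Group membership coded as a 0/1 indicator variable.
indicator = [0] * len(not_attended) + [1] * len(attended)
r = pearson_r(indicator, not_attended + attended)

print(f"eta={eta:.3f}  eta squared={eta_sq:.3f}  |r|={abs(r):.3f}")

# Check on the values reported in the output: .252 squared rounds to .064.
reported_eta = 0.252
print(round(reported_eta ** 2, 3))
```

With three or more groups a single 0/1 indicator no longer captures group membership, which is where eta remains usable and the simple correlation does not.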