Join Conrad Carlberg for an in-depth discussion in this video Getting started, part of Business Analytics: Multiple Comparisons in R and Excel.
- [Instructor] Before we could get started with the course I need to address a few items. First, the tests that I talk about here, the Tukey and the Scheffe tests, are followup tests. That means that they're meant to be used after an analysis of variance or ANOVA tells you that there's a significant difference somewhere in your data. You follow up with another test that tells you where to find the source of that significant difference. When you're studying just two group means it's easy to say yes, there's a significant difference between group A and group B. Suppose you're testing the effect of a medication on Republicans and on Democrats. If you analyze the data within ANOVA and it tells you that a significant difference exists that difference has only one possible source. The medication has a different effect on Republicans than on Democrats. But things become ambiguous when you have three or more groups. For example, add Independents to your samples of Republicans and Democrats. If you run an ANOVA to test for group mean differences, you may get a significant result. But all it tells you is hey, there's a significant difference here somewhere. As yet, you have no way of knowing whether the treatment has a different effect on Republicans than on Democrats or on Republicans than on Independents or on Democrats than on Independents. All that ANOVA tells you is that there's at least one significant difference somewhere in the data. That's where multiple comparison tests come in. By using tests like Scheffe and Tukey you can distinguish which groups are different from one another. You might think, and everyone does, that you can nail down the source of the significant difference by running a series of t-tests after looking at the ANOVA, one t-test for each pair of group means in your design. But that's not what the t-test was designed to be used for. Those t-tests are not independent of one another and the result is that the t-tests could easily be making errors at a 40% error rate rather than the 5% error rate that an ANOVA might indicate. Statisticians have developed various tests, generally termed multiple comparison procedures, to more accurately identify the sources of a significant result in an omnibus ANOVA. The tests that I describe here aren't subject to that sort of problem. And although they are widely recognized and credible, they have other problems that you should be aware of. But again, you should usually run the ANOVA first. Let me show you how to do that using Microsoft Excel on a sample data set. With Excel open, click the ribbon's data tab. Look for a group on that tab that's labeled analyze or analysis. Excel versions differ here. If you find that group, usually at the far right end of the data tab, look for a menu item named data analysis. If you find that, you're good to go. If you don't find that group, or that item in it, you need to do some installation. Choose options from the ribbon's file group. And then click add-ins on the Excel options nav bar. At the bottom of the add-ins window, choose Excel add-ins from the manage dropdown box and click go. Fill the analysis tool pack checkbox and click okay. You should now see a new group on the ribbon containing the data analysis item. Have your data laid out in this fashion with different groups in different columns and different records in different rows. The worksheet with the data should be active. Click data analysis and scroll to the top of the analysis tools list box. Click ANOVA colon single factor and then click okay. Set the controls on the resulting dialogue box as follows. Click in the input range edit box and drag through your data range. In this case that's the range B one to H 21. Be sure that the columns option is selected. Fill the labels in first row checkbox. Enter a decimal fraction such as 0.1 in the alpha edit box or simply accept the default value of 0.5. Choose to have the output written to a range on the active worksheet and click an empty cell for the address or on the worksheet in a new workbook, then click okay. Very quickly, you'll see these results. You can use several of them, such as the mean square within, the count per group, and the critical F value, when you're setting up a multiple comparison test such as the two I describe here, the Tukey and the Scheffe.