Joe Schmuller applies analysis of variance to test hypotheses about regression. Joe helps you determine whether the regression line is a significant upgrade over the mean as a prediction tool. You will examine three kinds of variance that reside in a scatterplot: residual variance, regression variance, and total variance.
- [Narrator] Let's apply analysis of variance to test hypotheses about regression. We'll test whether or not a regression line is a significant upgrade over the mean as a prediction tool. Let's begin by examining the three kinds of variance in a scatterplot. We have residual variance, regression variance, and total variance. Residual variance, as I mentioned earlier, is based on differences between scatterplot points and predicted points on the regression line. This variance represents what the regression line cannot predict.
It's equal to the sum of squared deviations of data points around predicted points, divided by N minus two. N is the number of data points in the scatterplot. Regression variance is based on differences between predicted data points and the mean of Y. The mean square for regression is equal to the sum of squared deviations of predicted points around the mean, divided by one. This variance reflects the gain in predictability by using the regression line rather than the mean to make predictions about Y.
Total variance is based on deviations of data points from the mean of Y. It's equal to the sum of squared deviations of data points from the mean of Y, divided by N minus one. This is just the variance of the Y-scores. So if the regression line adds no predictability, then the mean square for regression is equal to the mean square residual. So we can test whether or not using the regression line to make predictions is a significant upgrade over using the mean. The null hypothesis is that sigma-square regression divided by sigma-square residual is less than or equal to one.
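Written in symbols, the three mean squares described so far and the null hypothesis can be sketched as follows. The notation is an assumption for illustration and does not appear in the narration: $y_i$ is an observed Y-score, $\hat{y}_i$ is the value the regression line predicts for it, $\bar{y}$ is the mean of Y, and $N$ is the number of data points.

```latex
\begin{align*}
MS_{\text{residual}}   &= \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - 2} \\
MS_{\text{regression}} &= \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{1} \\
MS_{\text{total}}      &= \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N - 1} \\
H_0 &: \frac{\sigma^2_{\text{regression}}}{\sigma^2_{\text{residual}}} \le 1
\end{align*}
```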
And this is the hypothesis that the regression line adds no gain in ability to predict Y. The alternative hypothesis is that this ratio is greater than one and alpha is 0.05. So the process is to gather data and calculate mean square regression and mean square residual, use an F-ratio to compare mean square regression to mean square residual, and if F is greater than a critical value, reject the null hypothesis. In this data set that I used earlier, the mean square for regression is 702, the mean square for residual is 17.21, and F with one and 18 degrees of freedom is 40.79.
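As a rough check on these numbers, the F-ratio and the decision rule can be sketched in Python. The mean squares, degrees of freedom, and alpha are the values quoted above. The helper functions `f_upper_tail` and `f_critical_value` are hypothetical names introduced here. They approximate the F distribution by numerically integrating the incomplete beta function with Simpson's rule, which is a stand-in for a statistics library and not the method used in the course.

```python
import math

def f_upper_tail(x, df1, df2, steps=2000):
    """Approximate P(F > x) for an F distribution with (df1, df2) degrees of freedom.

    Uses the identity P(F > x) = I_z(df2/2, df1/2) with z = df2 / (df2 + df1 * x),
    and evaluates the incomplete beta integral with Simpson's rule.
    Assumes x > 0 and df2 >= 2; steps must be even.
    """
    a, b = df2 / 2.0, df1 / 2.0
    z = df2 / (df2 + df1 * x)
    h = z / steps

    def g(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = g(0.0) + g(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

def f_critical_value(alpha, df1, df2, hi=1000.0):
    """Find the x with P(F > x) = alpha by bisection on the interval (0, hi)."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid   # tail still too heavy, so the critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

# Values quoted in the narration
ms_regression = 702.0
ms_residual = 17.21
df_regression, df_residual = 1, 18
alpha = 0.05

f_ratio = ms_regression / ms_residual
f_critical = f_critical_value(alpha, df_regression, df_residual)
p_value = f_upper_tail(f_ratio, df_regression, df_residual)
reject_null = f_ratio > f_critical

print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")
print(f"critical value at alpha = {alpha}: {f_critical:.2f}")
print(f"reject the null hypothesis: {reject_null}")
```

The integration is written out only to keep the sketch free of third-party dependencies. In practice a statistics package would normally supply the critical value and the probability.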
With alpha equal to 0.05, the critical value is 4.41. So we'd reject the null hypothesis and conclude that the regression line adds to the ability to predict over and above using the mean. Excel has a data analysis tool that does the ANOVA. This spreadsheet holds the data in the example you just saw. The employee names are in column D, aptitude scores in column E, and performance scores in column F. On the Data tab, in the Analysis area, click the Data Analysis button.
In the dialog box, select Regression and click OK to open this tool's dialog box. With the Input Y Range box active, select the data in column F, starting with the Performance label in cell F1. And with the Input X Range box active, select the aptitude score data in column E, starting with the Aptitude Score header in cell E1. Check the box next to Labels and click OK.
On the new page, let's widen the columns. On the Home tab, in the Cells area, click Format and select AutoFit Column Width. Excel provides a lot of information, but we'll just focus on the ANOVA table. Column A presents the sources of variation: regression, residual, and total. The degrees of freedom, sums of squares, mean squares, and F are in columns B through E. As you can see, the F-ratio is very high, and the Significance F in cell F12 shows a very small probability of an F this extreme or more extreme if the null hypothesis is true.
So we can reject the null hypothesis and conclude that the regression line significantly adds to our ability to predict over and above the mean.
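For readers who want to see how the rows of an ANOVA table like this one are put together, here is a minimal sketch in plain Python. The function name `regression_anova` and the ten aptitude and performance scores are hypothetical and chosen only for illustration. They are not the course's data set, so the numbers will not match the Excel output described above.

```python
def regression_anova(x, y):
    """Build the Regression, Residual, and Total rows of a simple-regression ANOVA table.

    Each row is (degrees of freedom, sum of squares, mean square, F); entries
    that do not apply to a row are None.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept of the regression line
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    # Sums of squared deviations for the three sources of variation
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    df_regression, df_residual, df_total = 1, n - 2, n - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_ratio = ms_regression / ms_residual

    return {
        "Regression": (df_regression, ss_regression, ms_regression, f_ratio),
        "Residual": (df_residual, ss_residual, ms_residual, None),
        "Total": (df_total, ss_total, None, None),
    }

# Hypothetical scores, for illustration only
aptitude = [45, 81, 65, 87, 68, 91, 77, 61, 55, 66]
performance = [56, 74, 56, 81, 75, 84, 68, 52, 57, 82]

table = regression_anova(aptitude, performance)
print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
for source, (df, ss, ms, f_ratio) in table.items():
    ms_text = "" if ms is None else f"{ms:.2f}"
    f_text = "" if f_ratio is None else f"{f_ratio:.2f}"
    print(f"{source:<12}{df:>4}{ss:>12.2f}{ms_text:>12}{f_text:>10}")
```

The regression and residual sums of squares add up to the total sum of squares, and the degrees of freedom add up the same way, which is a quick check on any table of this kind.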
Joseph Schmuller explains how to organize and present data and how to draw conclusions from data using Excel's functions, calculations, and charts, as well as the free and powerful Excel Analysis ToolPak. The objective is for the learner to fully understand and apply statistical concepts, not to just blindly use a specific statistical test for a particular type of data set. Joseph uses Excel as a teaching tool to illustrate the concepts and increase understanding, but all you need is a basic understanding of algebra to follow along.
- Understanding data types and variables
- Calculating probability
- Understanding mean, median, and mode
- Calculating variability
- Organizing and graphing distributions
- Sampling distributions
- Making estimations
- Testing hypotheses: mean testing, z- and t-testing, and more
- Analyzing variance
- Performing repeated measure testing
- Understanding correlation and regression