To evaluate how well a model fits data, compare the observed data with the model's predictions. Joe shows you how to test the divergence of the observed from the predicted, and how to use Excel's CHISQ.TEST function to assess the fit. You'll be able to apply this technique to test a wide variety of models.
- [Voiceover] I'll show you now how to test hypotheses about frequencies when you think that the frequencies are distributed throughout categories in a particular way. This is called goodness of fit. Here's an important real world example. The data are part of a data set from a Federal Aviation Agency study. It's a study of operational errors made by air traffic controllers between October 1996 and September 2003. The categories of experience level in years are in A2 through A7, the number of errors are in B2 through B7, the number of air traffic controllers in each experience level who were at risk of making an error during the observation period appears in C2 through C7, and the question is this.
Is the distribution of operational errors the same as the distribution of air traffic controllers? For example, if controllers with less than three years of experience are 7% of the population of controllers, do they make 7% of the errors? So the hypotheses we test are. Well, the null hypothesis is that the error distribution matches the air traffic controller distribution. The alternative hypothesis is not H0. The way we test is to compare the cell frequencies we observe with the cell frequencies we expect if the distribution of errors is the same as the distribution of controllers.
And then, we calculate this statistic. We take each observed frequency, subtract the corresponding expected frequency, and square the difference, and divide the expected frequency, and sum all of them together. This is distributed approximately as chi-square with degrees of freedom equal to the number of categories we have minus one. How do we calculate the expected cell frequencies? We start by calculating the proportion of controllers in each category. I'll total the controllers in C8, so to calculate the proportion of controllers in each category, in D2, type equals C2 divided by C8 F4 for the dollar signs because we're going to auto-fill.
Now we can calculate the expected errors in each category. I've totaled the errors in B8. In E2, we'll type equals D2 times B8 F4 for the dollar signs, and auto-fill. We can use Excel to calculate chi-square and compare with a chi-square distribution to get a p-value, or we can cut to the chase and compute the p-value directly by using Excel's chi-square test function. So, in E10, type equals chi-square test and then the pop-up menu tells us to supply the actual range, which is B2 through B7, and the expected range, which is in E2 through E7.
And that really, really small probability tells you to reject the null hypothesis. Note that chi-square test doesn't return a value for chi-square. So, you've learned the distribution of controllers across experience level does not fit the distribution of operational errors. And, more generally, now you've learned how to test goodness of fit, a very important skill when you deal with frequency data.
He explains how to organize and present data and how to draw conclusions using Excel's functions, charts, and 3D maps and the Solver and Analysis ToolPak add-ons. Learn to calculate mean, variance, standard deviation, and correlation; visualize sampling distributions; and test differences with analysis of variance (ANOVA). Then find out how to use linear, multiple, and nonlinear regression testing to analyze relationships between variables and make predictions. Joseph also shows how to perform advanced correlations, variable frequency testing, and simulations.
By the end of this course, you should have the foundational knowledge you need to take other statistics-related courses and perform basic analysis in the workplace.
- Identify the type of function that returns more than one value.
- Identify the type of function that returns values for the standard normal distribution.
- Name the different amounts of a factor in the analysis of varianc e.
- Explain how to use the Tool Pak Regression tool for exponential regression.
- Define negative correlation.
- Recall the term used to describe a simulation that includes randomness.