In this video, learn how to analyze your data to discover two important measures: the variance and the standard deviation.
- [Instructor] In this movie, I will show you how to analyze your data to discover two important measures. The variance and the standard deviation. Variance is one measure of error that is the distance of individual values from the mean or average. Standard deviation is the square root of the variance and, as we'll see elsewhere in the course, is extremely useful. My sample file is the variance workbook, and you can find it in the chapter one folder of the exercise files collection. I have a dataset consisting of 10 order values.
Those run from cells A4 through A13. I'll show you how to calculate variance and standard deviation for both the entire population of data or sample, and then show you how to do it quickly with formulas. I'll show you the long way so that you understand what's happening when Excel spits out a value, so you know what's happening behind the scenes. We start by calculating the error for each of the values in the dataset, and the error is the distance from the average or mean.
You can see that the formula in cell C1 calculates the mean for the dataset, A4 through A13. So, I'll click in cell B4, and then type in equal sign, and I want to subtract the value in A4 from the value in C1. So, I'll type A4, and I'll leave this cell reference relative, so it'll change when I copy the formula later. And then I'll type minus, and that would be minus C1.
I want the reference to cell C1 to stay in place. I don't want it to change, so I need to make it an absolutely reference. To do that I will press F4. And you can see that the dollar signs mean that the column and row will not change. And I'll press tab to move to cell C4. And I have my value of minus 3,089.6. Next I want to calculate the squared error. And you use the squared error so that you have a positive value for your variance.
In other words, if you had a value that was 1,000 below the average and another that was 1,000 above the average, then those two differences would cancel out, because you have plus 1,000 and minus 1,000. So, instead, you want to square the error to capture the fact that those are, in fact, distant from the mean. So, in cell C4, I'll type in equal sign, and the formula is squaring the value in B4. So, I'll type B4 and then a carrot followed by the number two, and enter, and I get the square of the error.
Now I want to copy these formulas down to the rest of the values. So, select cells B4 and C4, and then I'll click the fill handle at the bottom right corner of the selected range. And I will double click the fill handle which looks like a small green square at the bottom right corner of the selected cell range. And my mouse pointer's in place 'cause it turned into a black cross, double click, and I get my values. So, I have the error, the distance of the mean for each of the values, and then squared, so I always have a positive value.
The variance for a population is the average of all the squared errors. So, I'll click in cell F3, type equal, and then just type average, and then the values are C4 through C13, right parenthesis and enter, and I get the value you see there, about 6.4 million. The standard deviation for the population is simply the square root of the variance. So, in cell F5, type in equal sign and SQRT for square root, and the cell that we're taking the root of is F3, right parenthesis, enter, and we get 2,529.02.
So, that's what you do for a population. You should only use population for standard deviation when you are certain that you have captured every value from a dataset. In other words, every possible person in an office, or you have surveyed ever possible person in the United States, the state of Oregon, or whatever. Because you can never really be certain you have interviewed everyone or captured every possible value, most analysts use a slightly more conservative method where you subtract one from the number of observations instead of just taking the average.
So, the variance for a sample in cell F7 would be equal, and then you find the sum of the squared errors, and those are C4 through C13, so that would be sum, left parenthesis, C4:C13, right parenthesis, and that is divided by the count minus one. So, left parenthesis and then count, C4:C13, close that parenthesis then subtract one, right parenthesis and enter, and you get a slightly larger variance.
And again, that difference is because you subtracted one. For the standard deviation of the sample, you take the square root of the variance, so equal, square root, and the value is in F7, right parenthesis and enter, and you get a slightly higher standard deviation. Now, after all that work, let's see how to do it quickly using formulas. I3, and type the following formula. Equal, VAR.P which is variance for population, and then A4:A13, right parenthesis enter, and we get the same value as before.
Standard deviation.p is equal, STDEV.P, A4:A13, right parenthesis, we have the same value. And then for the sample, it's equal, VR.S, left parenthesis, A4:A13, right parenthesis, same value, and I'm sure you can see where this is going. In cell I9 I'll type equal, STDEV.S, left parenthesis A4:A13, right parenthesis enter, and we get the same values.
I know this was a lot of work when I could've just showed you how to type in a couple of functions and a cell range. But the more you know about how variance and standard deviation are calculated, the better you can appreciate the results of your analysis.
- Calculating mean and median values
- Analyzing data using variance and standard deviation
- Working with sample data
- Minimizing errors
- Visualizing data with histograms, charts, and more
- Testing hypotheses
- Measuring covariance and correlation
- Performing Bayesian analysis