When you collect a series of measurements, such as the amount of time it takes to perform a task or daily sales at a store, you can analyze the data to see how much it varies from the average. Two useful measurements are variance and standard deviation.
- [Instructor] When you collect a series of measurements, such as the amount of time it takes to perform a task, or daily sales at a store, you can analyze the data to see how much it varies from the average. Two useful measurements are variance and standard deviation. I'll show you how to calculate both of those summary statistics in this movie. And, I'll note that the variance functions are VAR, V A R, and VARP. That's all well and good, but what does variance actually measure? In technical terms, it is the sum of the squared error for each value in a data set, divided by either the number of values, which is the entire population, that uses VARP, or the number of values in the measurement minus one, and that's VAR, V A R.
If you want to see what the formulas look like when they're presented formally, here they are. So the idea is that you are determining how far each value is from the average, that's the x - x bar, and then squaring that value, finding the sum of that same calculation for all values in your data set, and then dividing either by n, which is again for the population, or n-1 which is for a sample. You use population when you are absolutely certain you have every possible value, and you use sample every other time.
Of course, in practice, most data analysts use VAR all the time, because it returns a larger result, and is therefore a more conservative estimate. A related calculation is standard deviation. The standard deviation, which is calculated using STDEV and STDEVP, is the square root of the variance, and here's what those formulas look like. You can see that on the left we have the square root of the population variance, and, on the right, we have the square root of the sample variance.
And, again, in practice, data analysts always use just the regular standard deviation for a sample as opposed for the population. Standard deviation is a smaller number that's more in line with the actual values in your data set. Remember that you are squaring the error, or the distance from the average, for each value when you're calculating the variance, and when you calculate the standard deviation, you take the square root of that, so you're removing some of the distortion in the data. And, as always, with variance, you use STDEV instead of STDEVP, which assumes that you have every possible value.
The next logical question is what these good for? What do you use variance and standard deviation for? And the answer is that standard deviation gives you some very interesting information. It tells you a lot more than the variance does. When you calculate standard deviation, given normal assumptions about data that are usually true, about 68% of the values will be within one standard deviation of the mean. Also, about 95% of the values will be within two standard deviations, and about 99.7% of values, that is 997 out of 1,000, will be within three standard deviations.
So let's take an example. Suppose you have a data set that has a mean of 150, and a standard deviation of 30. If that's the case then 68% of the values will be in the range from 120 to 180, that's plus or minus one standard deviation. 95% of the values will be from 90 to 210, plus or minus two, and 99.7% of values will be from 60 to 240, and that's plus or minus three standard deviations.
And that is very useful information. If you routinely see values that are more than two or three standard deviations from the mean, then you should take a look to see why those values are occurring. It might be due to sales of very expensive items, or, on the other hand, if you see days where sales were high but profits were low, then there might be something going on that you need to investigate. Now let's switch back over to Tableau, and take a look at how you calculate variance and standard deviation in that program.
I'm back in Tableau, and I'm using the variance workbook. You can find that in the chapter two folder of the exercise files collection. My visualization is just a text table, and I have the average order for my five categories of furnishings, housekeeping, maintenance, office supplies, and public areas. Let's say that I want to calculate the variance. I can do that one of two ways. I can either change the summary function here for my average total cost.
So I'll click that, point to Measure, and then I'll click Variance, and I'm never going to use population, I'll just use regular variance. And I see that those are very large numbers indeed. And if I click again, and click, or point at, Measure and then click Standard Deviation, then I see the standard deviation is 168.8, 32.8, and so on. If I want to add the average total cost back, so I can compare those values, then I can drag that to text.
So I got my standard deviation, and my total. I'll change the total from Sum to Average. And I see that, for furnishings, my standard deviation is actually larger than my average, which is interesting. That means that I have a bunch of smaller orders, but I also have some larger orders balancing them out, and in this case, much larger. If I want to calculate the standard deviation or variance within a calculated field, I can go to Analysis, Create Calculated Field, and then I'll type in VarTotalCost then press Tab, and I'll use the VAR function, and then I'll use TotalCost, which I'll enter that way, looks good click OK.
And there's my VarTotalCost field here, and I can do the same thing for standard deviation. So Analysis, Create Calculated Field, and I'll call it StdevTotalCost, press Tab, and I'll use STDEV then TotalCost, looks good, and click OK. Now I can drag my previous calculation here to the Measures area and I can add the two that I created here.
So I have StdevTotalCost, add that to the text area, and then VarTotalCost, which I will also add the text. And I can see those values there. So, I have my average total cost, the standard deviation, and the variance, all put together in my visualization.
- Defining data types and conversions
- Defining function syntax in Tableau 10
- Creating a calculated field
- Creating an ad hoc calculation
- Summarizing data using aggregate functions
- Calculating values using number functions
- Finding the absolute value or sign of a number
- Working with times and dates
- Creating IF THEN expressions
- Managing null values using IFNULL and ISNULL
- Creating level of detail expressions
- Creating a quick table calculation
- Converting and measuring strings