In this video, learn how to calculate covariance, a measure of how two sets of data vary in relation to one another.
- [Instructor] When you analyze data, it's often important to see how two sets of data vary in relation to one another. For example, if you have a store and you know that individuals drive to get there, you might be interested to see if the distance they've driven is in any way related to the amount of money they spend. In this movie, I'll give you an overview of how to calculate one such measure, called covariance. The covariance formula looks a little complicated, but it consists of a number of fairly straightforward steps.
For each data point in the two data sets, you find its deviation from the mean of its own data set. For example, if the average distance driven is 25 miles for your set of customers, and a customer drove 20 miles, then they would have driven five miles below the mean, a deviation of negative five. You then multiply the deviations for each pair of data points together. So if the customer who drove 20 miles spent $100, and the average amount spent is $150, their spending deviation is negative 50, and you would multiply negative five by negative 50 to get 250.
You then find the sum of all those products and divide by the number of data pairs. This formula does assume that you have all possible values, that is, the whole population. If you have only a sample, a more conservative calculation is to divide by the number of data pairs minus one. You've seen this same technique used in calculating standard deviation, either based on a population or a sample. The result is expressed in the product of the original units, which in this case would be dollars times miles driven, not dollars per mile.
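The steps just described can be sketched in a few lines of Python. This is a minimal illustration, not the course's own worksheet: the function name `covariance`, the `sample` flag, and the five customer records are all made up for the example, chosen so that the means come out to 25 miles and $150 as in the narration.

```python
from statistics import mean

def covariance(xs, ys, sample=False):
    """Covariance of paired values; divides by n, or by n - 1 when sample=True."""
    if len(xs) != len(ys):
        raise ValueError("both data sets need the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data pairs")
    mean_x, mean_y = mean(xs), mean(ys)
    # Multiply each pair of deviations from the mean, then add up the products.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical customers: miles driven and dollars spent (means are 25 and 150).
miles = [20, 30, 25, 15, 35]
dollars = [100, 200, 150, 120, 180]

population_cov = covariance(miles, dollars)           # divide by 5
sample_cov = covariance(miles, dollars, sample=True)  # divide by 4

print(population_cov)  # 220.0
print(sample_cov)      # 275.0
```

The first customer contributes (20 - 25) times (100 - 150), or 250, to the sum of 1,100. Dividing by the number of pairs corresponds to the population calculation, and dividing by one less corresponds to the sample calculation.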
How do you interpret covariance values? Well, if you have a zero as a result of your calculation, then the data sets don't vary together in any linear way. They are apparently unrelated. A positive value means that the data sets tend to move in the same direction. A larger positive value can mean a stronger relation, but the size of the value also depends on the units and the spread of the data. Negative indicates that the data sets tend to move in opposite directions. It might be the case, for example, that individuals who drive farther to get to your store spend less.
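To make the three cases concrete, here is a small sketch with invented numbers. The helper `population_covariance` and all three spending lists are hypothetical. The third list is deliberately U-shaped, so the result is zero even though spending clearly depends on distance, which is why a zero is best read as "no linear relation" and not as proof that the data sets are unrelated.

```python
from statistics import mean

def population_covariance(xs, ys):
    """Average product of paired deviations from the mean (divides by n)."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

miles = [10, 20, 30, 40, 50]

spend_rising = [60, 80, 100, 120, 140]   # spending goes up with distance
spend_falling = [140, 120, 100, 80, 60]  # spending goes down with distance
spend_u_shaped = [140, 80, 60, 80, 140]  # high at both ends, low in the middle

cov_rising = population_covariance(miles, spend_rising)
cov_falling = population_covariance(miles, spend_falling)
cov_u_shaped = population_covariance(miles, spend_u_shaped)

print(cov_rising)    # 400.0  (positive: same direction)
print(cov_falling)   # -400.0 (negative: opposite directions)
print(cov_u_shaped)  # 0.0    (no linear trend)
```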
The next question then, once you've calculated your covariance, is whether it is significant. That is a difficult question to answer. Covariance values close to zero generally indicate little linear relationship between two sets of values. And of course, large positive or negative values can be significant, but how large is large depends on the units you measured in. So that means you need to look at covariance in relation to the scale of each data set, that is, how spread out its values are. Now, covariance is certainly useful, but many analysts choose to also calculate correlations.
And I'll cover that elsewhere in the course.
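One way to see why the size of a covariance is hard to judge on its own is to change the units and recompute. The sketch below uses made-up customer data and a hypothetical helper, `population_covariance`. It converts miles to kilometers and dollars to cents without changing the underlying relationship at all.

```python
from statistics import mean

def population_covariance(xs, ys):
    """Average product of paired deviations from the mean (divides by n)."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

KM_PER_MILE = 1.609344

# Hypothetical customers: miles driven and dollars spent.
miles = [20, 30, 25, 15, 35]
dollars = [100, 200, 150, 120, 180]

# The same customers, expressed in different units.
kilometers = [m * KM_PER_MILE for m in miles]
cents = [d * 100 for d in dollars]

cov_miles_dollars = population_covariance(miles, dollars)
cov_km_dollars = population_covariance(kilometers, dollars)
cov_miles_cents = population_covariance(miles, cents)

print(cov_miles_dollars)  # 220.0
print(cov_km_dollars)     # about 354.06, larger by the miles-to-km factor
print(cov_miles_cents)    # 22000.0, a hundred times larger
```

The customers and their behavior are identical in all three calculations. Only the units differ, so the raw covariance value cannot be compared across data sets measured on different scales.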
- Distinguish between the mean, median, and mode.
- Describe the relationship between variance and standard deviation.
- Identify a nondirectional hypothesis.
- Point out the difference between COVARIANCE.P and COVARIANCE.S.
- Explain correlation.
- Analyze Bayes’ rule.