Elsewhere in this course, you learned about covariance as one measure of how data sets tend to move in relation to one another. In this video, discover another statistic called correlation.
- [Instructor] When you collect business data, you will have the opportunity to examine how two sets of values relate to each other. In other words, do they tend to move in the same direction or opposite? Earlier in this course, I showed you how to calculate covariance. And the covariance calculation is multiplying the differences from the mean for each value pair and finding the sum. You would then divide that total by the number of data pairs. Correlation is a more complicated version of covariance and you can see that the top term is exactly the same but you're dividing by the term shown.
And rather than trying to give you a full explanation of what that is, just know that you are finding the difference from the mean for each value in your first data set, squaring it, and summing those squared differences. You then find the sum of the squared differences for all of the values in your second data set, multiply the two sums together, and take the square root. So it's a much different calculation and if you're thinking ahead and wondering how we're going to implement that by hand in an Excel worksheet, don't worry about it, we're going to use formula functions only.
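The slide with the formulas is not reproduced in this transcript. Assuming it shows the population covariance and the standard Pearson correlation coefficient, the two calculations described above would read roughly as follows, where \bar{x} and \bar{y} are the means of the two data sets and n is the number of data pairs (the symbol names are assumptions chosen for illustration):

```latex
\[
\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}
\]

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```

The top term is the same in both expressions; only the divisor changes, which is what keeps r between minus one and one.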
So how do you interpret correlation values? Well, data that is completely uncorrelated returns a zero. If data is positively correlated, meaning the two sets of values tend to move in the same direction at the same time, then you will have a correlation that is between zero and one. So if you have a one, you have perfect correlation, and the closer it gets to zero, the weaker the correlation is. By contrast, data that is negatively correlated tends to go in opposite directions. When one value goes up, the other goes down.
There you will find a range of values starting at minus one, which is a perfectly negative correlation, and as you move closer to zero, the tendency of the two values to move in opposite directions becomes weaker. So let's take a look at data that is not correlated. Here I have a chart that has 10 values on it and you can see that I have values for one, two, three, four, and five. There are values for one at three and minus three, two also at three and minus three, and so on. So what that means is that there's absolutely no correlation: because one can result in either a three or a negative three, there doesn't appear to be any cause or relationship at all.
Next, we have data with a perfect positive correlation. And the usual example for this type of data is data that varies exactly the same way. So in this case, every time the X value goes up by one, the Y value goes up by one as well. And you might guess that a perfect negative correlation looks like the opposite. In other words, a downward slope and you would be correct. Here we have a data set where every time the X value goes up, the Y value goes down.
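The three charts described above can be checked numerically. The following is a minimal Python sketch, not the Excel approach used in the course: `pearson_r` is a hypothetical helper written for this illustration, and the Y values for the two sloped data sets are assumed, since the transcript does not list them.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Top term: sum of the products of paired differences from the mean.
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Bottom term: square root of the product of the two sums of squared differences.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return top / sqrt(ss_x * ss_y)

# Uncorrelated: each X value from 1 to 5 is paired with both 3 and -3.
x_none = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_none = [3, -3] * 5

# Perfect positive (assumed values): Y rises by one each time X rises by one.
x_line = [1, 2, 3, 4, 5]
y_up = [2, 3, 4, 5, 6]

# Perfect negative (assumed values): Y falls by one each time X rises by one.
y_down = [5, 4, 3, 2, 1]

r_none = pearson_r(x_none, y_none)
r_up = pearson_r(x_line, y_up)
r_down = pearson_r(x_line, y_down)
print(r_none, r_up, r_down)
```

With these values the three results come out to zero, one, and minus one, matching the three cases in the charts. Real business data will almost always land somewhere in between.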
The next question is whether your correlation is significant. Remember that with covariance, it was more of an order of magnitude analysis and you were basically using your subjective judgment as to whether a covariance value was significant or not. With correlation, it depends on a couple of factors. The first is the number of measurements, the second is whether the value can be positive or negative. If a value can be positive or negative, but not both, then you have a one-tailed test and you would only look at one side of the curve we'll be talking about later.
If the value can be either positive or negative, then you're performing what's called a two-tailed test. Then you look up your correlation value in a table and again, one-tailed versus two-tailed does make a difference. Let's take a look at a correlation lookup table for a two-tailed test. In other words, one where the value can vary either high or low. The left-hand column, marked N, is the number of items in your sample and you see that it runs from five through 10 and then 15, 20, and 30.
The column labels after the N indicate the level of confidence. 0.1 refers to 90% confidence, 0.05 is 95% confidence, 0.02 is 98%, and so on. The correlation value for your number of samples, or number of data pairs, and the confidence level you want can be looked up on this table. You don't need to create these tables for yourself, you can find them online or in almost any statistics textbook.
So once you do your calculation, all you need to do is look up the value in a table to determine if your result is significant and if so, how significant it is.
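The lookup step can also be sketched in code. This is a minimal Python illustration under stated assumptions: the table from the video is not reproduced in the transcript, so the dictionary below holds only the 0.05 (95% confidence) two-tailed column, filled in with the commonly published critical values rounded to three decimals and keyed by the number of data pairs. The names `CRITICAL_R_05_TWO_TAILED` and `is_significant` are hypothetical, and the values should be checked against whichever reference table you use.

```python
# Two-tailed critical values of r at the 0.05 level (95% confidence),
# keyed by the number of data pairs N. Assumed from standard published
# tables, rounded to three decimals; verify against your own table.
CRITICAL_R_05_TWO_TAILED = {
    5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666,
    10: 0.632, 15: 0.514, 20: 0.444, 30: 0.361,
}

def is_significant(r, n, table=CRITICAL_R_05_TWO_TAILED):
    """Return True if |r| meets or exceeds the critical value for n pairs."""
    if n not in table:
        raise KeyError(f"no critical value listed for N = {n}")
    # Two-tailed test: only the size of r matters, not its sign.
    return abs(r) >= table[n]

# The same correlation can be significant with 10 pairs but not with 5.
print(is_significant(0.70, 10))
print(is_significant(-0.70, 10))
print(is_significant(0.70, 5))
```

Notice that the critical value shrinks as N grows, so a weaker correlation can still be significant when it is backed by more data pairs.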
- Distinguish between the mean, median, and mode.
- Describe the relationship between variance and standard deviation.
- Identify a nondirectional hypothesis.
- Point out the difference between COVARIANCE.P and COVARIANCE.S.
- Explain correlation.
- Analyze Bayes’ rule.