The data distribution that you are most likely to use during your analysis is the normal distribution to describe the different ways that you can work with a normal curve in Excel.
- [Instructor] The data distribution that you are most likely to use during your analysis is the normal distribution. The normal curve, or bell curve, has a distinctive shape you have probably seen before. In this movie, I will show you how to graph a normal curve and how to start interpreting some of its values. My sample file is the Normal workbook, and you can find it in the Chapter Four folder of your exercise files collection. The normal distribution is defined by two values.
The first is the mean, or the average, and the second is the standard deviation. The standard deviation is the amount of spread in your data or it's one measure of the amount of spread in your data. It's also very useful, statistically, as we will see elsewhere in this course. To display the normal curve inside of Excel, I want to create calculations for a variety of values. And those will appear in the chart that you see on the worksheet. I'll start in cell B4, where I will calculate the probability of the value 36 occurring.
So, in cell B4, I will type in equal sign, and then the function I will use is NORM.DIST. This takes a look at the probability of a specific value occurring within the normal distribution. This function requires four arguments. The first is the value, and that's in cell A4, which is, in this case, 36, but I'll use a cell reference, and then a comma. The next is the mean, and that's in cell B1, so I'll type that. I want to make the cell reference absolute, so I will press F4.
The dollar signs indicate it's an absolute reference and won't change, then a comma. The standard deviation is just below it, in B2. And again, I want to make it an absolute reference, and then a comma. And finally, I need to indicate whether I want the cumulative distribution function, that is, any value up to, and including a specific value, my X. Or, whether I want to look at the point probability using the probability mass function. That is, the probability that a specific value will occur.
In this case, I want to use the probability mass function, so I'll highlight false, press tab. Everything looks good, right parenthesis and enter, and I get my first value. And it doesn't even show on the chart, it is so low. However now, I can copy this formula down for all of the cells in Column A that contain values. So I'll click cell B4, double click the fill handle. I know that my mouse pointer's in the right place when it changes to a black crosshair. Double click, and there you see the familiar shape of the normal curve.
The average, or the peak occurs right at 100, and you can see that the values spread out and that is a function of the standard deviation of 15. In most statistical texts, you will see the mean indicated using a lowercase Greek letter Mu. And the standard deviation indicated using a lowercase Greek Sigma. If you want to calculate the specific value, of say, 84 occurring, then you can click in cell E1.
Type in equal sign, and again, we use NORM.DIST. The X is 84, I need to type it in, 'cause I don't have a cell reference. The mean is in B1, then a comma. Standard deviation is in B2, then a comma. And again, I'm going for the probability of 84 occurring, exactly. So I'll use FALSE, right parenthesis, and enter. And I get a value of about 1.5%. To go for the probably of 84 or less, I need to use the cumulative probability.
So in cell E2, I'll type in equal sign and I will have NORM.DIST, and the X again, is 84, comma. Mean's in B1, comma, standard deviation is in B2, comma. And this time, I will have TRUE for the cumulative probability. Type right parenthesis and press Tab so I don't highlight a cell below the chart and I see that getting a value of 84 or less will happen about 14.3% of the time. Now let's say that I want to find the cutoff, where 71% of values will occur below a specific point.
In this case, let's do 71% and 85%. So in cell H1, I'll type in equal sign. And here, instead of using NORM.DIST, I need to use the inverse, which is NORM.INV, followed by a left parenthesis. Now, I need to indicate the probability. We'll make that .71, then a comma. The mean is in B1, comma, the standard deviation is in B2. Right parenthesis, enter, and we see that 71% of values would be below 108.3.
And if we take a look at where the mean is at 100, and the standard deviation of 15, then having 71% of values occur below 108.3 makes sense. We can do the same thing for 85%. So, equal, NORM.INV, left parenthesis. Probability will .85, comma, the mean is in B1, comma. Standard deviation's in B2, right parenthesis and Tab. And we see that 85% of the values will occur below 115.5.
Note that 14% of values will occur between 115.5 and 108.3. And if you look at the chart again, you will see that the normal curve is higher, has higher probability toward the middle, around 100, and that even at 115, you're still seeing a lot of values. So if we wanted to see the value, where we had 95 or even 98% of values, we'd be much further over to the right.
- Distinguish between the mean, median, and mode.
- Describe the relationship between variance and standard deviation.
- Identify a nondirectional hypothesis.
- Point out the difference between COVARIANCE.P and COVARIANCE.S.
- Explain correlation.
- Analyze Bayes’ rule.