Join Curt Frye for an in-depth discussion in this video Calculating dependent trials using the hypergeometric distribution, part of Excel 2007: Business Statistics.

If you've ever read anything about statistics, or math in general, you might have encountered problems that tell you you have a vase with 15 marbles, 10 of which are red and 5 of which are blue, and asks you to calculate the odds of selecting 4 marbles, and picking exactly 2 red and 2 blue. Although these situations are somewhat rare, business analysts do occasionally need to perform this type of calculation, which is described by the hypergeometric distribution. Now, let's suppose your company wants to give away cash prizes in a drawing, which is structured as follows.

You have 100 envelopes in a barrel. 20 of the envelopes contain $500, and the other 80 contain $20. A lucky customer can draw 10 envelopes, and keep whatever money they find. What you want to know are the odds of a customer drawing no $500 envelopes, one $500 envelope, and so on. To calculate the probability of drawing a specific number of target items--in this case $500 envelopes--from a collection, you need to know four things: The specific number of items drawn-- that means the items drawn of 10.

As I stated earlier, you're allowing the customer to draw 10 envelopes from the barrel. Second, you need to know the number of target items in the collection, and that is the number of $500 envelopes, and that is 20. Then you need to know the total number of envelopes that are available--that is 100. Finally, you need to know the number of $500 draws. In other words, you want to calculate the probability that the customer will draw zero $500 envelopes, one $500 envelope, two, and so on.

I'll show you some more things that you can do with those numbers toward the end of this movie. But for now, let's just find the probability of your lucky customer drawing zero $500 envelopes. I'll click in cell E2, and type "=hyp". The bottom function after I type "hyp" is hypergeometric distribution. I won't try to pronounce the abbreviated form. But when that is highlighted, and I press the Down Arrow key to highlight it, I'll press Tab, and we have the function.

Now, I can start adding in the arguments. So we have the number of successes. That's the first argument, and that value is in cell A2, in other words, the number of $500 envelops they draw. Then a comma. Next is the number of items that are drawn. That's in cell B2. The number of successes that are available in the population. In this case, that means the number of $500 envelopes. So, we can put that in C2. Then the number of items in the entire population, including both successes and what we're determining failures or $20 envelopes, and that is 100, and that is in cell D2.

Press the right parenthesis, press Return, and Excel fills the formula down to the rest of the cells in this table column. So we can see that the probability of drawing zero envelopes with $500 in them is about 9.5%, the probability of drawing one is 26, almost 27%, the probability of drawing two is about 32%, and so on. Now to do further calculation, we need to determine what each prize drawing would be worth. So, for example, if a customer draws zero $500 envelopes, how much money would they get? The way that we calculate that is we type equal, and then the number of $500 draws times 500.

So that would be (A2*500)+(10-a2)*20. So in other words, we are taking the number of $500 draws from cell A2, and multiplying it by 500, and we're adding 10 minus that number, and multiplying it by 20. Press Return, and we see that the value for each of the prizes goes up by $480.

When we replace a $20 envelope with a $500 envelope, the difference between those two values is 480. So these values make sense. We can see that if you draw one $500 envelope, you get 680, 1160 if you draw two, and so on, all the way up to--if you get incredibly lucky, run between the raindrops and draw ten $500 envelopes, you would win $5,000, which is the maximum price available. Now, we can calculate what's called the expected value. In other words, you multiply the probability of a particular outcome by the value or prize for that outcome.

After I calculate the expected value for each one of the possibilities, then I can add all those expected values together to get the total expected value for the drawing. So to do that, I will just multiply the probability by the prize. So that's =E2*F2. Press Return and we get our expected value for each of these possibilities. Now, note that the values are small enough for drawing nine and drawing ten that the expected value is displayed as 0.

But if I increase the number of decimal points for the entire column, you see that if you look far enough that there is an expected value, but it's about one ten-thousandth of a cent. Now what I can do is add a total row to the table. To do that, click any cell in the table. Then on the Design contextual tab, in the Table Style Options group, click the Total Row check box. We get the sum. We see that the expected value is $1,160.

The hypergeometric distribution doesn't come up very often in business, but if you want to structure a prize drawing of the type I described, you'll know how to figure out the odds and the expected value. Check your local laws, make sure this sort of thing is legal, and be sure to set aside enough money to pay the maximum prize in case someone gets really lucky.

###### Released

12/8/2010*Excel 2007: Business Statistics*with Curtis Frye shows managers and executives how to assess their organization's data effectively by applying statistical analysis techniques. The course covers important statistical terms and definitions, and then dives into techniques using the tools in Excel: formulas and functions for calculating averages and standard deviations, charts and graphs for summarizing data, and the Analysis ToolPak add-in for even greater insights into data. Exercise files are included with the course.

- Understanding statistical terms
- Creating a basic Excel table
- Auditing formulas
- Creating frequency distributions for qualitative data
- Calculating a running total
- Creating a histogram
- Using PivotTables
- Calculating mean, median, mode, and other numerical data
- Using probability distributions
- Population sampling
- Testing hypotheses
- Developing liner and multiple regression models

## Share this video

## Embed this video

Video: Calculating dependent trials using the hypergeometric distribution