How are these statistics the same/different from what was already covered? How are they calculated?
- [Voiceover] Discrete random variables. These are experimental results usually expressed as whole numbers; they can't be decimals. For example, suppose we monitored the number of drinks ordered per customer at a particular Starbucks location between 8am and 9am tomorrow. The number of drinks ordered per customer would be discrete: you couldn't order half of a drink or 30% of a drink. Here's the data for the number of drinks ordered per customer at a particular Starbucks between 8am and 9am.
This table is called the probability distribution. We can see that we had 50 different paying customers. Note, we're only counting the number of drinks in their order. Thirty-five customers ordered only one drink. Eight customers ordered two drinks. We even had one customer who ordered zero drinks; perhaps they bought a pastry or some fruit. As you can see, the customers ordered between zero drinks and five drinks during their purchases.
Now that we know how many of the 50 customers ordered each of those discrete drink totals, we can find the relative frequencies. 0.02 or 2% of customers ordered zero drinks. 0.70 or 70% ordered a single drink, and so on. So, what was the mean, the average number of drinks ordered per customer during this single hour? For this we will be calculating a weighted mean. In other words, I will multiply zero drinks by the relative frequency 0.02.
I will multiply one drink by 0.70. I will multiply two drinks by the relative frequency of 0.16, and so on through five drinks. I will then add up all of these products, and this tells me that my mean for this probability distribution is 1.46 drinks. Our average customer between 8am and 9am ordered 1.46 drinks. Next, let's figure out the standard deviation for this probability distribution.
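The weighted-mean steps just described can be sketched in Python. The on-screen table is not reproduced in this transcript, so the counts for three and four drinks (3 and 2 customers) are an assumption, inferred from the cumulative percentages quoted later in the narration (94% for zero to three drinks, 98% for zero to four). The variable names are illustrative only.

```python
# Customers per drink total. The counts for 0, 1, 2 and 5 drinks come from the
# narration; the counts for 3 and 4 drinks are inferred (an assumption).
counts = {0: 1, 1: 35, 2: 8, 3: 3, 4: 2, 5: 1}

total_customers = sum(counts.values())

# Relative frequency: the share of customers who ordered each drink total.
rel_freq = {drinks: n / total_customers for drinks, n in counts.items()}

# Weighted mean: multiply each drink total by its relative frequency, then add.
mean = sum(drinks * p for drinks, p in rel_freq.items())

print(total_customers)   # 50
print(round(mean, 2))    # 1.46
```

Because the relative frequencies add up to 1, no extra division is needed after summing the products.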
We have the same data set here, but I've sandwiched in a new column between drinks ordered and frequency. This column is simply the number of drinks ordered squared. Just like with the mean, my next step is to multiply. This time I multiply drinks squared by the relative frequency. So for five drinks, 25 times the relative frequency 0.02. I do this for each discrete drink order zero to five.
I then add up the products. Here I get a total of 3.02. Remember, this number, 3.02, uses drinks squared. So, to get the variance, we take 3.02 minus our mean squared. Our mean was 1.46. 1.46 squared is 2.13. So, our variance is 0.89. If we take the square root of our variance, 0.89, we get our standard deviation, which is often referred to as sigma.
The standard deviation is therefore 0.94. This is a nice opportunity to put together a few of the things we've learned throughout this course. The average customer in Starbucks between 8am and 9am ordered 1.46 drinks in their purchase. The standard deviation for this probability distribution is 0.94 drinks. So, one standard deviation from the mean would be a range from 1.46 drinks minus 0.94 to 1.46 drinks plus 0.94, a range of 0.52 drinks to 2.40 drinks.
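As a rough check on the variance and standard deviation steps, here is a minimal Python sketch. It assumes relative frequencies of 0.06 and 0.04 for three and four drinks, which are not stated directly in the narration but are inferred from the percentages it quotes. The names `mean_of_squares` and `sigma` are illustrative.

```python
import math

# Relative frequencies per drink total; the values for 3 and 4 drinks are
# inferred from the narration's cumulative percentages (an assumption).
rel_freq = {0: 0.02, 1: 0.70, 2: 0.16, 3: 0.06, 4: 0.04, 5: 0.02}

# Weighted mean of the drink totals.
mean = sum(x * p for x, p in rel_freq.items())

# Weighted mean of the squared drink totals (the "drinks squared" column).
mean_of_squares = sum(x**2 * p for x, p in rel_freq.items())

# Variance = mean of the squares minus the square of the mean.
variance = mean_of_squares - mean**2

# Standard deviation (sigma) = square root of the variance.
sigma = math.sqrt(variance)

print(round(mean_of_squares, 2))  # 3.02
print(round(variance, 2))         # 0.89
print(round(sigma, 2))            # 0.94

# One standard deviation either side of the mean.
print(round(mean - sigma, 2), round(mean + sigma, 2))  # 0.52 2.4
```

This shortcut gives the same variance as summing each squared distance from the mean weighted by its relative frequency; it just avoids computing the distances one by one.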
Since drink orders are discrete, though, this range really just accounts for drink orders of one or two drinks. According to our table, this accounts for 86% of our customers. Two standard deviations from the mean ranges from negative 0.43 drinks to 3.35 drinks, which accounts for drink orders between zero and three drinks. This accounts for 94% of our customers. And three standard deviations from the mean ranges from negative 1.37 drinks to 4.29 drinks, which accounts for drink orders between zero and four drinks.
This would include 98% of our customers. And you might remember that, for normally distributed data, three standard deviations should capture about 99.7% of our data points. So, according to our calculations, that single order of five drinks was a true outlier. And just like that, you're becoming a real statistician.
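The one-, two-, and three-standard-deviation ranges can be reproduced with a short Python sketch. As before, the relative frequencies for three and four drinks (0.06 and 0.04) are an assumption inferred from the quoted percentages, and `coverage` is a hypothetical helper name rather than anything from the course.

```python
import math

# Relative frequencies per drink total; the values for 3 and 4 drinks are
# inferred from the narration's cumulative percentages (an assumption).
rel_freq = {0: 0.02, 1: 0.70, 2: 0.16, 3: 0.06, 4: 0.04, 5: 0.02}

mean = sum(x * p for x, p in rel_freq.items())
sigma = math.sqrt(sum(x**2 * p for x, p in rel_freq.items()) - mean**2)


def coverage(k):
    """Share of customers whose drink total lies within k sigmas of the mean."""
    low, high = mean - k * sigma, mean + k * sigma
    return sum(p for x, p in rel_freq.items() if low <= x <= high)


for k in (1, 2, 3):
    low, high = mean - k * sigma, mean + k * sigma
    print(k, round(low, 2), round(high, 2), round(coverage(k), 2))
# 1 0.52 2.4 0.86
# 2 -0.43 3.35 0.94
# 3 -1.37 4.29 0.98
```

The ranges use the unrounded standard deviation, which is why the two-sigma bounds come out as -0.43 and 3.35 rather than the -0.42 and 3.34 you would get from 0.94.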
Released 9/18/2016
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources, all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.
Video: Mean and standard deviation of discrete probability distributions