Just how far is a data point from the mean? Standard deviations help us figure this out.
- Suppose I tell you that one data set has a standard deviation of five, and another has a standard deviation of 50. Which of those is a better data set? The answer is that it really depends. First consider what you're actually measuring. Is 50 the standard deviation of the daily temperature in Celsius for Paris in the month of January? If so, that would be huge. Or is 50 the standard deviation of the weight in kilograms for total daily cheese consumed in Paris? In that case, 50 would likely be very small.
In those examples, we were considering the data set as a whole. But standard deviation is also used to investigate individual data points. How do we do that? Well, often you'll hear someone refer to a number of standard deviations from the mean. They might say that something is within two standard deviations from the mean, three standard deviations from the mean, or perhaps 1.5 standard deviations from the mean. Let's look at a simple example so we can discover what this might mean.
Let's use this data set: the weights of 10 men that visited a doctor's office today. The mean weight is 189 pounds. The standard deviation is about 90 pounds. One standard deviation from the mean would be 90 pounds lower than 189 all the way up to 90 pounds heavier than 189 pounds. Roughly from 99 pounds to 279 pounds. When we look at our individual data points, we can see that the first nine data points are within one standard deviation from the mean.
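The range arithmetic here can be sketched in a few lines of Python. This is only an illustration: it uses the mean (189 pounds) and the approximate standard deviation (90 pounds) quoted above, and the function name `band` is a made-up helper, not something from the course.

```python
MEAN_WEIGHT = 189.0  # pounds, the mean quoted in the example
STD_DEV = 90.0       # pounds, the approximate standard deviation quoted in the example


def band(mean, sd, k):
    """Return (low, high): the range k standard deviations either side of the mean.

    `band` is a hypothetical helper name used only for this sketch.
    """
    return mean - k * sd, mean + k * sd


low, high = band(MEAN_WEIGHT, STD_DEV, 1)
print(f"One standard deviation from the mean: {low:.0f} to {high:.0f} pounds")
```

Passing a different multiple, such as 1.5 or 2, as the last argument gives the wider ranges in the same way.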
How about that last data point? Let's try 1.5 standard deviations. So again, about 135 pounds in either direction from the mean: 54 to 324. Nope, 425 is not within 1.5 standard deviations from the mean. Here's two standard deviations, roughly 9 to 369 pounds. Still no. How about 2.5 standard deviations? No, 425 is still not within that range.
And yes, we have now gotten below zero on the low end, about negative 36 pounds. Only when we get to three standard deviations, roughly negative 81 to 459 pounds, does the last data point fall within our limits. I guess the next question is, what does this mean? Is any of this significant? Well, for data that is roughly normally distributed, which means that we have a nice symmetrical, bell-like distribution centered at the mean, it is estimated that about 68% of your data points should fall within one standard deviation of the mean.
We had 90% fall within one standard deviation. Better than expected. Most of our patients are probably somewhat similar. Even with our small data set, we can see that. If we had a huge data set and 90% of our data was within one standard deviation from the mean, we'd probably feel pretty good. How about that last data point? Well, as I said, about 68% of the data points falling within one standard deviation is what we'd expect from a bell-shaped distribution.
How about for two standard deviations? Well, we would expect about 95% of our data points to be within two standard deviations of the mean. And for three standard deviations of the mean, 99.7%. And that is where our last data point lies. That would seem to be fairly extreme. So, that last data point is definitely what we would consider an outlier. As you can see, standard deviation can be a very helpful tool in understanding data sets and their individual data points.
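As a rough check on those percentages, and on where the 425-pound data point sits, here is a minimal Python sketch. It assumes the data follow a normal (bell-shaped) distribution, and it reuses the mean of 189 pounds and the approximate standard deviation of 90 pounds from the example. The helper names `z_score` and `normal_coverage` are made up for illustration.

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / sd


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# The last data point, using the mean and approximate standard deviation from the example.
z = z_score(425, 189, 90)
print(f"425 pounds is about {z:.1f} standard deviations above the mean")

# Expected share of data within 1, 2, and 3 standard deviations (normality assumed).
for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {normal_coverage(k):.1%}")
```

The z-score comes out between 2.5 and 3, which matches what we saw: the last data point is outside the 2.5 standard deviation range but inside the three standard deviation range.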
More importantly, standard deviation will help us generate interesting questions about the data collection methods, the entire pool of data, and even the individual data points.
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources—all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.