One, two, or three standard deviations? The answer to this question tells us something about our data points and data sets.
The empirical rule: that sounds a bit scary, doesn't it? But is it really that intimidating? It's not. Actually, it is quite simple and, in the end, very useful for understanding the distribution of data points in data sets. So, what is this empirical rule? First, it's important to note that this rule works for symmetrically distributed data, often illustrated by a bell-shaped curve centered on the data set's mean. Once we understand this, the empirical rule explains how this symmetrically distributed data follows a pattern whereby nearly all of the data points fall within three standard deviations of the mean.
For this reason, it is also sometimes referred to as the three-sigma rule, where sigma stands for standard deviation. The rule goes further, though. It explains that about 68% of all the data points will lie within one standard deviation of the mean. Notice how this is illustrated on our bell-shaped curve. The empirical rule then goes on to say that about 95% of all data points fall within two standard deviations of the mean.
Again, notice how this is illustrated in our bell-shaped curve. Finally, the empirical rule tells us that when you have the bell-shaped curve, often referred to as a normal distribution, about 99.7% of the data points in the data set will fall within three standard deviations of the mean. So, as you can see, now almost all of the area under that bell-shaped curve has been accounted for. One very important note: this works when we have the well-centered, symmetrical bell-shaped curve.
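The three percentages can be checked numerically. The following is a minimal sketch, assuming an exactly normal distribution; the function name `coverage_within` is made up for this illustration and does not come from the course. It uses the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / √2).

```python
import math


def coverage_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean.

    Assumes the data follow an exact normal (bell-shaped) distribution.
    """
    return math.erf(k / math.sqrt(2))


# Print the share of data within one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
# prints 0.6827, 0.9545, and 0.9973
```

The printed values round to the familiar 68%, 95%, and 99.7%. For data that are skewed or otherwise far from normal, these shares would not be expected to hold.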
The rule begins to lose value the farther our data set strays from the classic normal distribution. That said, in most cases, the 68-95-99.7 rule, which we call the empirical rule, holds up pretty well. With this knowledge, you can, hopefully, better evaluate data points. If someone says that a data point has a Z-score of 1.8, you know it's within two standard deviations of the mean, and thus it is among the roughly 95% of data points that lie closest to the mean.
That said, if something had a Z-score above 3.0, we could be pretty confident that this data point was a true outlier, since it falls outside the roughly 99.7% of the data that lies within three standard deviations of the mean. In terms of our bell-shaped curve, this data point would lie outside this region. No matter if you call it the empirical rule, the three-sigma rule, or the 68-95-99.7 rule, we now better understand the normal distribution of data in a way that allows us to better understand our data set and the data points that lie within that data set.
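The Z-score reasoning above can be sketched in a few lines. This is only an illustration: the helper names `z_score` and `empirical_band`, and the example mean of 100 and standard deviation of 15, are hypothetical and not taken from the course. It also treats the data as approximately normal, which is the condition under which the rule applies.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


def empirical_band(z: float) -> str:
    """Place a Z-score in one of the bands described by the empirical rule."""
    distance = abs(z)  # the rule is symmetric, so only distance from the mean matters
    if distance <= 1:
        return "within 1 standard deviation (about 68% of data)"
    if distance <= 2:
        return "within 2 standard deviations (about 95% of data)"
    if distance <= 3:
        return "within 3 standard deviations (about 99.7% of data)"
    return "beyond 3 standard deviations (possible outlier)"


# Hypothetical data set: mean 100, standard deviation 15.
z = z_score(127, 100, 15)
print(z, "->", empirical_band(z))  # Z-score of 1.8, inside the two-standard-deviation band

z = z_score(150, 100, 15)
print(round(z, 2), "->", empirical_band(z))  # Z-score above 3, flagged as a possible outlier
```

The sketch uses the absolute value of the Z-score, so a point far below the mean is flagged in the same way as one far above it.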
Released 9/18/2016
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources, all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.
Video: Empirical rule: What symmetry tells us