One of the most common measures of data variability. What is it? How is it calculated?
- Knowing the difference between our biggest and smallest data points, which we call our range, is often interesting, but it can be deceiving when you have even one rogue data point. So how can we get a better overall feel for the distribution of our data points? Often statisticians turn to a handy tool called the standard deviation. It's a fairly common term in the world of statistics. But still, the concept can be intimidating to many. So what exactly is the standard deviation? Well it's sort of the average distance from the mean.
Let's look at these three data sets. All are very different. Still, all three have an average of 18. The first data set's range indicates that this is clearly a data set of similar numbers, but data sets two and three have identical means and ranges. Here, standard deviation is useful. While both have large and small data points, the standard deviation of data set three tells us the numbers in data set three are more similar to each other than those in data set two.
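The three data sets are shown on screen and are not part of the transcript. As a rough illustration only, the sketch below uses hypothetical values chosen so that all three sets average 18 and sets two and three share the same range. The numbers and the `summarize` helper are assumptions, not the ones from the video.

```python
from statistics import mean, stdev

# Hypothetical data sets (NOT from the video): all three average 18,
# and sets "two" and "three" have the same range.
data_sets = {
    "one": [16, 17, 19, 20],
    "two": [2, 6, 30, 34],
    "three": [2, 17, 19, 34],
}


def summarize(values):
    """Return (mean, range, sample standard deviation) for a list of numbers."""
    return mean(values), max(values) - min(values), stdev(values)


summary = {name: summarize(values) for name, values in data_sets.items()}

for name, (avg, spread, sd) in summary.items():
    print(f"set {name}: mean={avg}, range={spread}, std dev={sd:.2f}")
```

With these made-up values, sets two and three look identical by mean and range, yet set three has the smaller standard deviation because its middle values sit close to 18.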
Notice we keep seeing a similar pattern. Often it takes a collection of basic statistics tools to get a sense of what a data set contains. So how do we calculate a standard deviation for a data set? Remember when I said it was sort of an average distance from the mean? Well, I meant it. It's sort of the average distance from the mean, but not quite. It's the square root of the average squared distance from the mean. Here's the formula.
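The formula appears on screen and is not captured in the transcript. Assuming it is the usual sample standard deviation, which matches the steps described (the mean, the sample size minus one, the sigma, and the square root), it would read as follows. The symbols are assumptions: s is the standard deviation, x_i is each data point, x-bar is the mean, and n is the sample size.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```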
Yeah kind of ugly, but don't worry, most calculators and spreadsheets are capable of doing all the work for you. Plug in the values of the data set and the machine does all the work. So finding a standard deviation doesn't need to be so difficult. Still for the curious, let's go ahead and show you how to use this formula. This can get ugly so let's start out with a small data set. First, we need the mean. That's easy enough. Next, we need the sample size minus one.
So now our formula looks like this. The last components of the formula are the individual data points. So for the first data point, take two minus eight. Negative six squared is 36. The sigma tells us we need to do this for every data point and then add all of those values. Next, we divide by three. This value is our variance for the data set.
When we take the square root, we find our standard deviation. So for this data set, we find we had an average of eight and now we know the standard deviation is 4.32. Standard deviation, you now have an idea of what it is and how it's calculated, but I'm guessing you still have questions. What's a good standard deviation? How do you use a standard deviation? Let's tackle questions like that in the next video.
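The worked example above can be sketched in Python. The transcript only states the first data point (2), the mean (8), the divisor (3), and the result (4.32); the full data set is shown on screen. The four values below are an assumption that happens to match every stated number, not necessarily the set used in the video.

```python
import math
from statistics import stdev

# Assumed data set: consistent with the first point (2), mean (8),
# divisor (3), and result (4.32) given in the transcript.
data = [2, 8, 10, 12]

mean = sum(data) / len(data)   # step 1: the mean, 32 / 4 = 8.0
divisor = len(data) - 1        # step 2: sample size minus one = 3

# Step 3: squared distance from the mean for every data point,
# e.g. (2 - 8) squared = 36 for the first point.
squared_distances = [(x - mean) ** 2 for x in data]

variance = sum(squared_distances) / divisor  # step 4: add them up, divide by 3
std_dev = math.sqrt(variance)                # step 5: take the square root

print(round(std_dev, 2))  # 4.32

# Letting the machine do all the work gives the same answer.
assert math.isclose(std_dev, stdev(data))
```

The final check uses `statistics.stdev`, which applies the same sample (n minus one) formula, much as a calculator or spreadsheet would.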
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources—all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.