Revisit Statistics 1 foundations: means, medians, ranges, standard deviations, normal distributions, the empirical rule, and z-scores.
- Before we jump into Statistics Fundamentals 2, let's make sure we remember the key components from Statistics Fundamentals 1. Let's start at the very beginning. Statistics is about trying to understand a situation. It's about taking pools of data and organizing those numbers. The organization of the numbers can often tell a story. How do we organize the data? Sometimes, just by putting your data points in a table. Other times, we need a graph or a chart to help us understand the data.
Why do we organize this data? Often we do it to help us make better decisions or at least so we can ask smarter questions. And while tables and charts are helpful, often they're not enough. A data pool of 10,000 data points would give us tables and charts that might be too overwhelming to provide the guidance we seek. Often we want to understand the data set by knowing its center. There are several ways to determine the center of a data set. One way is to determine the mean, otherwise known as the average.
In this case, that's the average of all 10,000 data points. We might also consider the median. If we listed all 10,000 values, from the smallest to the largest value, the median would be the one right in the middle (with an even count like 10,000, that's the average of the two middle values). Knowing the central point is helpful, but perhaps you want to know how the 10,000 values are distributed. For example, you might want to know the range, the difference between the biggest and smallest value in a data set. Perhaps you want a standard deviation.
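As a quick illustration, here is a minimal Python sketch of those three measures: the mean, the median, and the range. The six values are made up for this example (they are not from the course), and the sketch relies only on Python's built-in statistics module.

```python
import statistics

# Made-up data pool for illustration only (not from the course).
values = [4, 8, 15, 16, 23, 42]

# Mean: add up every value and divide by how many values there are.
mean_value = statistics.mean(values)

# Median: the middle value once the data is sorted. With an even
# number of values, it is the average of the two middle values.
median_value = statistics.median(values)

# Range: the difference between the biggest and smallest value.
range_value = max(values) - min(values)

print("mean:", mean_value)
print("median:", median_value)
print("range:", range_value)
```

Notice that the mean and the median don't have to match. Here the one large value, 42, pulls the mean above the median.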
The standard deviation is sort of the average distance between each data point and the mean of the data set. It is essentially a measure of how much variation exists between the data points in our pool. Let's set this up just a bit. Do you remember the concept of a normal distribution? Remember, a data pool that is normally distributed would mean that your data is symmetrically distributed around the data pool's mean.
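To make "sort of the average distance" a little more concrete, here is a small Python sketch that computes a standard deviation step by step. The values are made up for the example, and the sketch assumes the population form of the calculation (dividing by the number of values). A sample standard deviation would divide by one less than that, which is what statistics.stdev does.

```python
import math
import statistics

# Made-up data pool for illustration only (not from the course).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)

# Step 1: squared distance of each data point from the mean.
squared_distances = [(x - mean_value) ** 2 for x in values]

# Step 2: average those squared distances (the population variance).
variance = sum(squared_distances) / len(values)

# Step 3: take the square root to get back to the original units.
manual_sd = math.sqrt(variance)

# The built-in population standard deviation should agree.
builtin_sd = statistics.pstdev(values)

print("manual:", manual_sd)
print("built-in:", builtin_sd)
```

Strictly speaking, then, the standard deviation is the square root of the average squared distance from the mean, which is why "sort of" is the right way to describe it.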
That's where we get our very pretty and also very helpful normal distribution curve. These distribution curves tell us how data is distributed, how many data points are at the mean, how many are located at this reading, and how many we would expect at this reading. So, now let's pair this up with the concept of standard deviation. How are the concepts of standard deviation and the normal distribution curve related? Well, hopefully you remember that the empirical rule tells us that we expect about 68% of our data to be within one standard deviation of the mean.
We would then expect about 95% of our data points to be within two standard deviations of the mean. And finally, about 99.7% of our data points should be within three standard deviations of the mean, which brings us to Z-Scores. Z-Scores are a measure of the number of standard deviations a particular data point is from the mean. So for a data point we will call X, a Z-Score of 1.55 means that X is 1.55 standard deviations above the mean.
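Here is a short Python sketch that ties the empirical rule and Z-Scores together. It assumes Python 3.8 or later for statistics.NormalDist. The helper names share_within and z_score, along with the mean of 100 and standard deviation of 20, are made up for this example and are not from the course.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of normally distributed data within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(x, mean, sd):
    """Number of standard deviations that x sits above (+) or below (-) the mean."""
    return (x - mean) / sd


# Empirical rule: roughly 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(share_within(k), 4))

# Hypothetical data pool with a mean of 100 and a standard deviation of 20.
z = z_score(131, mean=100, sd=20)
print("z-score of 131:", z)
```

In this made-up pool, a reading of 131 sits 31 points above the mean, and 31 divided by 20 gives the Z-Score of 1.55 from the example above. A reading below the mean would give a negative Z-Score.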
Data sets, tables, charts, means, medians, ranges, standard deviations, normal distributions, and Z-Scores. Do you remember these concepts? Do you know what they mean? Do you know how to find or calculate some of these statistics? If so, good job. If not, perhaps a quick look at these concepts in Statistics Fundamentals 1 might be needed. These concepts will follow us all the way through pretty much every installment of Statistics Fundamentals.
So you really want to make sure you understand those items before you tackle Statistics Fundamentals 2. Is that all you need to remember? Not quite.
Eddie Davila first provides a bridge from Part 1, reviewing introductory concepts such as data and probability, and then moves into the topics of sampling, random samples, sample sizes, sampling error and trustworthiness, the central limit theorem, t-distribution, confidence intervals (including explaining unexpected outcomes), and hypothesis testing. This course is a must for those working in data science, business, and business analytics, or anyone else who wants to go beyond means and medians and gain a deeper understanding of how statistics work in the real world.
- Data and distributions
- Sample size considerations
- Random sampling
- Confidence intervals
- Hypothesis testing