What's the difference between a t-statistic and a z-statistic? Learn about the relationship between small sample sizes and the t-statistic.
- Up until now, in our Statistics Fundamentals series, we've used the z-score to help us identify how many standard deviations a data point lies from the population mean. It's also been very helpful in developing confidence intervals. Now remember, the z-score requires that our data be normally distributed. It also requires that we know the standard deviation of the population. The central limit theorem tells us that given a large enough sample size, the distribution of sample means will be approximately normal.
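As a quick sketch of the z-score idea described above (the population mean, standard deviation, and data point here are made-up values for illustration):

```python
# z-score: how many standard deviations a data point lies from the
# population mean. Both population parameters must be known.
pop_mean = 100.0  # hypothetical population mean
pop_sd = 15.0     # hypothetical population standard deviation

x = 130.0  # a single data point
z = (x - pop_mean) / pop_sd
print(z)  # 2.0 -> the point lies two standard deviations above the mean
```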
But often, the population standard deviation is unknown. So how can we create confidence intervals in that case? Believe it or not, you can use the standard deviation of a single sample. But if you have only one sample with a sample size under 30, a relatively small sample size, you can probably guess that your confidence interval will be less precise, and this is why the z-score is not valid in this situation.
If you're creating a confidence interval when the population variance is unknown, you must instead use something called the t-distribution. Before we discuss the differences between the z- and t-distributions, let's first discuss how they are similar. Both are symmetrical, bell-shaped distributions. Both require data with a normal distribution. And in both cases, the area under the curve is equal to 1.0.
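Here is a minimal sketch of a t-based confidence interval, assuming SciPy is available; the sample values are hypothetical:

```python
# 95% confidence interval for the mean when the population standard
# deviation is unknown: use the sample standard deviation and the
# t-distribution with n - 1 degrees of freedom.
import math
from scipy.stats import t

sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]  # hypothetical data
n = len(sample)
mean = sum(sample) / n
# sample standard deviation (divide by n - 1, not n)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

t_crit = t.ppf(0.975, df=n - 1)  # two-tailed 95% critical value
margin = t_crit * sd / math.sqrt(n)
print(f"{mean:.2f} ± {margin:.2f}")
```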
So how is the t-test different from our z-test? Well, the z-test is mostly used to compare the mean of a sample to its larger population, and it requires that we know the population standard deviation. The t-test, on the other hand, is used when the population standard deviation is unknown, and it can also compare the means of two independent samples that don't even have to come from the same population. Because of these differences, and also because of the small sample sizes involved, the t-distribution isn't one curve but rather a family of curves.
Each curve represents the distribution for a different sample size. The smaller the sample size, the flatter the curve. The larger the sample size, the closer it gets to the z-distribution, which is the standard normal curve. Since all of the t-distribution curves are flatter than the z-distribution curve, the critical scores for t-distributions are higher than those for z-distributions. You might remember that the appropriate z-score for a 95% confidence interval is 1.96.
That's 1.96 standard deviations. How does that compare to t-scores? Well, it depends on the sample size. For a sample size of three, the t-score is 4.303. For a sample size of 10, the t-score is 2.262. For a sample size of 20, the t-score is 2.093. And by the time our sample size is equal to a hundred, our t-score goes to 1.98.
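The critical values quoted above can be reproduced with SciPy's t-distribution, using n - 1 degrees of freedom for each sample size:

```python
# Two-tailed 95% critical values for the sample sizes mentioned above,
# compared against the z critical value of 1.96.
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)
print(f"z: {z_crit:.3f}")  # 1.960
for n in (3, 10, 20, 100):
    t_crit = t.ppf(0.975, df=n - 1)
    print(f"n={n:>3}: t = {t_crit:.3f}")  # 4.303, 2.262, 2.093, 1.984
```

As the sample size grows, the t critical value converges toward the z value, which matches the numbers in the lesson.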
As you can see, the larger our sample size, the closer the t-score is to the z-score of 1.96. So where do we get t-scores for all of the different possible sample sizes? Let's take a look at that in our next video.
- Working with small sample sizes
- Using t-statistic vs. z-statistic
- Calculating confidence intervals with t-scores
- Comparing two populations (proportions)
- Comparing two population means
- Chi-square testing
- ANOVA testing
- Regression testing