The world is too big for a statistician to measure everything, so instead they use samples. What makes a sample useful? Learn about sample size, quality, and selection techniques.
- No matter your field, you're constantly searching for answers to questions like these: What's happening now? What will happen next? Where do we need to improve? What do our people think? What's normal? When you have those answers, you can make better decisions about how to use your time and resources. Perhaps you can feel at ease, or at least you can isolate your problems and then fix them. The challenge is getting the right answers, especially when the world, even your small slice of it, is very big.
Measuring everything is just way too expensive, too time-consuming, and in some cases simply impossible. Political operatives can't poll every voter. Cell phone companies can't measure the quality level of every single item they produce. A farmer can't measure the actual size of every tomato grown. Scientists can't track the health of every single person in the country. Instead of measuring everything, they measure a small group, or subset, of the total population.
That small subset of measurements is a sample. Under the right circumstances, this sample can act as a representative of the entire population. Gathering that representative sample is challenging, though. Let's consider a political poll for the mayor of a city. The city has one million eligible voters, and a polling organization is trying to predict the outcome of the election between two candidates.
One is named Silver and the other Diamond. The polling organization reports that if the election were held today, Diamond would get 60% of the vote and Silver would get 40%. You work on the Silver campaign, so you're concerned. Before you panic, though, you need to question the quality of the sample. What are likely some of your biggest concerns about it? How many of the one million eligible voters were polled? Would a hundred be enough? How about a thousand? Or would you want a sample size of at least a hundred thousand? The bigger the required sample size, the more expensive the survey is to conduct.
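To get a feel for the "would a hundred be enough?" question, here is a minimal simulation of the mayoral poll. It assumes a hypothetical electorate of 1,000,000 voters in which exactly 60% truly support Diamond, draws simple random samples of several sizes, and measures how much the poll result varies from one poll to the next. The population, the seed, and the function names (`build_population`, `poll`, `spread_of_estimates`) are illustrative assumptions. The simulation also assumes that every selected voter answers honestly, so it ignores nonresponse and question wording entirely.

```python
import random
import statistics

POPULATION_SIZE = 1_000_000
TRUE_DIAMOND_SHARE = 0.60  # hypothetical: assumed true support for Diamond


def build_population(size, diamond_share):
    """Hypothetical electorate as a list of votes: 1 = Diamond, 0 = Silver."""
    n_diamond = round(size * diamond_share)
    return [1] * n_diamond + [0] * (size - n_diamond)


def poll(population, sample_size, rng):
    """Draw a simple random sample (without replacement) and return Diamond's share."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


def spread_of_estimates(population, sample_size, trials, rng):
    """Standard deviation of the poll result across repeated, independent polls."""
    results = [poll(population, sample_size, rng) for _ in range(trials)]
    return statistics.stdev(results)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    voters = build_population(POPULATION_SIZE, TRUE_DIAMOND_SHARE)
    for n in (100, 1_000, 10_000):
        sd = spread_of_estimates(voters, n, trials=100, rng=rng)
        print(f"sample size {n:>6}: poll-to-poll spread about {sd:.3f}")
```

With random sampling, the spread shrinks roughly in proportion to one over the square root of the sample size. Each tenfold increase in the sample therefore buys only about a threefold gain in precision, which is why large polls get expensive quickly.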
How were those people chosen? Who actually decided to answer the survey, and did some people decline to be surveyed? Which organization did the polling? Did they, intentionally or perhaps unintentionally, collect data that was not representative of the population? Did they steer the poll toward favorable results for one particular candidate? What specific questions were asked in the survey? Were they confusing or misleading? And depending on the nature of the study (political, scientific, environmental, commercial, or entertainment-related), there are likely many other factors to consider when evaluating the quality of a sample.
Yes, the size of a sample is important in determining the worth of the data collected. But before we tackle sample size, let's consider the other aspects of gathering a quality sample. Strangely enough, despite the endless list of sample considerations, the best samples are the ones chosen at random. Yep, a random sample is the gold standard when it comes to collecting data.
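In a simple random sample, every member of the population has the same chance of being chosen. The sketch below contrasts that with one common non-random approach, a convenience sample, in which the pollster only reaches voters who are easy to reach. The city is hypothetical. District A has 400,000 voters, 80% of them for Diamond. District B has 600,000 voters, about 47% of them for Diamond. Together, these assumed numbers give the same 60% citywide figure used in the poll above.

```python
import random


def build_city():
    """Hypothetical voter roll of (district, vote) pairs; vote 1 = Diamond, 0 = Silver."""
    district_a = [("A", 1)] * 320_000 + [("A", 0)] * 80_000   # 80% Diamond
    district_b = [("B", 1)] * 280_000 + [("B", 0)] * 320_000  # ~46.7% Diamond
    return district_a + district_b  # 600,000 of 1,000,000 back Diamond: 60%


def diamond_share(sample):
    """Fraction of the sample that supports Diamond."""
    return sum(vote for _, vote in sample) / len(sample)


def simple_random_sample(voters, n, rng):
    """Every voter in the city has the same chance of being selected."""
    return rng.sample(voters, n)


def convenience_sample(voters, n, rng):
    """Hypothetical flawed method: only District A voters can be reached."""
    reachable = [v for v in voters if v[0] == "A"]
    return rng.sample(reachable, n)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so runs are repeatable
    city = build_city()
    print("true share:        ", diamond_share(city))
    print("random sample:     ", diamond_share(simple_random_sample(city, 1_000, rng)))
    print("convenience sample:", diamond_share(convenience_sample(city, 1_000, rng)))
```

A larger convenience sample would not help here. It would only estimate District A's 80% more precisely. Sample size cannot fix a biased selection method, which is why randomness comes first.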
But as you might expect by now, in the world of statistics nothing comes easy. So before we move forward, we need to confront the difficulty of gathering a truly random sample.
Eddie Davila first provides a bridge from Part 1, reviewing introductory concepts such as data and probability, and then moves into the topics of sampling, random samples, sample sizes, sampling error and trustworthiness, the central limit theorem, the t-distribution, confidence intervals (including explaining unexpected outcomes), and hypothesis testing. This course is a must for those working in data science, business, and business analytics, or for anyone else who wants to go beyond means and medians and gain a deeper understanding of how statistics work in the real world.
- Data and distributions
- Sample size considerations
- Random sampling
- Confidence intervals
- Hypothesis testing