Understanding data sets and their limitations. Knowing what to look for in a data set.
- A runner's 100-meter dash time, a patient's heart rate, the number of items a customer purchased at the grocery store, the number of students who failed an exam, a customer's age, an employee's annual salary: these are a mix of observations and measurements. Each represents a tiny bit of data. On their own, they may tell us a little about each person who was measured or observed. But if we can get the sex, age, home address, sales data, and browser statistics for the 20,000 customers who bought products from our website last month, we now have a rather significant pool of data.
This massive pool of observations and measurements probably has hidden within it lessons on how to better organize our website and our warehouse. It could probably tell us how to reach our customers more efficiently and effectively, but in its disorganized state, this massive pool of data can't teach us very much. The lessons only become known when the data is organized, when it is made visual through charts, and when it has been processed with the right formulas.
Once we have organized data, we can begin to discover useful facts. Once we have organized data, we have information. How valuable is that information? Well, it depends on many different things. Chief among them, though, is the quality of the data itself. Before we begin to put a value on the information that was calculated and presented, we need to question how the data was gathered.
In some cases, the data is just handed to us. We ourselves don't actually collect it. This secondary data was collected by others. In some cases, their methods were good and sound. We like the data. In other cases, there may have been some flaws in their methodology. Or maybe something they did makes it harder for us to feel secure in our calculations and conclusions. When we collect the data ourselves, though, we can observe the subjects in their natural environment, or in the very strict experiment we set up.
We also get to write the very specific questions on their surveys. Plus, we are the ones who will moderate the discussions in a focus group. That's not to say that our data won't be flawed or that our questions and observations won't be biased, but at least we know we are the ones to blame for the flaws in the data. And perhaps, by knowing the flaws, we can more accurately report the level of uncertainty in our conclusions. We can also establish how future data gathering might be modified.
So, just as in our daily diet, you are what you eat. If you eat food made of healthy ingredients, you have a better chance at improved health. The same holds in statistics. When our data pools are gathered through healthy methods, they have a better chance of containing truly valuable measurements and observations that can better inform the decisions we make. Next time someone provides you with some interesting or surprising statistics, perhaps you should take some time and ask how the data was collected, what the flaws in the data might be, and how those flaws might change your opinion of the results.
Released
9/18/2016
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources, all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.
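
As a rough illustration of the first few objectives above, here is a minimal Python sketch that computes a mean, median, mode, and Z-score. The salary figures are made up for the example and are not from the course, the helper name `z_score` is hypothetical, and using the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics

# Hypothetical data: annual salaries in thousands of dollars (made-up values).
salaries = [42, 48, 48, 55, 61, 75, 120]

mean = statistics.mean(salaries)      # arithmetic average
median = statistics.median(salaries)  # middle value of the sorted data
mode = statistics.mode(salaries)      # most frequently occurring value
stdev = statistics.stdev(salaries)    # sample standard deviation (assumed choice)


def z_score(value, mean, stdev):
    """Return how many standard deviations `value` lies from the mean."""
    return (value - mean) / stdev


print("mean:", round(mean, 2))
print("median:", median)
print("mode:", mode)
print("z-score of 120:", round(z_score(120, mean, stdev), 2))
```

In this made-up set, the single large salary pulls the mean above the median, which is the kind of detail worth asking about before trusting a summary statistic.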
Video: Is my data set good?