When is a data point too far from the mean? Is this odd or interesting?
- Nowadays we hear the term outlier quite a bit. Perhaps a certain athlete is incredibly talented and productive, much more so than any other competitor in the league. Their statistics far exceed those of any other single player. They might be labeled an outlier. Maybe one 10-year-old in your city is taking a course in calculus. Or perhaps the opposite occurs: another 10-year-old child struggles with basic addition. In an effort to center the discussion on the masses, educators may exclude these children as outliers.
But how about a child who earns a perfect score on a nationally standardized test for 10-year-olds? Are they an outlier? Or how about our athlete? If they average 45 points per game in basketball, at what point is our star athlete an outlier? If no one else is above 40 points per game, is our star an outlier? If no one else is above 35 points per game? How about if our star player averages 45, the second best player scores 40 points per game, and the next best player is at 32 points per game? Do we have two outliers or do we not have any outliers? So what exactly is an outlier? The most common answer you'll get is that an outlier is a data point that is an abnormal distance from the other values in the data set.
This brings about a few questions. First, what's abnormal? There is no set definition, and I think it's important to understand that outlier is not a very specific term. So it's less about absolutely identifying outliers and more about motivating discussions of what is normal and what is possible. Perhaps talking about outliers will help important issues surface. So, how can we identify outliers? Tables and charts can be useful.
Sometimes they make outliers stand out. Perhaps we use standard deviation. Maybe we say that anything more than two standard deviations from the mean is a statistical outlier. Sometimes an outlier is just something new, something that we've never even considered, which brings us to the next question. What should we do with outliers? Should we just throw them out, not consider them at all? In general, I'd say no. Most people consider outliers freaks or freakish events that are not likely to be seen again. As a result, they're ignored, not considered worth investigating since they're so odd.
Instead though, they should be considered as opportunities. Are they the beginning of a new trend? Does this person know something that we don't? Is it possible others will learn from this outlier, and we might see a massive change in behavior? Was there a special circumstance for that particular person? Why did they do so poorly? Perhaps this person got very ill and had to leave the program. Or why did they perform so well? Did they get extra help? Did they have additional training? Should we consider extra training for everyone? As you encounter outliers in your data, at the office, or even in your daily life, ask good questions.
Is this really an outlier? How did this happen? What can we learn? What needs to change? A mass of closely distributed data points can be very instructive, but sometimes the lone outlier can provide us with a brand new perspective.
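To make the "two standard deviations from the mean" idea from earlier a bit more concrete, here is a minimal Python sketch. The function name flag_outliers, the threshold parameter, and the points-per-game numbers are all made up for illustration (only the 45, 40, and 32 echo the basketball example), and using the population standard deviation is an assumption; this is one possible way to apply the rule, not a definitive test for outliers.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical league of 12 players (invented data, not real statistics).
points_per_game = [45, 40, 32, 30, 29, 28, 27, 27, 26, 25, 24, 23]

print(flag_outliers(points_per_game))                 # two-standard-deviation rule
print(flag_outliers(points_per_game, threshold=1.5))  # a looser cutoff
```

With this invented data, the two-standard-deviation cutoff flags only the 45-point scorer, while loosening the cutoff to 1.5 also flags the 40-point scorer. That is the "two outliers or none?" question in miniature: the answer depends on a threshold we choose, which is why the flagged points are better treated as prompts for questions than as verdicts.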
Released 9/18/2016
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources, all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.
Video: Outliers