Even more Bayes' theorem: using it to see the relationship between related data sets.
- Millionaires. Who are they? And how did they get their money? Well, let's say you were given this set of data: the level of education achieved by millionaires in a certain country. 20% had graduate degrees, 60% had undergraduate degrees, 15% had only a high school diploma, and 5% did not even complete high school. Suppose that for this same country we also know how many inherited their wealth versus earning it.
Here's the data: 50% of millionaires who are high school drop-outs inherited their wealth. 20% of millionaires who had either a high school diploma or an undergraduate degree inherited their wealth. And only 15% of millionaires with graduate degrees inherited their wealth. Everyone else earned their wealth, so 50% of drop-outs, 80% of high-school-only and undergraduate millionaires, and 85% of graduate-degree millionaires earned it. What is the probability that a millionaire did not complete high school, given that they earned their wealth? Well, let's set up a tree.
Our first set of branches establishes the level of education. Our second set of branches then breaks up each of these types of millionaires into those who inherited their wealth and those who earned it. By multiplying the value of a first branch by the value of the second branch that follows it, we get the probability of each outcome. To simplify, let's say there are 1,000 millionaires. We can multiply each of those outcome probabilities by 1,000.
We can now see how many millionaires we have in each category. For example, there are 200 millionaires with graduate degrees: 170 earned their money and 30 inherited it. If we isolate only those who earned their money (25 drop-outs, 120 high-school-only, 480 undergraduate, and 170 graduate), we see that 795 of the 1,000 millionaires earned their money. And of those 795, 25 did not graduate high school.
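The tree method can be sketched in Python as below. This is a minimal sketch, assuming a hypothetical population of exactly 1,000 millionaires; the dictionary and variable names are illustrative and not part of the course. It multiplies along each path of the tree and scales the result to head counts.

```python
# Probability-tree sketch for the millionaire example, scaled to a
# hypothetical population of 1,000. Names are illustrative only.

# First set of branches: P(education level)
p_education = {
    "drop-out": 0.05,
    "high school only": 0.15,
    "undergraduate": 0.60,
    "graduate": 0.20,
}

# Second set of branches: P(inherited | education level)
p_inherited_given = {
    "drop-out": 0.50,
    "high school only": 0.20,
    "undergraduate": 0.20,
    "graduate": 0.15,
}

POPULATION = 1000

earned_counts = {}
inherited_counts = {}
for level, p_level in p_education.items():
    p_inh = p_inherited_given[level]
    # Multiply along each path of the tree, then scale to 1,000 people.
    # round() only removes floating-point noise; each product is a whole number.
    inherited_counts[level] = round(POPULATION * p_level * p_inh)
    earned_counts[level] = round(POPULATION * p_level * (1 - p_inh))

total_earned = sum(earned_counts.values())
ratio = earned_counts["drop-out"] / total_earned
print(earned_counts)
print(total_earned)
print(f"{ratio:.3f}")  # 0.031
```

Printing `earned_counts` shows 25, 120, 480, and 170 earned-wealth millionaires for the four education levels, which sum to the 795 used in the division.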
Dividing 25 by 795, we can see that the probability a millionaire did not complete high school, given that this millionaire earned their money, is about 3.1%. Some of you may be wondering whether there is a way to do this without the diagrams. There is. The formula for Bayes' theorem looks like this. A bit scary, I know, but logical once you insert the data for this problem. It reads as the probability of A given B; in this case, the probability that a millionaire is a drop-out given that they earned their money.
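Written out in symbols, the formula for this problem can be expressed as follows. The letters are labels chosen here for illustration: D, H, U, and G for the four education levels (drop-out, high school only, undergraduate, graduate) and E for "earned their wealth." The denominator P(E) is expanded as the total probability of earning one's wealth, summed over the four education levels.

```latex
\[
P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}
\]
\[
P(D \mid E) = \frac{P(D)\,P(E \mid D)}
  {P(D)\,P(E \mid D) + P(H)\,P(E \mid H) + P(U)\,P(E \mid U) + P(G)\,P(E \mid G)}
\]
```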
So our numerator is the probability of being a drop-out, 5%, times the probability that a drop-out earned their money, 50%. The denominator looks ugly, but it's really just adding up all the millionaires who earned their money, across every education level. So here we multiply the drop-out probability, 5%, by the probability that a drop-out earned their money, 50%. We add the high-school-only probability, 15%, times the probability that a high-school-only millionaire earned their money, 80%.
Next we add the undergraduate probability, 60%, times the probability that an undergraduate millionaire earned their money, 80%. And finally, we add the graduate-degree probability, 20%, times the probability that a graduate-degree millionaire earned their money, 85%. That works out to 0.025 divided by 0.795, which gives us 0.031, or 3.1%: the same thing we got using our probability tree. So whether you're looking at false-positive data, crime data, educational data, science data, or even business data, Bayes' theorem can help you understand relationships and probabilities.
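The formula version of the same calculation can be sketched in Python as below. This is a minimal sketch; `bayes_posterior` is a hypothetical helper written for this example, not a library function. It takes the first-branch probabilities (priors) and the probability of earning one's wealth at each education level (likelihoods), exactly the numbers used in the numerator and denominator above.

```python
# Formula-based Bayes' theorem sketch for the millionaire example.
# bayes_posterior is a hypothetical helper, written for illustration.

def bayes_posterior(priors, likelihoods, target):
    """Return P(target | evidence), given
    priors[k]      = P(k) for every category k, and
    likelihoods[k] = P(evidence | k) for every category k."""
    # Numerator: P(target) * P(evidence | target)
    numerator = priors[target] * likelihoods[target]
    # Denominator: total probability of the evidence across all categories
    denominator = sum(priors[k] * likelihoods[k] for k in priors)
    return numerator / denominator


priors = {"drop-out": 0.05, "high school only": 0.15,
          "undergraduate": 0.60, "graduate": 0.20}
p_earned_given = {"drop-out": 0.50, "high school only": 0.80,
                  "undergraduate": 0.80, "graduate": 0.85}

posterior = bayes_posterior(priors, p_earned_given, "drop-out")
print(f"{posterior:.3f}")  # 0.031
```

Because the helper only needs priors and likelihoods, it can be reused for the other education levels; for example, `bayes_posterior(priors, p_earned_given, "graduate")` gives 0.17 / 0.795, about 21.4%.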
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources—all by understanding the numbers.
- Why statistics matter
- Evaluating your data sets
- Finding means, medians, and modes
- Calculating standard deviation
- Measuring distribution and relative position
- Understanding probability and multiple-event probability
- Describing permutations: the order of things
- Calculating discrete and continuous probability distributions