An explanation of Bayes' theorem and how it is used to understand scenarios that include false positives.
- Perhaps you've heard a story like this. A person goes into a doctor's office. They have some odd symptoms so as a precaution the doctor takes a blood sample to see if the patient might have a rare disease. The test is sent out and upon their next visit, the patient is told they have tested positive for this rare disease. The patient is obviously very worried. The doctor tries to calm the patient by saying that the positive results, which indicate that the patient has the disease, may be incorrect.
The doctor's trying to indicate that it's possible that the test has delivered a false positive. I think any one of us would still be worried. I think I'd want a little bit more information. If I did test positive for this rare disease, what is the probability that my test results could have been wrong? Well to figure that out, we'd need some information. We're looking at two different events. Event A, patient has the disease. Event B, patient tests positive for the disease.
The doctor tells us that only one in 10,000 people has the disease. The doctor also tells the patient that if you have the infection, there's a 99% chance you will test positive for the disease. How about the uninfected? Well 2% of the uninfected patients will still test positive for the disease. In other words, 2% of the healthy patients will get a false positive. For this, let's use our probability trees.
Event one, does the patient have the disease? One person does, 9,999 do not. So our diseased branch has the value of 0.0001. Our healthy branch has a value of 0.9999. Then we can move on to event number two. Did the patient test positive? For those that actually have the disease, 99% test positive and 1% of patients with the actual disease will test negative.
For those patients that do not have the disease, 98% will test negative, 0.98, but 2% will test positive, 0.02. These are the folks that get false positives. Let's calculate the value of each branch. The value of diseased patients that test positive is 0.0001 times 0.99, which gives us 0.000099.
The value of diseased patients that test negative is 0.0001 times 0.01, which gives us 0.000001. A tiny number. The value of healthy patients that test positive is 0.9999 times 0.02, which gives us 0.019998. The value of healthy patients that test negative is 0.9999 times 0.98, which gives us 0.979902.
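If you'd like to check the tree yourself, here is a minimal Python sketch of the four branch products. The variable names are just illustrative choices, not anything from the course; the three input figures are the ones the doctor quotes. Exact fractions are used so that rounding doesn't muddy the small numbers.

```python
from fractions import Fraction

# Figures quoted by the doctor (variable names are illustrative assumptions)
p_disease = Fraction(1, 10_000)          # 1 in 10,000 people has the disease
p_pos_given_disease = Fraction(99, 100)  # 99% of diseased patients test positive
p_pos_given_healthy = Fraction(2, 100)   # 2% of healthy patients test positive

p_healthy = 1 - p_disease

# Each branch value is the product of the probabilities along that path
branches = {
    ("diseased", "positive"): p_disease * p_pos_given_disease,
    ("diseased", "negative"): p_disease * (1 - p_pos_given_disease),
    ("healthy", "positive"): p_healthy * p_pos_given_healthy,
    ("healthy", "negative"): p_healthy * (1 - p_pos_given_healthy),
}

for (status, result), p in branches.items():
    print(f"{status:8s} {result:8s} {float(p):.6f}")

# The four branches cover every possible outcome, so they must sum to 1
assert sum(branches.values()) == 1
```

The four values always add up to exactly 1, which is a handy sanity check whenever you build a probability tree.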
So what does this mean? Well in a city of one million people, 100 people will actually have the disease. Of those 100 people, 99 will test positive and one person will get a negative test and not discover that they have the disease. It also means that out of one million people, 999,900 people will not have the disease, but out of those 999,900 people who are not diseased, 19,998 will get a positive test.
So finally, back to our question. If I test positive, how worried should I be? Well out of one million people, 20,097 will test positive, but only 99 of those people will actually have the disease. So if you test positive, there is about a 0.5% chance you actually have the disease. In other words, only about one out of every 200 people that test positive for the disease actually has the disease.
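The same answer can be reached without picturing a city of one million people. Here is a short Python sketch, with a hypothetical helper named `posterior` that is not part of the course material: it divides the true-positive branch by the sum of both positive branches.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of having the disease given a positive test.

    prior: share of the population with the disease
    sensitivity: chance a diseased patient tests positive
    false_positive_rate: chance a healthy patient tests positive
    """
    true_pos = prior * sensitivity                  # diseased and positive
    false_pos = (1 - prior) * false_positive_rate   # healthy and positive
    return true_pos / (true_pos + false_pos)


# Figures from the doctor's office example
p = posterior(prior=0.0001, sensitivity=0.99, false_positive_rate=0.02)
print(f"Chance of disease given a positive test: {p:.2%}")
```

With the doctor's numbers this works out to 99 divided by 20,097, a little under 0.5%. Try changing `prior` to see how strongly the rarity of the disease drives the result.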
That's got to make that patient feel at least a little bit better, right? What we did here is the basis for what we call Bayes' theorem. It's not only interesting, but it can be very useful. So in our next video, let's look at one more problem and get a better look at the mechanics of what the Reverend Thomas Bayes did with the use of conditional probability.
Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources—all by understanding the numbers.
- Calculate mean and median for specific data sets.
- Explain how the mode is used to assess a data set.
- Identify situations in which standard deviation can be used to investigate individual data points.
- Use mean and standard deviation to find the Z-score for a data point.
- List the three different categories of probability.
- Analyze data to determine if two events are dependent or independent.
- Predict possible outcomes for a situation using basic permutation calculations.
- Give examples of binomial random variables.