Join Curt Frye for an in-depth discussion in this video Analyzing a sample problem: Kahneman's Cabs, part of Data-Analysis Fundamentals with Excel.
- If you haven't worked with Bayesian analysis before, dealing with prior probabilities and base rates, then the concept can be a little bit unclear. I'd like to work through a classic example, the Tversky and Kahneman taxicab problem, to demonstrate how it works and give you a better intuition. We'll start with the following facts: a cab was involved in a hit-and-run accident at night, and two cab companies, Green and Blue, operate in the city.
Here are the facts as given in the court case: a witness identified the cab as Blue, and 85% of the cabs in the city are Green and 15% are Blue. That means that we have a base rate for Blue of 15%, and a base rate for Green of 85%. The court examined the witness and found that, under the circumstances that existed on the night of the accident, they identified Green versus Blue correctly 80% of the time.
So that means 80% of the time, if the cab is Green, they will say Green, and if it is Blue, they will say Blue, and the other 20% of the time, they will be wrong. The question for you is, given those facts, what is the probability that the cab involved in the accident was Blue rather than Green? I would encourage you to pause the movie and take a moment to think about what your intuition tells you is the probability that the cab is actually Blue rather than Green. And then press Play, after you've had a chance to think about it.
It turns out, the actual probability is about 41%. You can go online and find a very nice intuitive explanation of why that's the case at the URL that I've displayed here on the slide. I'd like to work through Sagar's explanation to give you an idea of how it works. Let's visualize the answer. We have our base rates. 85% of the cabs in the city are Green and 15% are Blue. Now we need to include our accuracy rate.
If a cab is Green, the witness will say it is Green 80% of the time, but they will say Blue, that is, be wrong, 20% of the time. And the case is similar for a Blue cab: if it is actually Blue, they will be right 80% of the time. But the other 20% of the time, they will be wrong and indicate that a Blue cab is actually Green. Now we need to look at the individual probabilities for each outcome. So, 68% of the time, a cab will be Green and correctly identified as Green.
That calculation is performed by taking the 85% base rate and multiplying it by the 80% accuracy rate to get the 68%. The other 20% of the time, a Green cab will be identified as Blue, and that outcome happens 85% times 20%, or 17% overall. You can see the similar calculations for Blue cabs coming in at Blue identified as Blue 12% of the time, which is 15% times 80%, and Blue identified incorrectly as Green 3% of the time, which is 15% times 20%.
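The four outcome probabilities described above can be reproduced with a short script. This is a minimal sketch in Python rather than the Excel workbook used in the course; the variable names (`base_rate`, `accuracy`, `joint`) are illustrative assumptions, not names from the video.

```python
# Base rates and witness accuracy as stated in the problem.
base_rate = {"Green": 0.85, "Blue": 0.15}
accuracy = 0.80  # chance the witness names the correct color

# Joint probability of each (actual color, color the witness says) pair:
# base rate of the actual color times the chance of that identification.
joint = {}
for actual, p_actual in base_rate.items():
    for said in base_rate:
        p_said = accuracy if said == actual else 1 - accuracy
        joint[(actual, said)] = p_actual * p_said

# Print each outcome rounded to a whole percent.
for (actual, said), p in joint.items():
    print(f"actual {actual}, witness says {said}: {p:.0%}")
```

The four values cover every possible outcome, so they should sum to 100%, which is a quick way to check the arithmetic.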
Now, if you look at the values, none of them add up to 41. None of them appear to multiply to 41%. So how do we reach that number? What we do is, we take the two middle numbers, where the cab is identified as being blue ... Because remember, that's what the witness did. The witness said that the cab was Blue. So how many times does that occur? Well, if the cab is Green, it will be identified as Blue 17% of the time. 12% of the time, it will be identified as Blue when it really is Blue.
So what we need to do now is to divide the chance that a cab is Blue and identified as Blue by the total chance that a cab is identified as Blue. So we divide the 12% of a cab being Blue and identified as Blue by the total for all Blue identifications. And again, that's 12% plus 17%, or 29%. And 12 divided by 12 plus 17, that is, 12 divided by 29, equals about 41%. And that is how you arrive at your answer.
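The final division is Bayes' rule applied to the witness saying "Blue". As a hedged sketch, again in Python rather than Excel, the hypothetical helper `posterior_blue` below (a name introduced only for illustration) takes the Blue base rate and the witness accuracy, and assumes the witness is equally accurate for both colors, as the problem states.

```python
def posterior_blue(prior_blue: float, accuracy: float) -> float:
    """Probability the cab is Blue, given that the witness said Blue."""
    # Cab is Blue and the witness correctly says Blue.
    true_blue = prior_blue * accuracy
    # Cab is Green and the witness mistakenly says Blue.
    false_blue = (1 - prior_blue) * (1 - accuracy)
    # Share of all "Blue" identifications that are actually Blue.
    return true_blue / (true_blue + false_blue)


print(f"{posterior_blue(0.15, 0.80):.1%}")  # 12% / (12% + 17%), about 41.4%
```

Changing the arguments shows how strongly the base rate drives the answer: with an even split between the two companies, `posterior_blue(0.5, 0.80)` returns the witness accuracy itself, 80%.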
- Calculating mean and median values
- Analyzing data using variance and standard deviation
- Minimizing errors
- Visualizing data with histograms, charts, and more
- Testing hypotheses
- Measuring covariance and correlation
- Performing Bayesian analysis