Join Keith McCormick for an in-depth discussion in this video, Exhaustive CHAID, part of Machine Learning & AI Foundations: Decision Trees.
- [Instructor] There's a variation of CHAID that we haven't had an opportunity to talk about yet. David Biggs proposed Exhaustive CHAID back in 1991, and it's widely available, so let's take a couple of moments to talk about how Exhaustive CHAID is different, and what impact it might have on the tree. First, let's take a moment to remind ourselves of what the original CHAID looked like. Let's count the levels. We've got one, two, three, four, five levels.
What makes CHAID and Exhaustive CHAID different is how they merge. So let's take a closer look at the portion of the tree with the female passengers, and in particular, look at the Fare variable. Let's remind ourselves of how CHAID would handle this variable. It would begin by breaking it into deciles. Then it finds the pair of categories that is not significantly different (in fact, it rank orders the pairs) and combines them. Then it finds the next pair to combine, and the next pair to combine.
When there are no more pairs to combine, it stops, and that's how we got four. Exhaustive CHAID is different in exactly this part of the process. Even when it gets down to these four, it keeps going. It keeps collapsing beyond that point, and it keeps track of which of those combinations is going to do the best job at predicting survival. So it's working harder, it's trying more combinations, and, historically, that has always made it a little slower as well.
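The two merge strategies can be sketched side by side in Python. This is a minimal illustration under stated assumptions, not the SPSS Modeler implementation. The counts in `fare_bins` are invented (six bins instead of deciles, each given as survived and died). The function names and the 0.05 merge threshold are hypothetical. The sketch also assumes that the exhaustive version keeps merging until two categories remain and then picks the grouping with the smallest p-value.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic x with integer df >= 1."""
    y = x / 2.0
    a, q = (1.0, exp(-y)) if df % 2 == 0 else (0.5, erfc(sqrt(y)))
    while a < df / 2.0:  # recurrence on the incomplete gamma function
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q

def p_value(groups):
    """Pearson chi-square p-value for a table of groups x 2 outcomes."""
    col = [sum(g[j] for g in groups) for j in (0, 1)]
    n = sum(col)
    stat = 0.0
    for g in groups:
        row = g[0] + g[1]
        for j in (0, 1):
            expected = row * col[j] / n
            stat += (g[j] - expected) ** 2 / expected
    return chi2_sf(stat, len(groups) - 1)

def most_similar_pair(groups):
    """Index and p-value of the adjacent pair that differs least."""
    pvals = [p_value(groups[i:i + 2]) for i in range(len(groups) - 1)]
    i = max(range(len(pvals)), key=pvals.__getitem__)
    return i, pvals[i]

def merge_at(groups, i):
    """Combine group i with its right-hand neighbor."""
    merged = [groups[i][0] + groups[i + 1][0], groups[i][1] + groups[i + 1][1]]
    return groups[:i] + [merged] + groups[i + 2:]

def chaid_merge(groups, alpha_merge=0.05):
    """CHAID-style: stop as soon as every adjacent pair is significantly different."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        i, p = most_similar_pair(groups)
        if p <= alpha_merge:
            break
        groups = merge_at(groups, i)
    return groups

def exhaustive_merge(groups):
    """Exhaustive-style: merge down to two groups, then keep the best grouping seen."""
    groups = [list(g) for g in groups]
    history = [(p_value(groups), groups)]
    while len(groups) > 2:
        i, _ = most_similar_pair(groups)
        groups = merge_at(groups, i)
        history.append((p_value(groups), groups))
    best = min(history, key=lambda h: h[0])
    return best[1], history

# Hypothetical (survived, died) counts for six ordered fare bins.
fare_bins = [(10, 90), (12, 88), (50, 50), (52, 48), (90, 10), (87, 13)]

chaid_groups = chaid_merge(fare_bins)
best_groups, history = exhaustive_merge(fare_bins)
print(len(chaid_groups), len(best_groups), len(history))
```

On this made-up data, both strategies settle on the same three groups, even though the exhaustive version evaluated every grouping from six categories down to two. That matches what the trees show for Fare: the extra work does not have to change the merge result.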
Thus the name, Exhaustive CHAID. Let's take a look at the entire Exhaustive CHAID tree. All other settings are the same. The data's the same. But let's count together. One, two, three levels. So clearly something about that is different. But Exhaustive CHAID doesn't split differently, it merges differently. Let's take a little bit of a look at Fare again. We don't seem to notice a difference, so what could be going on? Well, we know that Exhaustive CHAID is working harder, it's doing more tests.
And what have we talked about that involved the number of tests that we're performing? It's the Bonferroni adjustment. If you adjust for more tests, the whole tree becomes more conservative. Let's take a closer look at the section of the tree for male passengers. I'm not going to compare them in detail now, but if you look very closely at a variable like Embarked, the adjusted p-value is going to be a little bit different for Exhaustive CHAID, and a little bit higher, because it's adjusting for more tests.
It makes the tree more conservative, and it grows less. That's exactly the kind of little puzzle that you can figure out when you're spending several days building trees and you're trying to figure out why the trees are behaving in different ways. That's why it's so helpful to know even just a little bit of how the algorithms work under the hood.
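The effect of adjusting for more tests can be shown with a few lines of Python. This is a sketch only. The raw p-value, the test counts 3 and 4, and the 0.05 threshold are illustrative assumptions rather than figures from the trees in the video, and the function names are hypothetical.

```python
def bonferroni_adjust(p_raw, n_tests):
    """Bonferroni adjustment: multiply the raw p-value by the number of tests, capped at 1."""
    return min(1.0, p_raw * n_tests)

def can_split(p_raw, n_tests, alpha_split=0.05):
    """A split is allowed only if the adjusted p-value stays below the threshold."""
    return bonferroni_adjust(p_raw, n_tests) < alpha_split

p_raw = 0.015  # hypothetical raw p-value for one candidate split

# Same raw p-value, different number of tests to adjust for (assumed counts).
adjusted_fewer_tests = bonferroni_adjust(p_raw, 3)
adjusted_more_tests = bonferroni_adjust(p_raw, 4)

print(can_split(p_raw, 3), can_split(p_raw, 4))
```

With the same raw p-value, the larger multiplier pushes the adjusted p-value over the threshold, so the branch that split under the smaller adjustment no longer splits. That is one way a tree can end up with fewer levels without the merge result changing.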
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick