Join Keith McCormick for an in-depth discussion in this video, How CHAID handles continuous variables, part of Machine Learning & AI Foundations: Decision Trees.
- [Instructor] Our continuous variable will be age. CHAID actually starts by converting age into deciles. It has to do this because chi-square can't be run on continuous variables, yet in this case we get only two nodes. So what's going on? It seems like age 13 is an important cutoff, which certainly is consistent with this notion of women and children first. So how does CHAID do it? When we force it into more segments, we get a better sense of what the deciles must have looked like.
In fact, if we take a closer look at the sample size, we can see that node 14 has 41 records out of a total in this particular tree of 500. So that's about the size we would expect a decile to be. By the way, deciles aren't going to be exactly 50 because we're talking about integers here: records that share the same age land in the same bin, and for that reason you're not going to get that exact number. If we move on to node 15, we see that it has 48. Also about what we expect a decile to be.
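To make the point about uneven decile sizes concrete, here is a minimal sketch in Python. It is not SPSS Modeler's binning routine: the nearest-rank cut rule, the helper names `decile_cut_points` and `assign_bin`, and the synthetic ages are all assumptions made for illustration. It only shows that when ages are whole numbers, tied values have to stay together, so bins of 500 records do not come out at exactly 50 each.

```python
def decile_cut_points(values):
    """Return 9 cut points that split the values into 10 roughly equal bins.

    Uses a simple nearest-rank rule (an assumption, not SPSS Modeler's rule).
    """
    ordered = sorted(values)
    n = len(ordered)
    # Take the value sitting at the 10%, 20%, ..., 90% position.
    return [ordered[(n * k) // 10 - 1] for k in range(1, 10)]


def assign_bin(value, cuts):
    """Return the index (0..9) of the first bin whose upper cut is >= value."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i
    return len(cuts)


# Synthetic integer ages 1..70 for 500 hypothetical records (not the course data).
ages = [i % 70 + 1 for i in range(500)]

cuts = decile_cut_points(ages)
sizes = [0] * 10
for age in ages:
    sizes[assign_bin(age, cuts)] += 1

print("cut points:", cuts)
print("bin sizes: ", sizes)
```

Every record with a given age falls on the same side of a cut point, so some bins end up a little over 50 and others a little under, which matches the 41 and 48 seen in nodes 14 and 15.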
And we notice the same for 16 and 17. Until we get to node 18, where the sample size is substantially larger at 240. So it seems like node 18 must represent five deciles. So why did CHAID break at 13? Well, take a look at the survival rate for nodes 15, 16, 17, and 18. It's around 40 percent. Now there is this little dip at node 16.
It drops down to 25 percent. But because of sample size and other factors, CHAID didn't want to break it up into that many deciles. So there you go: we get node 14 with its high survival rate, and then nodes 15, 16, 17, and 18, and then missing gets combined with the group that it's most like. And that's how we got the pattern that we originally saw, just two nodes: less than or equal to 13, compared to older than 13 plus missing.
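The merging described here can be sketched roughly as follows. This is a simplified illustration, not the full CHAID procedure: it leaves out the Bonferroni adjustment and the handling of the missing category, the 0.05 threshold is an assumed default, and the survived/died counts are invented to resemble the rates mentioned in the video (only the 41, 48, and 240 record totals come from the transcript). The idea is to repeatedly merge the pair of adjacent bins whose survival rates are least distinguishable by a chi-square test, and stop once every remaining pair differs significantly.

```python
import math


def chi2_p_value(a, b):
    """p-value of a Pearson chi-square test on two bins of (survived, died).

    A 2x2 table has 1 degree of freedom, so the p-value is erfc(sqrt(stat / 2)).
    """
    rows = (a, b)
    n = sum(a) + sum(b)
    row_totals = [sum(a), sum(b)]
    col_totals = [a[0] + b[0], a[1] + b[1]]
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            if expected == 0:
                return 1.0  # an empty margin gives no evidence of a difference
            stat += (observed - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2))


def merge_adjacent(bins, alpha=0.05):
    """Merge neighbouring bins until all adjacent pairs differ at level alpha."""
    bins = [([label], tuple(counts)) for label, counts in bins]
    while len(bins) > 1:
        p_values = [chi2_p_value(bins[i][1], bins[i + 1][1])
                    for i in range(len(bins) - 1)]
        i = max(range(len(p_values)), key=p_values.__getitem__)
        if p_values[i] <= alpha:
            break  # even the most similar pair is significantly different
        (l1, c1), (l2, c2) = bins[i], bins[i + 1]
        bins[i:i + 2] = [(l1 + l2, (c1[0] + c2[0], c1[1] + c2[1]))]
    return bins


# Hypothetical (survived, died) counts per node, ordered by age.
nodes = [
    ("14", (27, 14)),    # 41 records, high survival
    ("15", (19, 29)),    # 48 records, about 40 percent
    ("16", (13, 39)),    # the dip to 25 percent
    ("17", (20, 30)),    # about 40 percent
    ("18", (96, 144)),   # 240 records, about 40 percent
]

merged = merge_adjacent(nodes)
for labels, (survived, died) in merged:
    print(labels, survived, died)
```

With these illustrative counts, the dip at node 16 is not large enough, given its sample size, to keep it separate, so nodes 15 through 18 collapse into one group while node 14 stays on its own.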
In the next video we're going to get a chance to look at the whole tree.
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick