Join Keith McCormick for an in-depth discussion in this video Overview, part of Machine Learning: Advanced Decision Trees.
- [Instructor] Let's overview the QUEST algorithm. Now remember, we're focused on the algorithm, not any particular software implementation. QUEST is an acronym that stands for Quick, Unbiased, Efficient Statistical Tree. So what were the co-authors thinking when they came up with this acronym? Well, a little bit of history is helpful. CHAID, Chi-square Automatic Interaction Detection, came out in 1980. CART, Classification And Regression Trees, came out in 1984.
So those algorithms were well known when QUEST came out in 1997. So what were its authors trying to improve upon? Well, a perceived weakness of CART was that it was slow. The reason is that CART examines all possible split points, and as we'll see, QUEST doesn't do it that way. Also, CHAID was perceived to be biased towards variables that produce a large number of child nodes. What this means is that CHAID would often gravitate towards categorical variables with lots of categories, or grow trees that were somewhat wider than those of other techniques.
So how does QUEST do it? QUEST uses statistical tests instead of a brute-force search over all possible cut points. So it examines fewer cut points, but it does so by performing calculations that try to zero in on what the optimal cut point would be. It also uses different tests appropriate to different variable types: a chi-square test on categorical variables, but an F-test on scale (continuous) variables, as we'll see.
Once those tests are performed, it can simply rank all of the variables in the data set by their p-values and split on the variable with the smallest one. Finally, QUEST uses surrogates for missing data, just like CART.
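The selection step described above can be sketched in a few lines. This is a simplified illustration, not the full QUEST algorithm: it assumes scipy is available, uses synthetic data, and shows only the core idea of applying a chi-square test to a categorical predictor, an ANOVA F-test to a continuous one, and then ranking the predictors by p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, n)  # binary class label (synthetic)

# A continuous predictor whose mean shifts with the class (informative)
income = rng.normal(50, 10, n) + 8 * target
# A categorical predictor unrelated to the class (noise)
region = rng.integers(0, 4, n)

# F-test (one-way ANOVA) for the continuous (scale) predictor
f_stat, p_income = stats.f_oneway(income[target == 0], income[target == 1])

# Chi-square test of independence for the categorical predictor,
# built from its contingency table against the target
table = np.zeros((4, 2))
for r, t in zip(region, target):
    table[r, t] += 1
chi2, p_region, dof, _ = stats.chi2_contingency(table)

# Rank candidate predictors by p-value; the smallest p-value wins
ranked = sorted([("income", p_income), ("region", p_region)],
                key=lambda kv: kv[1])
print(ranked)
```

Because the continuous predictor was constructed to separate the classes, it should come out first in the ranking, which is exactly why QUEST can avoid exhaustively scanning every cut point: the split variable is chosen before any cut point is searched.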