From the course: Machine Learning and AI Foundations: Decision Trees with SPSS

How does C&RT weigh purity and balance?

- C&RT weighs two factors equally: purity and balance. Purity is typically measured with a variation of the Gini coefficient. Balance simply means that the left branch and the right branch have the same or a similar number of cases. C&RT always produces binary splits, meaning that it always splits a node into two. Let's take a look at an example. This is the age variable from the Titanic data set. This split shows where C&RT wants to split. Note that it's not terribly balanced. However, purity has changed. The root node shows roughly two-thirds blue and one-third red. Remember that red is survival here. The leaf node on the right is similar, but the leaf node on the left has moved in a new direction, showing nearly two-thirds red. This would favor age as a potential predictor. It is showing a sharp contrast between the very young and everyone else. If this process were to continue, we'd eventually end up with leaf nodes that were much more pure than the root node, which, of course, is what we want…
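
To make the purity idea concrete, here is a minimal Python sketch of how a Gini-style impurity could be computed for a binary split like the one described. The node counts are hypothetical, chosen only to mimic the proportions mentioned in the transcript (a root of roughly two-thirds blue, a small left node of nearly two-thirds red); they are not the actual Titanic figures, and the function names are illustrative. The balance figure is just the smaller branch's share of cases, an assumed stand-in for illustration rather than the formula SPSS uses.

```python
def gini(counts):
    """Gini impurity of a node given its class counts: 1 - sum(p_i^2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_improvement(left, right):
    """Drop in impurity from parent to the size-weighted average of the children."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = [l + r for l, r in zip(left, right)]
    weighted_children = (n_left / n) * gini(left) + (n_right / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical counts as (blue, red); red is survival, as in the transcript.
left = (35, 65)     # small node, nearly two-thirds red
right = (565, 235)  # large node, close to the root's mix
root = tuple(l + r for l, r in zip(left, right))  # (600, 300): two-thirds blue

improvement = split_improvement(left, right)

# Assumed balance indicator: share of cases in the smaller branch (0.5 = even split).
balance = min(sum(left), sum(right)) / sum(root)

print(f"root impurity:   {gini(root):.4f}")
print(f"left impurity:   {gini(left):.4f}")
print(f"right impurity:  {gini(right):.4f}")
print(f"improvement:     {improvement:.4f}")
print(f"smaller branch:  {balance:.1%} of cases")
```

With counts like these, the split is far from balanced, yet the weighted impurity of the two children is still lower than the root's, which is the trade-off the example illustrates.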
