Join Keith McCormick for an in-depth discussion in this video, Decision tree options in SPSS Modeler, part of Machine Learning: Decision Trees.
- [Instructor] Let's take a look at the SPSS Modeler interface. Modeler's all about building Streams. Here's an example of a Stream that builds two Decision Trees. We draw these Streams on the canvas, and down at the bottom there's a whole collection of different shapes called Nodes that we use to build the Streams. So let's take a closer look at modeling, and especially classification. You can see that there's a very large number of choices here, but we're going to stay focused on just two kinds of Decision Trees: the CHAID algorithm and the CRT algorithm.
Let's take a look at a definition of Decision Tree Learning. Decision Tree Learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. So in our case, the target is individual passengers' survival or not on the Titanic.
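To make that definition concrete, here is a minimal Python sketch of a decision tree acting as a predictive model. The splits, the field names (`sex`, `pclass`, `age`), and the leaf labels are all hypothetical and chosen only for illustration. They are not the tree built in the video, and this is not how SPSS Modeler represents a model internally.

```python
# Hypothetical hand-built tree: the splits and leaf labels are illustrative
# assumptions, not the CHAID or CRT tree shown in the video.
def predict_survival(passenger):
    """Map observations about one passenger to a predicted target value."""
    if passenger["sex"] == "female":
        # Second-level split on passenger class (assumed for illustration).
        if passenger["pclass"] in (1, 2):
            return "survived"
        return "died"
    # Second-level split on age for male passengers (assumed for illustration).
    if passenger["age"] < 10:
        return "survived"
    return "died"


# Two made-up passengers, not records from the Titanic data set.
passengers = [
    {"sex": "female", "pclass": 2, "age": 30},
    {"sex": "male", "pclass": 3, "age": 40},
]
predictions = [predict_survival(p) for p in passengers]
print(predictions)
```

Each `if` corresponds to a split in the tree, and each `return` corresponds to a leaf that holds the conclusion about the target.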
So we're trying to predict who's going to survive and who's not going to survive. So let's take a closer look at a CHAID model that's already been built and I can look inside by double-clicking on the icon. And one of the things that you'll notice is that these trees can get fairly large and complicated. In fact, many trees will get larger and more complicated than this one. Now something to note is that although the colors are arbitrary, in this case, red means they survived and blue means they died.
So let's zoom in and take a closer look. In particular, you'll notice that Node 4 down here represents a group of passengers where the survival rate is quite high compared to some of the others. We can actually look at this in a numerical way by calling up the numbers, and what we'll find is that up at the top, we have our overall survival rate of 38.5%, but females in second class, represented by Node 4, have a survival rate that's much higher: 91%.
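The comparison between the root node and a single node comes down to simple arithmetic: the survival rate among everyone versus the rate among only the passengers who fall into that node. The sketch below shows the calculation on a handful of made-up records. The data are invented, so the resulting rates will not match the 38.5% and 91% figures from the video.

```python
# Made-up records of (sex, passenger class, survived flag); NOT the Titanic data.
records = [
    ("female", 2, 1), ("female", 2, 1), ("female", 2, 1), ("female", 2, 0),
    ("female", 3, 1), ("female", 3, 0),
    ("male", 2, 0), ("male", 2, 0), ("male", 3, 0), ("male", 1, 1),
]


def survival_rate(rows):
    """Fraction of rows whose survived flag is 1."""
    return sum(row[2] for row in rows) / len(rows)


# Root node: every passenger.
overall_rate = survival_rate(records)

# One node of the tree: females in second class.
node_rows = [r for r in records if r[0] == "female" and r[1] == 2]
node_rate = survival_rate(node_rows)

print(f"overall: {overall_rate:.1%}, node: {node_rate:.1%}")
```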
Now we're going to have many opportunities to talk about tree diagrams just like this. So why so many algorithms, and why are we going to stay focused on CHAID and CRT? I've chosen CHAID and CRT because they're common, they're easy to understand, and they're quite different from each other. CHAID, for instance, comes from the field of statistics, whereas CRT is associated more with machine learning. So you can learn a lot about Decision Trees just by comparing CHAID and CRT.
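One way to see the difference between the two traditions is in how each algorithm scores a candidate split. CHAID relies on a chi-square test of independence between the split and the target, while CRT commonly scores splits on a categorical target by the decrease in Gini impurity. The sketch below computes both quantities for one hypothetical split. The counts are invented, and this is a simplified illustration, not the full procedure either algorithm runs in SPSS Modeler (it omits, for example, category merging, p-values, and the Bonferroni adjustment).

```python
# Hypothetical contingency table for one candidate split.
# Rows are the two child nodes; columns are [survived, died]. Counts are made up.
table = [
    [30, 10],
    [20, 40],
]


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (CHAID-style score)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(table):
    """Parent impurity minus the size-weighted impurity of the children (CRT-style score)."""
    parent = [sum(col) for col in zip(*table)]
    total = sum(parent)
    weighted = sum(sum(row) / total * gini(row) for row in table)
    return gini(parent) - weighted


chi2 = chi_square(table)
decrease = gini_decrease(table)
print(f"chi-square: {chi2:.3f}, Gini decrease: {decrease:.4f}")
```

Both scores reward a split that separates survivors from non-survivors, but the chi-square statistic feeds a significance test while the Gini decrease is a direct measure of how much purer the child nodes are.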
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick