Join Keith McCormick for an in-depth discussion in this video, "Building a quick CHAID model," part of Machine Learning & AI Foundations: Decision Trees.
- [Instructor] Let's build our first CHAID model. I'm going to use the Open an existing stream icon to open Source_and_Type_Node.str. It has just two nodes. The first, our Source node, holds the location of the data file. Remember that your path might be different from mine. The Source node also has some important settings, but don't worry, I've already configured those for you. The next node is a Type node. It also has some important settings, but for now I'm going to focus on the variables that have been declared as inputs.
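Outside of Modeler, the role assignments a Type node makes can be pictured as a simple mapping from each field to its role. This is a hypothetical illustration only: the field names assume the commonly distributed Kaggle-style Titanic file (Survived, Pclass, SibSp, Parch, Embarked, and so on), and the lowercase role strings and the `fields_with_role` helper are made up for this example rather than taken from Modeler.

```python
# Hypothetical field-to-role mapping, loosely mirroring a Type node.
# Field names ASSUME a Kaggle-style Titanic file; your file may differ.
ROLES = {
    "PassengerId": "none",   # identifier, not used for prediction
    "Survived": "target",    # what the model predicts
    "Pclass": "input",
    "Name": "none",
    "Sex": "input",
    "Age": "input",
    "SibSp": "input",        # siblings + spouses aboard
    "Parch": "input",        # parents + children aboard
    "Ticket": "none",
    "Fare": "input",
    "Cabin": "none",
    "Embarked": "input",     # port of embarkation
}


def fields_with_role(roles, role):
    """Return the field names that have the given role, in declaration order."""
    return [name for name, r in roles.items() if r == role]


INPUTS = fields_with_role(ROLES, "input")
TARGET = fields_with_role(ROLES, "target")[0]
```

Keeping the roles in one place means a modeling step only needs `INPUTS` and `TARGET`, much as the CHAID node picks up its fields from the Type node.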
Those are the variables we're going to use to predict passenger survival. The first one is Pclass, which stands for passenger class: whether the passenger was traveling in first, second, or third class. Next is Sex, male or female, followed by Age. The next two are interesting: this one stands for siblings and spouses aboard, and the one after that stands for parents and children aboard. Taken together, they indicate how many family members a passenger might have been traveling with.
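The sibling/spouse and parent/child counts can be added together into a single measure of how many relatives a passenger had aboard. This is a minimal sketch: the SibSp and Parch names assume Kaggle-style column names, `family_aboard` is a hypothetical helper, and the three records are invented. The model built in this video uses the two counts as separate inputs rather than a combined field.

```python
def family_aboard(sibsp: int, parch: int) -> int:
    """Total relatives aboard: siblings/spouses plus parents/children."""
    if sibsp < 0 or parch < 0:
        raise ValueError("counts must be non-negative")
    return sibsp + parch


# Invented example records, not real passengers.
passengers = [
    {"SibSp": 1, "Parch": 0},  # e.g. traveling with a spouse
    {"SibSp": 0, "Parch": 2},  # e.g. a parent traveling with two children
    {"SibSp": 0, "Parch": 0},  # traveling alone
]

# Add the derived field to each record in place.
for p in passengers:
    p["FamilyAboard"] = family_aboard(p["SibSp"], p["Parch"])
```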
The idea is that if you have to get a young child or an elderly parent to the lifeboats, that might affect your own chance of survival. Next is the fare paid for the ticket, and finally the port of embarkation. You may not know this, but the Titanic picked up passengers at three ports before crossing the Atlantic, so this field records whether a passenger started the voyage in England, France, or Ireland. Let's go back to our stream, where we need to add a very important node: the Partition node, which splits our data into two partitions.
Let's take a look inside. When the data set is on the small side, as it is here, it's a good idea to increase the share of the training partition. I'm going to raise it to 80% and lower the testing partition to 20%. The training partition is the data we'll use to actually build the model; we'll then check the model against the testing partition to make sure it does a good job on data that wasn't used to build it.
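An 80/20 training/testing split can be sketched in plain Python as shuffle-then-cut. This assumes a single shuffle with a fixed seed and an exact 80% cut; it is not a description of how Modeler's Partition node draws its sample, `train_test_partition` is a hypothetical name, and the list of 1,000 integers is a stand-in for real passenger rows.

```python
import random


def train_test_partition(records, train_fraction=0.8, seed=1234):
    """Shuffle a copy of the records and split it into training and testing lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)        # copy so the caller's list is untouched
    rng = random.Random(seed)       # fixed seed gives a repeatable partition
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))            # stand-in for 1,000 records (hypothetical)
train, test = train_test_partition(rows)
```

Fixing the seed makes the partition repeatable, so rerunning the sketch compares models on the same held-out records.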
So let's return to the stream. Now we'll add the CHAID modeling node and attach it. When we look inside, we find that the Type node has done its job: it has already declared which variable is the target and which variables are the inputs, so we're ready to run. Our model has been created, just like that. Let's take a look inside.
Specifically, let's look at the Viewer tab, where we can see that a fairly complex tree has been built. We'll have plenty of opportunities to examine tree diagrams like this one, but for now we're going to move on to the next step: in the next movie, we'll build a C&RT model.
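For a categorical target such as survival, CHAID chooses each split with chi-square tests of independence between each input and the target. The sketch below is a heavily simplified illustration of only that first selection step, under stated assumptions: inputs are treated as already categorical, and it omits CHAID's merging of similar categories, its binning of continuous inputs such as Age and Fare, and the Bonferroni adjustment of p-values. `chi2_sf`, `chi_square`, and `best_split` are hypothetical helpers written for this example, not Modeler or library functions, and the toy records are invented.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series.

    Adequate for the small statistics in this sketch; not tuned for huge x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for (input, target) pairs."""
    counts = Counter(pairs)
    rows = Counter(x for x, _ in pairs)
    cols = Counter(y for _, y in pairs)
    n = len(pairs)
    stat = 0.0
    for r, rn in rows.items():
        for c, cn in cols.items():
            expected = rn * cn / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def best_split(records, inputs, target):
    """Return the input with the smallest (unadjusted) p-value, plus all p-values."""
    pvals = {}
    for field in inputs:
        stat, df = chi_square([(r[field], r[target]) for r in records])
        pvals[field] = chi2_sf(stat, df) if df > 0 else 1.0
    return min(pvals, key=pvals.get), pvals


# Invented toy sample: survival depends on Sex, while Pclass is spread evenly.
toy = []
for sex, survived, count in [("female", 1, 36), ("female", 0, 12),
                             ("male", 1, 12), ("male", 0, 36)]:
    for i in range(count):
        toy.append({"Sex": sex, "Pclass": i % 3 + 1, "Survived": survived})

best, pvals = best_split(toy, ["Pclass", "Sex"], "Survived")
```

On this toy sample, Sex is chosen as the first split because its cross-tab with Survived is far from independent, while Pclass shows no association at all. A full CHAID implementation would then repeat the test within each resulting node.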