Establish a strong foundation in machine learning by exploring IBM SPSS Modeler and learning about CHAID, C&RT, and how to improve your model. This course is designed to help expand your data science skills.
- [Instructor] Hi, my name's Keith McCormick, and I'd like to welcome you to Machine Learning Essentials: Decision Trees. I'm an independent consultant and I've been working in the areas of statistics and data mining for about 25 years now. In this class we're going to be learning about one of the most common types of predictive analytics models, decision trees. More than two thirds of all the projects that I've done over these many years have involved decision trees at one point or another. There are literally dozens of software packages and programming languages that allow you to build decision trees.
I will be demonstrating the techniques using IBM SPSS Modeler, but you don't have to be a user of that software to benefit from the class. And you don't need experience with it before the class. Our strategy will be to explore two of the most common and easily available methods for building decision trees, CHAID and CART. They offer an interesting study in contrast, so along the way you'll learn a lot about how they work under the hood. In order to stay focused on the techniques and not get bogged down in a complicated data set, I want to keep it simple.
I've chosen the Titanic data set, which has become a bit famous as a way to practice decision trees. It has passenger characteristics like age, gender, and passenger class. We'll be using these variables to find out which passenger characteristics are most associated with surviving the disaster. It's easy to understand and it's a great way to reveal the differences between CHAID and CART. I'm excited to share with you some of my favorite insights about the topic. Let's get started.
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick