Join Keith McCormick for an in-depth discussion in this video, "What you should know," part of Machine Learning: Decision Trees.
- [Instructor] If you've ever taken a class or read a book about statistics or data mining, or maybe even written code in R, but never really understood what it was actually doing behind the scenes, this class will be a good fit for you. It's not about pointing and clicking or about programming commands, but rather about how the algorithms work, how to interpret the results, and how to act on those results to improve the model. So if you're interested in what trees are doing under the hood, this is going to be a good course for you. On occasion, I'll mention statistics terms like Type I error.
If you don't recognize a term like that, don't worry. I'll always explain terms like this when they come up. There won't be a lot of heavy math. Many books on the subject are filled with sigmas and alphas and betas. There's a place for that, but we'll have a different focus. We'll explain the concepts thoroughly, but we won't get bogged down in formulas. I've allowed the software to fade into the background. It's not the focus of the course. It's just a tool we use to experience the techniques and to manipulate the settings to get different results.
If you've used SPSS Modeler before, or if you have a reason to learn it, my use of it might be a nice bonus for you. But you can get a lot of benefit out of the course even if you intend to use other software to build decision trees. Let me show you just a couple of quick examples. This is a KNIME stream to build a decision tree. KNIME is very popular open-source software, and once the tree is built, it looks like this. It has a slightly different look and feel from SPSS Modeler, but structurally it's exactly the same.
Here's what the Titanic data looks like in SPSS Statistics. SPSS Statistics is widely used software that is available on virtually every university campus. You can also build decision trees in it. Here's an example of a decision tree built in SPSS Statistics, and it looks almost identical to the one in SPSS Modeler. In addition to those options, there are many others: R, Python, SAS, and more. There's truly no predictive modeling technique that is more common or more foundational than decision trees.
It's a fantastic way to start a journey in understanding a whole variety of topics in data science.
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick