Join Keith McCormick for an in-depth discussion in this video, Using the exercise files, part of Machine Learning: Decision Trees.
- [Instructor] I've provided a handful of files for you in the Exercise Files folder. You won't need these for every video, so I'll refer to them when you need them. The Train.csv file is our practice data involving the passengers of the Titanic. I'm going to show you how to download that now. You can download it from a website called Kaggle.com. So if you simply search for the keywords 'kaggle titanic', you'll find it, and let's go to the page. You will have to sign up for Kaggle, but of course it's completely free data, and you may actually find some of the supporting information interesting.
If you click on Get the Data, the only file that we need is this initial file 'train', the csv file. If you want to click along, and I encourage you to do so, you'll want to get a copy of the IBM SPSS Modeler trial. You will have to get an IBM ID, but it's completely free, and the trial will last about 30 days, which will give you plenty of time to work through the course.
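If you'd like to confirm that the download worked before opening it in Modeler, a quick check along these lines may help. This is only a sketch in Python, which the course itself does not use. The helper name `summarize_csv` is made up for illustration, and the `Survived` column name is an assumption about how Kaggle lays out the file, so adjust it if your copy differs.

```python
import csv
from collections import Counter


def summarize_csv(path, target="Survived"):
    """Return (row_count, column_names, target_counts) for a CSV file.

    `target` is the column whose values are tallied. "Survived" is an
    assumed column name for the Kaggle Titanic file, not something the
    video confirms.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)          # first line is read as the header
        columns = list(reader.fieldnames or [])
        counts = Counter()
        rows = 0
        for row in reader:
            rows += 1
            # A missing target column is tallied under the empty string.
            counts[row.get(target, "")] += 1
    return rows, columns, dict(counts)
```

Calling `summarize_csv("train.csv")` from the folder where you saved the file gives back the number of passenger rows, the column names, and how many rows fall under each value of the target column. If the column list looks wrong or the row count is zero, the download probably didn't complete.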
- Using the SPSS Modeler
- Building a CHAID model
- Adding a second model with C&RT
- Analysis notes
- Using a lift and gains chart
- Exploring algorithms
- Building a tree interactively
- The Bonferroni adjustment
- Handling nominal, ordinal, and continuous variables
- Examining the CHAID tree
- The Gini coefficient
- Weighing purity and balance
- Understanding pruning
- Examining the C&RT tree
- Applying stopping rules
- Using the Auto Classifier tuning trick