Join Barton Poulson for an in-depth discussion in this video, Decision trees, part of Data Science Foundations: Fundamentals.
- [Voiceover] The first procedure in machine learning that we want to talk about is decision trees. It reminds me of the bumper sticker that says trees are the answer, and in machine learning they often are. This is an example of a very simple decision tree. Let's take a quick look at the anatomy of this chart. You have what's called a root node; that's the starting point, at the top. Then you start having splits, which are indicated on branches, or edges, to use the mathematical term. You have nodes, which are decision points.
And then you finish with the leaves, or terminal nodes, which are the last category that your data ends up in. Now, decision trees come in two general categories. There are classification trees, which use quantitative and categorical data to model categorical outcomes, and there are regression trees, which use the same kinds of quantitative and categorical data, except this time to model quantitative outcomes. It's sort of like multiple regression, but trees are often easier to set up and interpret. When you make a tree, one of the important decisions you have to make is about the algorithm, or the method you use for making the splits.
There are several choices. There is, historically, ID3, the Iterative Dichotomiser 3; C5, which I believe stands for Classifier, a development of ID3; CART, which is for Classification And Regression Tree and is a very common, nearly universal choice; CHAID, which is for CHi-squared Automatic Interaction Detection; MARS, which is for Multivariate Adaptive Regression Splines; and the one that I'm going to be using, conditional inference trees. Now, decision trees in general have some pros and cons.
The pros are that they're flexible: they can handle a lot of different kinds of data and don't need much preparation, and they're robust to violations of assumptions. They're simple to interpret, they work with large data sets, and you have a white-box model, where it's very easy to see the process by which a particular case ended up in whatever category it is in. The cons are that many of these models rely on heuristics and local optima; they search to optimize a decision at that node, and not necessarily in the global context of the tree.
Some of the methods are prone to overfitting, growing trees with too many branches that don't generalize well. There are certain concepts, like the exclusive or (XOR), parity, or multiplexers, that are very hard to model in decision trees, but those aren't terribly common. And there can be a bias in the trees toward selecting variables that have more levels than others. Again, one of the reasons I chose conditional inference trees is because they're less susceptible to this. But let's go through an actual example of this in R.
What I'm going to do here is use a package called party, and if you don't have it installed, you can install it. I'm also going to use a data set from the libraries that are built into R, so I'm going to do both of those. And we're going to do a conditional inference tree, in this case a classification tree; it's also very easy to do regression trees. We'll use the iris data that I showed a little bit earlier, where we have 150 observations on three species of irises and four different measurements on each of those.
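Here's a minimal sketch of that setup, assuming the standard party package and R's built-in iris data; the exact script shown on screen isn't captured in the transcript:

```r
# install.packages("party")  # uncomment if party isn't installed yet
library(party)   # provides ctree() for conditional inference trees
data(iris)       # built-in data: 150 flowers, 4 measurements, 3 species
```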
So there are not a lot of variables to deal with, but it's easy to understand the example. All I have to do is actually use ctree, the conditional inference tree function, and that will create the actual calculations. If I want to see information about the tree that resulted, I can come down here; it shows me it made a split on petal length, then on petal width, and then on petal length again. But it's a lot easier to see these results with a plot, so that's what I'll do next. I'll just come here to plot, and I'll zoom in on this.
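A sketch of the fit and the plot described here; the formula predicting species from all four measurements is an assumption, since the transcript doesn't show the exact call:

```r
# Fit a conditional inference classification tree: Species as a
# function of all four measurements.
iris.ct <- ctree(Species ~ ., data = iris)
print(iris.ct)  # text summary: splits on Petal.Length and Petal.Width
plot(iris.ct)   # tree diagram with class proportions at the terminal nodes
```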
And I showed a section of this earlier; this is the decision tree for classifying 150 observations of three different species of irises. First it looks at petal length. If they have short petals, then they all go down the left to be setosa. If they're longer, it goes over and looks at petal width; it splits on that and then actually looks at petal length one more time. The versicolor end up in node five, and the virginica are mostly on the far right. There's a node in the middle where it's hard to tell; there are four of each, and so there's some misclassification.
In fact, if you want to look at it quantitatively, a table is a good way to do that. So I'll go back to R and I'll ask it to give me a table of the predicted category and the actual observed category. The actual categories are across the top and the predicted ones are down the side. You see that the setosa were all correctly categorized; of the versicolor, one was miscategorized as a virginica; and of the virginica, five were misclassified, so there is some confusion in this.
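Something like this produces that confusion table, assuming the fitted tree from the sketch above; predict() on a ctree fit returns the predicted class labels:

```r
# Confusion table: predicted species down the side (rows), observed
# species across the top (columns), matching the layout described.
table(predicted = predict(iris.ct), observed = iris$Species)
```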
But overall, the decision tree that we used was able to classify these on two simple measurements. And so, what are our conclusions about decision trees? Three things: first, decision trees are flexible and they're easy to do, and those are great qualities. Second, you want to match the algorithm that you use to your data, for instance whether you have categorical or quantitative data, and to the specific question that you're trying to answer. And third, watch out for overfitting, or use a method that is less susceptible to it.