Join Keith McCormick for an in-depth discussion in this video Neural nets, part of Machine Learning and AI Foundations: Classification Modeling.
- [Instructor] Okay, let's talk about neural networks. You almost certainly have heard of them, but the concept behind them might be new, and we'll also be talking about the implications for our particular topic, binary classification. First I wanna mention that deep learning, which is a really hot topic right now, is a special kind of neural network, and within the next couple of minutes, I'll be able to explain what's special about deep learning. I do wanna mention, however, that deep learning is really quite different from what we're gonna be discussing.
We're gonna be discussing multilayer perceptrons, and the applications of deep learning, at least at the moment, are really in the areas of visual recognition and speech recognition, frequently on very large datasets, so large that practitioners actually have to be cognizant of what kind of computer technology they run on. So, once we see the neural net diagram, I'll be able to amplify a bit the differences between multilayer perceptrons, as we've normally talked about them over many years, and deep learning.
Neural networks are famously a black box technique. They're not gonna tell you much of a narrative about your variables in terms of what the most important variables are or what causes the risk to go up or what causes the risk to go down and so on. I discuss why a neural net is a black box and how to try to interpret the coefficients in considerable depth in a book titled SPSS Statistics for Data Analysis and Visualization. I really get into the details.
Some of you may find it interesting if you wanna go to another layer of abstraction beyond what we're talking about here. In neural networks, all inputs are used and neural networks have the reputation of benefiting from screening. If you have a large pool of variables and some of the variables aren't too good and perhaps the variables are redundant with each other, neural networks have the reputation of being a bit more sensitive to this than other techniques.
One key feature that we're gonna have to understand is backward propagation, so in this video, we will be talking about what that is. It truly is, kind of, at the heart of neural networks. Okay, so in doing this example, I've used the same four inputs, age, passenger class, embarked, and sex, and here's a diagram that my software package of choice generated that I wanna talk about. So, I wanna break this down into its components. We've got our input variables, sex, passenger class, age, and embarked, showing in the diagram.
Bias refers to, essentially, the Y-intercept. It's a little bit different here, but it's very much like the concept of the Y-intercept in traditional linear regression. Then finally, what's been labeled here as Neuron1, Neuron2, and Neuron3 is the somewhat famous hidden layer. What makes deep learning networks different is those are large, complicated neural networks with multiple hidden layers, and large, complex hidden layers at that.
Meaning that thousands of these lines, tens or even hundreds of thousands of these lines, are in these networks, so they require very advanced computers and huge datasets. Here we have just one hidden layer with three nodes or neurons, and of course, our target variable. What I wanna do now is use a metaphor that my colleague wrote about in his book Applied Predictive Analytics. I recommend this book on a number of levels, but I'd like to share this quote with you.
What Dean is talking about is catching fly balls as a boy and how that's like backward propagation. Let's talk about the quote and then I'm gonna elaborate a bit. He writes, "The learning process is similar to how I learned to catch fly balls as a boy. First, imagine my father hitting a ball to me in the outfield. In the beginning, I had absolutely no idea where the ball was going to land." Then I'm gonna jump down to the bottom there. "But then something began to happen. The more fly balls my father hit, the more I was able to associate the speed the ball was hit, the steepness of the hit, and the left/right angle." So, what he's doing is he's shrinking the gap, shrinking the gap, those errors are getting smaller and smaller.
That's really what backward propagation is all about. Let me return to the diagram. You see, what's happening is he's predicting a scale value. We're not, we're predicting survived, but he was trying to figure out where the ball was going to land and he was off by a certain number of feet. What he did each time is close the gap. He didn't run to where the ball fell, but rather, he ran a part of the way between where he was standing and where the ball fell.
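The "run part of the way" idea can be sketched in a few lines of Python. This is only an illustration of the metaphor, not the instructor's software: the function name `close_the_gap`, the `rate` of 0.5, and the distances are all made-up assumptions.

```python
def close_the_gap(position, landing_spot, rate=0.5, n_hits=5):
    """Move a fraction `rate` of the remaining gap after each fly ball.

    All names and numbers here are hypothetical, chosen for illustration.
    """
    gaps = []
    for _ in range(n_hits):
        error = landing_spot - position   # how far off the fielder was
        position += rate * error          # run only part of the way
        gaps.append(abs(landing_spot - position))
    return gaps


# The fielder starts 40 feet from where the ball keeps landing.
gaps = close_the_gap(position=0.0, landing_spot=40.0)
print(gaps)  # [20.0, 10.0, 5.0, 2.5, 1.25]
```

Each hit shrinks the remaining error by the same fraction, which is the role a learning rate plays when a network adjusts its weights.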
What backward propagation does is take that adjustment and propagate it through all the lines that you see here, so that each of the lines gets a little share of it, and the next time the network makes a prediction, it makes a more accurate one. Now, I'm only gonna talk about this diagram briefly because it's a complicated mess, as you can see. This is really closer to what it really looks like. Notice that categorical variables have to be dummy coded.
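To make the propagation step concrete, here is a minimal NumPy sketch of a network with one hidden layer of three neurons, trained by backward propagation. It rests on several assumptions that are not from the video: sigmoid activations, a cross-entropy loss, a learning rate of 0.5, and randomly generated inputs standing in for real data. The function name `train_tiny_net` is hypothetical, and this is not how any particular software package implements it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_tiny_net(X, y, n_hidden=3, rate=0.5, n_epochs=500, seed=0):
    """Full-batch gradient descent on a one-hidden-layer network (a sketch)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))  # input -> hidden lines
    b1 = np.zeros(n_hidden)                                  # hidden-layer bias
    W2 = rng.normal(0.0, 0.5, size=n_hidden)                 # hidden -> output lines
    b2 = 0.0                                                 # output bias
    losses = []
    for _ in range(n_epochs):
        # Forward pass: inputs -> hidden neurons -> predicted probability.
        hidden = sigmoid(X @ W1 + b1)
        p = sigmoid(hidden @ W2 + b2)
        p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)
        loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
        losses.append(float(loss))

        # Backward pass: push the prediction error back through every line.
        d_out = (p - y) / len(y)                              # error at the output
        d_hidden = np.outer(d_out, W2) * hidden * (1.0 - hidden)

        # Each weight moves a small step (scaled by `rate`) against its gradient.
        W2 -= rate * (hidden.T @ d_out)
        b2 -= rate * d_out.sum()
        W1 -= rate * (X.T @ d_hidden)
        b1 -= rate * d_hidden.sum(axis=0)
    return losses


# Synthetic stand-in data (assumption): four numeric inputs, binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(float)

losses = train_tiny_net(X, y)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The loss recorded at each pass is the "gap" from the fly ball metaphor; every weight gets its own small share of the correction on every pass.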
I can't overstate the following point. I meet folks numerous times each year that have very large, complicated categorical variables. Things like make and model of vehicles, predicting something about the vehicles, and they don't know that inside the neural network or inside these other algorithms, all this complicated dummy coding is going on. So, they think that embarked is one variable and it's really four. Embarked C, Q, S, and even blank.
Or they think that passenger class is one variable and it's really three. First class, second class, and third class. You see where I'm going with this. So, why all of these lines and what purpose do they serve? Well, I discuss the notion of interactions in neural networks in one of the videos in my regression course, and the reason that that video is a bit different than this one is, here we're predicting a categorical variable and there we're predicting a scale variable.
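The dummy coding described above, where embarked becomes four columns and passenger class becomes three, can be sketched in plain Python. The helper `dummy_code` and the five sample rows are hypothetical and are not taken from the course data.

```python
def dummy_code(name, values, categories):
    """Return one 0/1 column per category (hypothetical helper)."""
    columns = {}
    for category in categories:
        label = category if category != "" else "blank"
        columns[f"{name}_{label}"] = [1 if v == category else 0 for v in values]
    return columns


# Made-up rows for illustration; "" stands for a blank embarked value.
embarked = ["S", "C", "", "Q", "S"]
pclass = ["1", "3", "3", "2", "1"]

coded = {}
coded.update(dummy_code("embarked", embarked, ["C", "Q", "S", ""]))
coded.update(dummy_code("pclass", pclass, ["1", "2", "3"]))

for column, flags in coded.items():
    print(column, flags)
print(len(coded))  # 7 columns from what looked like 2 variables
```

Every one of those columns gets its own line into every hidden neuron, which is why a variable with many categories, such as make and model, inflates the network so quickly.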
If you're new to interaction terms, I believe it often makes more sense when you learn about interaction terms with a scale variable first. But we saw it when we were looking at trees, and what we saw is that the impact of passenger class on risk or on survival depended upon a third variable, whether the passenger was male or female. The hidden layer and all of those lines connecting the input variables to the hidden layer, and then the hidden layer to the target categories, survived and died, matter for two terribly important reasons.
One, it's how the neural network deals with interactions and curvilinearity, and two, it is what makes the neural net opaque, because there are so many lines zigzagging through, representing the same variables going in different directions, that it's almost impossible to figure out what those coefficients mean.
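Both points can be seen in a small sketch with hand-picked weights. Everything numeric below is a hypothetical assumption chosen for illustration; these weights were not fitted to any data and do not come from the diagram in the video. The sketch uses two 0/1 inputs (female, third class), three hidden neurons, and sigmoid activations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: (weights for [female, third_class], bias) per neuron.
HIDDEN = [([4.0, -4.0], -2.0), ([-3.0, 0.0], 1.0), ([0.0, 3.0], -1.0)]
OUTPUT_WEIGHTS = [5.0, -1.0, -1.0]
OUTPUT_BIAS = -1.0


def predict(female, third_class):
    """Predicted probability from the hand-built network above."""
    inputs = [female, third_class]
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in HIDDEN
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


# Change in prediction when moving into third class, separately by sex.
effect_female = predict(1, 1) - predict(1, 0)
effect_male = predict(0, 1) - predict(0, 0)
print(f"female: {effect_female:+.3f}, male: {effect_male:+.3f}")
```

The same input changes the prediction by a different amount depending on the other input, even though no interaction term was written anywhere. That behavior comes from the hidden layer, and it is also why no single weight on `third_class` can be read as "the effect" of passenger class.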
Note: These tutorials are focused on the theory and practical application of binary classification algorithms. No software is required to follow along with the course.