From the course: Building Recommender Systems with Machine Learning and AI

History of artificial neural networks

- [Instructor] Let's dive into artificial neural networks and how they work at a high level. Later on, we'll get our hands dirty and actually create some. But first, we need to understand how they work and where they came from. It's pretty amazing stuff. This whole field of artificial intelligence is based on an understanding of how our own brains work. Over millions of years of evolution, nature has come up with a way to make us think. And if we can reverse-engineer the way that our brains work, we can gain some insights on how to make machines that think.
Within your brain, specifically within your cerebral cortex, which is where all of your thinking happens, you have a bunch of neurons. These are individual nerve cells, and they are connected to each other via axons and dendrites. You can think of these as connections, wires, if you will, that carry signals from one neuron's axon to another neuron's dendrites. Now, an individual neuron will fire, or send a signal to all of the neurons it's connected to, when enough of its input signals are activated. At the individual neuron level, it's a very simple mechanism: you have a neuron with a bunch of input signals coming into it, and if enough of those input signals reach a certain threshold, it will, in turn, fire off a signal to the neurons it's connected to. But when you start to have many, many of these neurons connected together in many different ways, with different strengths between each connection, things get very complicated. This is a perfect example of emergent behavior. You have a very simple concept, a very simple model, but when you stack enough of them together, you can create very complex behavior that can yield learning. And this actually works. Not only does it work in your brain, it works in our computers as well.
Now, think about the scale of your brain. You have billions of neurons, each of them with thousands of connections; that's what it takes to create a human mind. This is a scale that we can still only dream about in the field of deep learning and artificial intelligence, but it's the same basic concept. You just have a bunch of neurons with a bunch of connections that individually behave very simply, but once you get enough of them wired together in complex enough ways, you can create very complex thoughts, and even consciousness. The plasticity of your brain comes from tuning where those connections go and how strong each one is, and that's where all the magic happens.
Furthermore, if we look deeper into the biology of your brain, you can see that within your cortex, neurons seem to be arranged into stacks, or cortical columns, that process information in parallel. So, for example, in your visual cortex, different areas of what you see might be getting processed in parallel by different cortical columns of neurons. Each one of these columns is, in turn, made up of mini-columns of around 100 neurons each. Mini-columns are then organized into larger hyper-columns, and within your cortex there are about 100 million of these mini-columns. So, again, the numbers add up very quickly. Coincidentally, this is a similar architecture to how the 3D video card in your computer works. It has a bunch of very simple, very small processing units, each responsible for computing a little group of pixels on your screen. It just so happens that that's a very useful architecture for mimicking how your brain works.
So, it's sort of a happy accident that the research behind your favorite video games lent itself to the same technology that made artificial intelligence possible on a grand scale and at low cost. The same video cards you're using to play your video games can also be used to perform deep learning and create artificial neural networks. Now, think about how much better it would be if we made chips that were purpose-built specifically for simulating artificial neural networks. Well, it turns out some people are designing chips like that right now. By the time you watch this, they might even be a reality. I think Google is working on one as we speak.
So, at one point, someone said, "Hey, the way we think neurons work is pretty simple. It wouldn't be too hard to replicate that ourselves and maybe try to build our own brain." This idea goes all the way back to 1943, when Warren McCulloch and Walter Pitts proposed a very simple architecture: an artificial neuron that fires only if more than a certain number of its input connections are active. When people thought about this more deeply in a computer science context, they realized you can create logical, or Boolean, expressions this way. Depending on the number of connections coming from each input neuron, and whether each connection activates or suppresses a neuron, you can implement logical expressions in artificial or natural neurons.
This particular diagram is implementing an OR operation. So, imagine that the threshold for our neuron is two: if two or more inputs are active, it will, in turn, fire off a signal. In this setup, neuron C receives two connections from neuron A and two connections from neuron B. If either of those neurons produces a signal, that will cause neuron C to fire. So, you can see we have created an OR relationship here: if either neuron A or neuron B feeds neuron C two input signals, neuron C will fire and produce a true output. We've implemented the Boolean operation C equals A OR B, just using the same kind of wiring that happens within your own brain. It's also possible to implement AND and NOT by similar means.
Then, we start to build upon this idea. In 1957, we got something called the linear threshold unit, or LTU for short. It builds on the same idea by assigning weights to those inputs. So, instead of just simple on-and-off switches, we now have weights on each of those inputs as well. This is working more toward our understanding of the biology: different connections between different neurons may have different strengths, and we can model those strengths as weights on each input coming into our artificial neuron. We're also going to have the output be given by a step function. This is similar in spirit to what we were doing before, but instead of firing when a certain number of inputs are active, there's no longer a concept of active or not active. There are weighted inputs coming in, and those weights can be positive or negative. If the sum of those weighted inputs is greater than zero, we go ahead and fire off a signal; if it's less than zero, we don't do anything. It's just a slight adaptation of the artificial neuron, where we're introducing weights instead of simple binary on-and-off switches. Let's build upon that even further and create something called the perceptron.
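Before we move on to the perceptron, here's a rough sketch in Python of the two ideas we just covered. This is my own illustration rather than code from the course: a McCulloch-Pitts style neuron that fires when at least two of its incoming connections are active, which is enough to implement OR when each input feeds it two connections, and a linear threshold unit that swaps those on-and-off connections for weighted inputs and a step function. The threshold of two and the example weights are arbitrary choices for illustration.

    # McCulloch-Pitts style neuron: fires when enough of its incoming connections are active.
    def threshold_neuron(active_inputs, threshold=2):
        return 1 if sum(active_inputs) >= threshold else 0

    # OR via wiring: neurons A and B each send two connections into neuron C.
    def or_gate(a, b):
        incoming = [a, a, b, b]  # two connections from A, two from B
        return threshold_neuron(incoming, threshold=2)

    print([or_gate(a, b) for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]])  # [0, 1, 1, 1]

    # Linear threshold unit (LTU): weighted inputs, then a step function on the sum.
    def ltu(inputs, weights):
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted_sum > 0 else 0  # fire only if the weighted sum is positive

    print(ltu([1, 1], [0.5, 1.0]))   # 1: both weighted inputs push the sum above zero
    print(ltu([1, 1], [0.5, -1.0]))  # 0: the negative weight suppresses firing

Notice that the OR behavior comes entirely from the wiring; the neuron itself only knows how to count its active inputs, which is exactly the emergent-behavior point from earlier.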
A perceptron is just a layer of multiple linear threshold units. Now, we're starting to get into things that can actually learn. By reinforcing the weights that produce the behavior we want, we can create a system that learns over time how to produce the desired output. This is also working more toward our growing understanding of how the brain works. Within the field of neuroscience, there's a saying that "cells that fire together, wire together," and that speaks to the learning mechanism going on in our artificial perceptron, where we reinforce the weights that lead to the result we want. We can think of those weights as the strengths of the connections between neurons, and we can reinforce them over time, rewarding the connections that produce the behavior we want.
You can see here, we have our inputs coming in with weights, just like we did with LTUs before, but now we have multiple LTUs grouped together in a layer. Each one of those inputs gets wired to every LTU in that layer. We then apply a step function to each one, which produces a final set of outputs that can be used to classify something, like what kind of image this perceptron is looking at. Another thing we introduce here is something called the bias neuron, off there on the right. It's there to make the mathematics work out: sometimes we need to add in a little fixed constant value to make the neurons fire at the right values, and this bias amount can also be learned as the perceptron is trained.
So, this is a perceptron. We've taken our artificial neuron, extended it into a linear threshold unit, and now we've put multiple linear threshold units together in a layer to create a perceptron. Now we have a system that can actually learn as you optimize all of the weights on the connections into each layer. And you can see there are a lot of those weights at this point, which can capture fairly complex information. If every one of those inputs goes to every single LTU in your layer, the weights add up fast, and that's where the complexity of deep learning comes from.
Let's take that one step further to a multi-layer perceptron. So now, instead of a single layer of LTUs, we're going to have more than one. We now have a hidden layer in the middle there. You can see that our inputs go into a layer at the bottom, the outputs come out of a layer at the top, and in between we have this hidden layer of additional linear threshold units that can perform what we call deep learning. So, here we already have what we would today call a deep neural network. Now, there are challenges in training these things because they are more complex, but we'll talk about that later on. The thing to really appreciate here is just how many connections there are. Even though we only have a handful of artificial neurons, you can see there are a lot of connections between them, and a lot of opportunity for optimizing the weights on each connection. So, that's how a multi-layer perceptron works. You can see that, again, we have emergent behavior here. An individual linear threshold unit is a pretty simple concept, but when you put them together in multiple layers, all wired together, you can get very complex behavior, because there are a lot of different possibilities for all the weights on all those different connections. Finally, we'll talk about a modern deep neural network. Really, this is all there is to it.
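Before we get to that, here's a minimal NumPy sketch of the forward pass we've been describing, again my own illustration rather than code from the course. Each unit in a perceptron layer takes a weighted sum of every input plus a bias term and applies a step function, and the multi-layer version simply feeds the outputs of a hidden layer of the same units into the output layer. The layer sizes and random weights are placeholders, and this sketch leaves out the training step entirely.

    import numpy as np

    def step(z):
        # Step activation: fire (1) wherever the weighted sum is positive, otherwise 0.
        return (z > 0).astype(float)

    def perceptron_layer(x, W, b):
        # Every input is wired to every unit in the layer: a weighted sum per unit,
        # plus a learned bias value, followed by the step function.
        return step(x @ W + b)

    rng = np.random.default_rng(42)
    x = rng.random(4)  # four input values

    # Single-layer perceptron: 4 inputs feeding 3 output units.
    W_out = rng.normal(size=(4, 3))
    b_out = rng.normal(size=3)
    print(perceptron_layer(x, W_out, b_out))

    # Multi-layer perceptron: insert a hidden layer of LTUs between inputs and outputs.
    W_hidden, b_hidden = rng.normal(size=(4, 5)), rng.normal(size=5)  # 4 inputs -> 5 hidden units
    W_top, b_top = rng.normal(size=(5, 3)), rng.normal(size=3)        # 5 hidden units -> 3 outputs

    hidden = perceptron_layer(x, W_hidden, b_hidden)
    print(perceptron_layer(hidden, W_top, b_top))

Counting the entries in W_hidden and W_top gives you a feel for how quickly the number of weights grows as you add more layers and more units per layer.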
For the rest of this section, we'll just be talking about ways of implementing something like this. All we've done here is replace that step function with something better; we'll talk about alternative activation functions. This one is illustrating something called ReLU, which we'll examine more deeply very soon. The key point is that a step function has a lot of nasty mathematical properties, especially when you're trying to figure out its slope and its derivative. It turns out that other functions work better and allow you to converge more quickly when you're trying to train a neural network. We'll also apply softmax to the output, which we talked about in the previous lecture. That's just a way of converting the final outputs of our deep neural network into probabilities, from which we can choose the classification with the highest probability. And we will also train this neural network using gradient descent, or some variation thereof. That might use autodiff, which we also talked about earlier, to make the training more efficient.
So, that's pretty much it. In the past five minutes or so that we've been talking, I've given you the entire history of deep neural networks and deep learning. It's not that complicated, and that's really the beauty of it. It's emergent behavior: you have these very simple building blocks, but when you put them together in interesting ways, very complex and sometimes mysterious things can happen. Let's dive into more details on how it actually works up next.
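To make those last pieces concrete, here's the same kind of forward-pass sketch as before, with the step function swapped for ReLU in the hidden layer and softmax applied to the output so the final values come out as probabilities. As before, this is an illustrative sketch with placeholder shapes and random weights, not code from the course; in a real network, the weights would be learned with gradient descent rather than left at random values.

    import numpy as np

    def relu(z):
        # ReLU activation: pass positive values through, clamp negatives to zero.
        return np.maximum(0.0, z)

    def softmax(z):
        # Turn raw output scores into probabilities that sum to 1.
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    x = rng.random(4)  # four input values

    W_hidden, b_hidden = rng.normal(size=(4, 5)), np.zeros(5)
    W_out, b_out = rng.normal(size=(5, 3)), np.zeros(3)

    hidden = relu(x @ W_hidden + b_hidden)            # hidden layer, ReLU instead of a step
    probabilities = softmax(hidden @ W_out + b_out)   # class probabilities from the output layer
    print(probabilities, probabilities.argmax())      # pick the most likely class

    # In practice, the weights and biases would be learned with gradient descent
    # (usually driven by automatic differentiation), not left at random values.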
