Fixed feature extractors are possible with pre-trained models. In this video, learn why this is possible.
- [Instructor] Now this is our regular VGG-16 model that we're familiar with. Normally, what you would do is input an image to the network, the image moves forward through the network, and we obtain our final classification probabilities at the end of the network. But there's nothing that says the image has to propagate through the entire network. So what I can do instead is stop propagation at some pre-specified layer that we decide, say before the classifier. We extract the values from that specified layer and then treat those values as a feature vector. Now, we normally stop just before the fully connected layers. For the VGG-16 network, the last section before these is the final MaxPool2d layer. Another option with fixed feature extractors is to take a network pre-trained on ImageNet, remove the last fully connected layer, and then treat the rest of the network as a fixed feature extractor for the new dataset. We'll then need to retrain that last fully connected layer. Now, you'll get similar results either way, whether you extract features before the classifier or remove and retrain the last fully connected layer. In this course, we'll remove the last fully connected layer and retrain it in our notebook. The reason we can use these feature extractors at all is that the network has already been trained on the ImageNet dataset, so the parameters at the different layers have been adjusted to reflect this. So let's head over to Google Colab to run our notebook, and grab the fixed feature extractor notebook from our exercise files. Now, as you can see, there are a couple of sections in the notebook, and we'll look at each of them in turn. When we come to training our network as part of transfer learning, we need access to GPUs. Fortunately for us, Google Colab gives us access to a GPU for free.
So just select Edit, then Notebook settings, and you can change the hardware accelerator from None to GPU, so we want a GPU; or the other option is to go to Runtime, Change runtime type, and select GPU. So we save that and we're good to go. So let's look at the first section. Now, we're familiar with the first few cells of code, and we've downloaded the CIFAR-10 dataset. We're going to be using the VGG-16 model, so model equals models dot vgg16, and you want pretrained equals True. So this is just going to download the VGG-16 model, great. And we've seen the architecture of VGG-16 before. So we've got the features, that's model dot features, and we can take a look at the different feature layers that we have in the VGG-16 model. And that last block is our classifier. Now, the first thing we want to do when treating our network as a fixed feature extractor is to freeze the network. So all I need to do is grab all of the parameters and set requires_grad to False: for param in model dot parameters, param dot requires_grad equals False. Next, we're going to remove the last fully connected layer and treat the rest of the network as a fixed feature extractor. We then add a new linear layer, followed by a LogSoftmax, that we will train on. So if I want to get at the last layer, I just need to type model dot classifier and provide minus one as an index, and set it equal to nn dot Sequential containing the new linear layer, that's nn dot Linear with in_features equal to 4096. The layer I'm replacing has 4096 in features, so I use the same number of in features, and because I have 10 classes in the CIFAR-10 dataset, I use out_features equals 10. And I want to follow that with an nn dot LogSoftmax over dimension equals one, and I run that. So if I take a look at my classifier again, model dot classifier, you can see that the number of out features has been changed from 1000 to 10.
Now, the next thing we'll do is use NLL loss as our criterion, or loss function. So criterion equals nn dot NLLLoss, and run that cell. Before we move on to train the network, let's take a deeper look at what it means to freeze layers, what cross-entropy loss is all about, and so on. But now that we're done with this section, I'm going to close it off.
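The criterion pairs with the classifier head we built: `NLLLoss` expects log-probabilities, which is exactly what the appended `LogSoftmax` layer produces. A minimal sketch with a hypothetical mini-batch:

```python
import torch
import torch.nn as nn

# NLLLoss expects log-probabilities as input, which is what the
# LogSoftmax(dim=1) layer at the end of our classifier emits.
criterion = nn.NLLLoss()

# Hypothetical batch: log-probabilities for 4 samples over 10 classes,
# plus their target class indices.
log_probs = nn.LogSoftmax(dim=1)(torch.randn(4, 10))
targets = torch.tensor([3, 7, 0, 9])

loss = criterion(log_probs, targets)
print(loss.item())
```

Combining `LogSoftmax` with `NLLLoss` this way is numerically equivalent to feeding raw logits into `nn.CrossEntropyLoss`.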