Deep learning is particularly effective with images and video data. In this video, get an overview of how deep learning works.
- [Instructor] Before we get into what deep learning is, let's talk a little bit about AI and machine learning, as these terms often get used interchangeably. AI is the ability of machines to perform tasks that normally require human intelligence. This includes things like visual perception, decision-making, speech recognition, and translating between languages. Now, AI, as a field, includes machine learning and deep learning, but it also includes several approaches that don't involve any learning at all. For example, when AI first kicked off in the 1950s, many experts believed that human-level intelligence could be achieved by creating a sufficiently large rule base and then using it to manipulate any input data. This approach is known as symbolic AI, and expert systems are its best-known example.

Machine learning is a newer branch of artificial intelligence, one in which the system learns from data. Typically, when a programmer has to solve a problem, they take the data as input, and then, based on a set of rules that they create, they arrive at the answer. With machine learning, we turn this on its head. Given the data and the expected results, we get a machine to determine what the rules should be. Machine learning is different from regular programming in that the system is trained rather than explicitly programmed.

Deep learning is a popular subset of machine learning where the focus is on learning rules via several successive layers. Modern deep learning networks typically have tens if not hundreds of layers. Let's use this diagram as a representation of a deep learning network. You have the input layer, which has an image with the number four on it, and the final output prediction showing that the number four has been detected. The layers in between are known as hidden layers, and each layer has weights and biases that can be modified. A deep learning network is one where you have more than one hidden layer.

What's actually happening in these layers? Each layer is learning certain characteristics of the image. These could be edges, corners, or textures. Now, when you feed the deep learning network several different images of a handwritten number four and tell it that these are all fours, it learns these characteristics and gets good at recognizing a handwritten four.

When you first pass in an image of a number four, the weights of the different layers are random. This means the final prediction from the deep learning network is unlikely to be a number four. There needs to be some way for the network to compare the predicted value with the actual value. This is done using a loss function, which measures how far apart the two are. To get the predicted values closer to the actual values, we need to reduce the loss. To do this, we need some sort of feedback mechanism, so we compare the predicted output with the actual one, and then modify the weights of each of the layers, starting from the final layer and working our way back. This is the job of the optimizer, which relies on an algorithm called backpropagation to work out how each weight should change.

In this course, we'll be using pre-trained networks. These are deep learning models that have already been trained on a certain data set, so they're really good at things like recognizing handwritten digits, classifying images, or translating between languages. There's been a lot of hype around deep learning, and rightly so. This is because we've had breakthroughs in areas that have historically been very challenging. Deep learning models are used in self-driving cars, image classification, and handwriting transcription. They achieve near-human-level speech recognition and can accurately translate between languages. Now that we've had a quick overview of deep learning, let's take a look at OpenCV, which is probably the best open source computer vision software out there.
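To make the training loop described above a little more concrete, here is a minimal sketch in plain Python. It is not the course's code and it is not how OpenCV works internally. Everything in it is an assumption made for illustration: the toy data set (two input values and a 0/1 target instead of an image of a digit), the layer sizes, the tanh activation, the mean squared error loss, the learning rate, and the helper names `make_layer`, `forward`, `loss`, and `train_step`. It only shows the idea: weights start out random, a loss function compares predictions with the actual values, and backpropagation walks from the final layer back to the first to adjust the weights.

```python
import math
import random

random.seed(0)  # fixed seed so repeated runs start from the same random weights


def make_layer(n_in, n_out):
    """Create one layer with random weights and zero biases (hypothetical helper)."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return {"w": weights, "b": [0.0] * n_out}


def forward(layers, x):
    """Pass x through every layer and return the output of each one."""
    activations = [x]
    for i, layer in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b
             for row, b in zip(layer["w"], layer["b"])]
        is_last = i == len(layers) - 1
        # tanh on the hidden layers, plain linear output on the final layer
        activations.append(z if is_last else [math.tanh(v) for v in z])
    return activations


def loss(layers, data):
    """Mean squared error between the predicted and the actual values."""
    return sum((forward(layers, x)[-1][0] - y) ** 2 for x, y in data) / len(data)


def train_step(layers, data, lr):
    """One round of backpropagation followed by a gradient-descent update."""
    grads = [{"w": [[0.0] * len(row) for row in l["w"]], "b": [0.0] * len(l["b"])}
             for l in layers]
    for x, y in data:
        acts = forward(layers, x)
        # error signal at the output: derivative of the squared error
        delta = [2 * (acts[-1][0] - y) / len(data)]
        # start from the final layer and work back towards the input
        for i in reversed(range(len(layers))):
            layer, inp = layers[i], acts[i]
            for j, d in enumerate(delta):
                grads[i]["b"][j] += d
                for k, a in enumerate(inp):
                    grads[i]["w"][j][k] += d * a
            if i > 0:
                # pass the error to the previous layer; tanh'(z) = 1 - tanh(z)^2
                delta = [sum(layer["w"][j][k] * delta[j] for j in range(len(delta)))
                         * (1 - inp[k] ** 2) for k in range(len(inp))]
    # the "optimizer": nudge every weight and bias against its gradient
    for layer, g in zip(layers, grads):
        for j in range(len(layer["b"])):
            layer["b"][j] -= lr * g["b"][j]
            for k in range(len(layer["w"][j])):
                layer["w"][j][k] -= lr * g["w"][j][k]


# Toy data (an assumption): the target is 1 if either input is 1.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

# Two hidden layers of three units each, then one output unit.
layers = [make_layer(2, 3), make_layer(3, 3), make_layer(3, 1)]

initial_loss = loss(layers, data)
for _ in range(1000):
    train_step(layers, data, lr=0.05)
final_loss = loss(layers, data)

print("loss before training:", initial_loss)
print("loss after training: ", final_loss)
```

Running the script prints the loss before and after training, and the second number should come out smaller than the first. Real frameworks do the same bookkeeping with optimized tensor operations and more capable optimizers, and a pre-trained network is simply one where this loop has already been run on a large data set.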
- Deep learning for OpenCV
- Viewing images and video in OpenCV
- Working with blobs in the dnn module
- Image classification
- Video classification