- In this lecture, we're going to talk about computer vision. And Eric Nyberg's going to help. Computer vision refers to the ability of computers to identify objects, scenes, and activities in images. An example many of us are familiar with is tagging photos in Facebook, which uses face recognition, an application of computer vision. Computer vision is hard because we don't really know how we ourselves see, so we don't know how to program a computer to do it either. Also, objects look different under different circumstances: they may be partially covered, illuminated differently, set against cluttered backgrounds, or viewed from different angles and at different scales.
I'm going to ask Eric to give an overview of how computer vision works. - Well, David, computer vision has to think about how to break down each image into its component parts. And of course, we start with an image that's basically pixels, or little dots of color. And this involves the same kind of layered processing that we've talked about in other areas of the course, where you start with low-level features, like the pixels and the colors, and then work your way up to a deeper analysis, with higher-level features, like lines and areas and even objects.
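The layered processing Eric describes can be sketched in a few lines of Python. This is a toy illustration, not a real vision system: the tiny image, the gradient feature, and the threshold are all invented for the example. It shows the idea of starting from raw pixels (low level), computing brightness gradients (mid level), and deriving an "edge" location from them (higher level).

```python
# Low level: raw pixels -- a tiny 4x6 grayscale image (0 = black, 255 = white),
# dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

# Mid level: a simple feature -- the brightness difference between
# horizontally neighboring pixels in each row.
def horizontal_gradient(img):
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

# Higher level: an "edge" -- the columns where the gradient magnitude is large.
def edge_columns(grad, threshold=128):
    return sorted({x for row in grad for x, g in enumerate(row) if abs(g) > threshold})

grad = horizontal_gradient(image)
print(edge_columns(grad))  # the dark-to-bright boundary falls after column 2
```

Real systems stack many more layers of this kind, but the pattern is the same: each layer's output becomes the next layer's input.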
- [David] Can you talk about the role of machine learning in computer vision? - Yes, machine learning can be used to train object recognizers by using a lot of examples of labeled data, labeled images. The reason why the Facebook face tagging works so well is that it has a lot of examples of each person that you've already tagged that it can use as training data to figure out who's in this new picture that you've just uploaded to your Facebook. - Got it, so computer vision is getting pretty good, especially by applying machine learning, with very large sets of images, like the ImageNet database, which has helped to improve performance dramatically in recent years.
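Eric's point about learning from tagged examples can be illustrated with a toy recognizer. All the numbers and names here are invented: each "photo" is reduced to a small feature vector, and a new photo is labeled by its nearest already-tagged neighbor. Real face taggers use far richer features and models, but the core principle of learning from labeled examples is the same.

```python
import math

# Training data: (feature vector, label) pairs from already-tagged photos.
# The feature vectors here are hypothetical stand-ins for what a real
# system would extract from the image.
training = [
    ([0.9, 0.1], "alice"),
    ([0.8, 0.2], "alice"),
    ([0.1, 0.9], "bob"),
    ([0.2, 0.8], "bob"),
]

def nearest_label(features, examples):
    """Label a new example with the tag of its closest training example
    (1-nearest-neighbor classification)."""
    return min(examples, key=lambda ex: math.dist(features, ex[0]))[1]

# A newly uploaded photo, reduced to the same feature space.
print(nearest_label([0.85, 0.15], training))  # closest to Alice's examples
```

The more tagged examples of each person the system has seen, the better this kind of matching works, which is exactly why heavily tagged accounts get such accurate suggestions.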
An example is the progress on a standard computer vision benchmark, which saw a fourfold improvement in image classification between 2010 and 2014. Facebook, meanwhile, reports that it's able to recognize faces with 97% accuracy. Eric, what are some of the new frontiers in computer vision? - Well, David, I think that computer vision is really a fundamental element of multimedia processing. And I think that today, we're doing research and development on new applications that will go beyond just recognizing two-dimensional still images, to actually recognizing what's going on in a video, maybe even being able to recognize an event, which takes place over several different two-dimensional images that appear in a video.
- Why is understanding events or understanding video so hard? - Well, I think with understanding video and understanding events, you have to start connecting the dots between multiple images. So if we're looking at an object together, and then that object begins to move, from one frame to the next, we have to be able to recognize that it's the same object. To top it off, things get even more challenging when the video that you're analyzing might not have a steady point of view or steady lighting.
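The "connecting the dots" step Eric describes can be sketched as frame-to-frame matching. This is a minimal illustration with invented coordinates: given object centroids known from one frame and unlabeled detections from the next, each detection is assigned the identity of the nearest previous object. Real trackers also use appearance, motion models, and occlusion handling; this shows only the core idea of recognizing that a moved object is the same object.

```python
import math

# Objects identified in frame 1: id -> centroid position (x, y)
frame1 = {"car": (10.0, 20.0), "bike": (50.0, 60.0)}

# Unlabeled detections in frame 2 -- the objects have moved slightly.
frame2_detections = [(12.0, 21.0), (49.0, 58.0)]

def match_to_previous(detections, previous):
    """Assign each new detection the id of the nearest previous centroid,
    so identity carries over from one frame to the next."""
    matches = {}
    for point in detections:
        nearest_id = min(previous, key=lambda k: math.dist(previous[k], point))
        matches[nearest_id] = point
    return matches

print(match_to_previous(frame2_detections, frame1))
# the detection near (12, 21) is matched to "car", the one near (49, 58) to "bike"
```

A shaky camera or changing lighting breaks the assumptions this kind of matching relies on, which is one reason unconstrained video is so much harder than still images.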
If I record a video with my cell phone while I'm driving down the Los Angeles freeway, it's going to be a big challenge for computer vision software to understand exactly what's going on. - Got it, so event detection in video is still a work in progress. - Yes, that's still a big challenge for us. - As you might expect, there are numerous applications of computer vision, including handwriting recognition, medical imaging (helping doctors identify potential tumors, for example), autonomous driving, robotics, surveillance, and more recently, gesture detection for advanced user interfaces.
What other interesting applications of computer vision are you familiar with, Eric? - So we have an interesting project that we're working on now, David, which involves using cameras and sensors at the local airport to detect which parking spots are full and which are empty, so that when drivers arrive at the airport, we can actually guide them to the area of the parking lot with the most available spots. - Great, so let's wrap up then. Computer vision aims to identify objects, scenes, and activities in images.
It works hierarchically, in stages, by identifying features, then pieces of objects, then entire objects, in a series of steps connected in a pipeline, much as we described for speech recognition and natural language processing. Applications are broad, including handwriting recognition, autonomous driving, surveillance, and gesture detection, used in advanced user interfaces.
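The staged pipeline described above can be sketched schematically. The stage functions here are placeholders (the feature names and object parts are invented); the point is only the structure: each stage consumes the previous stage's output, just as in the speech and language pipelines discussed earlier in the course.

```python
# Stage 1: low-level features from raw pixels (placeholder output).
def extract_features(pixels):
    return {"edges": "vertical boundary", "colors": "dark/bright"}

# Stage 2: pieces of objects from features (placeholder output).
def detect_parts(features):
    return ["wheel", "window"]

# Stage 3: whole objects from parts (a toy rule for illustration).
def recognize_objects(parts):
    return "car" if "wheel" in parts else "unknown"

# The pipeline: stages composed end to end.
def pipeline(pixels):
    return recognize_objects(detect_parts(extract_features(pixels)))

print(pipeline([[0, 255], [255, 0]]))
```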