Get a basic introduction to the Vision part of Microsoft Cognitive Services. This video introduces the various capabilities that are covered in this course.
- [Instructor] Let's start talking about the basics of Cognitive Services: Vision. The Vision API is actually a combination of many APIs, and there are a number of things that it is able to do. For instance, when I show you this picture, what do you see in it? As a human, you may say, well, I see a dog. I see a happy dog. I see a dog with his tongue hanging out. Maybe the dog is poking his head out of a car's window. All of these descriptions are accurate, and the Vision API is able to see a picture, and it is able to give you a description of what it sees.
Along with this description, it is also able to give you a confidence rating. For instance, I see a dog looking out of a window. Is it really a window? Perhaps the confidence for that would be a little bit lower. It is also able to tag an image for you. Imagine that you work in a company and you have a big asset library, maybe pictures or even videos. Wouldn't it be nice to be able to tag them so these pictures are easily discoverable? And why just a company? Even as an individual, you've probably taken a lot of pictures over the years, and you want to be able to tag them so you can find those pictures easily afterwards.
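The description and its confidence rating come back as JSON from the API's analyze call. Here's a minimal sketch of picking out the best caption; the sample response below is illustrative, with made-up values, modeled on the documented response shape:

```python
# Illustrative sample of the JSON returned by the Vision API's analyze
# call with the Description feature requested (values are invented).
sample_response = {
    "description": {
        "tags": ["dog", "car", "window"],
        "captions": [
            {"text": "a dog looking out of a car window", "confidence": 0.92}
        ],
    }
}

def best_caption(response):
    """Return the highest-confidence caption text and its score."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return None, 0.0
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"], top["confidence"]

text, score = best_caption(sample_response)
```

A low score is your cue to treat the caption as a guess rather than a fact, which is exactly the "is it really a window?" situation described above.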
The Vision API can help you do that. But tags are a little loose in form; it's just an array of strings that you get back in return. Sometimes you want to categorize an image, to place it in a more rigid, formal taxonomy. So the Vision API follows an 86-category taxonomy under which it can categorize your image. We'll see all of this in practice shortly. When you submit a picture to the Vision API, it is able to differentiate clip art versus photographs versus line art, and it can even tell you the quality of the clip art it sees.
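Tags and categories arrive as separate fields in the same analyze response. A small sketch of the difference between the two, using an invented sample shaped like the documented response (tags carry per-tag confidences; category names such as "animal_dog" come from the 86-category taxonomy):

```python
# Illustrative response for the Tags and Categories features
# (values invented; "animal_dog" is one of the taxonomy names).
sample = {
    "tags": [
        {"name": "dog", "confidence": 0.99},
        {"name": "outdoor", "confidence": 0.85},
    ],
    "categories": [
        {"name": "animal_dog", "score": 0.97},
    ],
}

def tag_names(response, min_confidence=0.5):
    """Flatten the tag objects into the loose list of strings
    described above, dropping low-confidence tags."""
    return [t["name"] for t in response.get("tags", [])
            if t["confidence"] >= min_confidence]

def top_category(response):
    """Pick the highest-scoring category from the fixed taxonomy."""
    cats = response.get("categories", [])
    return max(cats, key=lambda c: c["score"])["name"] if cats else None
```

The tags are free-form and open-ended, while the category always comes from the fixed taxonomy, which is what makes it useful for rigid classification.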
For instance, the picture in the middle that you see over here is actually pretty good quality clip art. Sometimes the boundaries may be a little fuzzy. A picture may somewhat look like line art or clip art. The Vision API can help you distinguish between all of these. Have you ever seen something and wondered what exactly you're looking at? Imagine you had a phone app with which you could take a picture, and it would just tell you what you're looking at. For instance, this is a picture of the Taj Mahal. The picture on the left is of Isaac Newton.
These are different models, prebuilt models available in the Vision API that you can simply submit a picture to and the Vision API will tell you what it sees. Yes, you can just submit a picture of Isaac Newton and the Vision API will recognize a celebrity. It'll tell you, hey, this is Isaac Newton. You can even build custom models and export them for offline use, say on a mobile device. The Vision API can also recognize handwriting.
Now this handwriting is very clean. It's better than my handwriting, so as we go through the exercises, I will actually handwrite a note and we'll try to recognize it and see how well it does. But yes, it can recognize handwriting, as well. Frequently, though, you run into printed text. A good example is something written on a whiteboard versus something written on a receipt. The whiteboard would be handwritten; the receipt would be printed. The Vision API is also able to extract meaningful information from printed text via OCR.
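The OCR result comes back as nested JSON: regions containing lines containing words. A minimal sketch of stitching the words back into readable lines; the receipt text below is invented, and the structure is modeled on the documented OCR response shape:

```python
# Illustrative OCR result (regions > lines > words), modeled on the
# Vision API's OCR response; the receipt text is invented.
ocr_result = {
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "Total:"}, {"text": "$12.50"}]},
            {"words": [{"text": "Thank"}, {"text": "you!"}]},
        ]}
    ],
}

def extract_text(result):
    """Join the recognized words back into lines of text."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(w["text"] for w in line["words"]))
    return lines
```

Walking the hierarchy like this is also where you'd plug in downstream steps, such as handing each extracted line to a translation API.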
Now in this course, I'm talking only about the Vision API, which can do things like OCR, but imagine if you were to pair it with other APIs that let you do, for instance, translation. You could have a mobile app that would be incredibly useful. Imagine that you're in a foreign country and you see a road sign in a language that you don't understand. Wouldn't it be nice to just pull up your phone, show that particular sign to your phone's camera, and the phone app will tell you exactly what you're looking at? I would find that very useful.
The Vision API can also generate thumbnails. Now the picture on the right is a very small thumbnail of the picture on the left. It looks a little pixelated because I've enlarged it so you can see it more clearly, but if you pay attention, the way it created the thumbnail is actually intelligent. It didn't just take an arbitrary portion of the picture. It made sure to include the most relevant portions of the picture.
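That smart cropping behavior is requested through a query parameter on the thumbnail call. A sketch of building such a request URL; the region in the base URL and the API version path are placeholders you would replace with your own Azure resource's endpoint:

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own Azure region and resource.
BASE = "https://westus.api.cognitive.microsoft.com/vision/v2.0"

def thumbnail_url(width, height, smart_cropping=True):
    """Build a generateThumbnail request URL. With smart cropping on,
    the service keeps the most relevant region of the picture in frame
    instead of cropping an arbitrary portion."""
    params = urlencode({
        "width": width,
        "height": height,
        "smartCropping": str(smart_cropping).lower(),
    })
    return f"{BASE}/generateThumbnail?{params}"
```

The actual request would POST the image (or its URL) to this address with your subscription key in the headers; the service responds with the cropped thumbnail bytes.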
So yes, the thumbnail generation is intelligent. It is able to recognize important parts of the picture and is able to give you a meaningful thumbnail. The Vision API can do that for you. And there are many other things that the Vision API can help you do. For instance, it can detect emotions. You can show it a picture and it'll tell you, hey, this person looks angry or surprised or depressed. It can detect adult or racy pictures or videos.
This would be great for content moderation. You can create a people library and identify people in that library. Imagine how useful that would be for profiling customers as they walk into a store and look at a sign as soon as they walk in. You could put a camera right next to that sign. Or imagine how useful law enforcement would find this capability. You can help moderate content, and you can work with both images and videos.
We'll see all of these and many more such possibilities throughout this course.
- Setting up the project
- Writing code to describe an image
- Returning multiple descriptions
- Writing an analyze method
- Applying tags and categories
- Analyzing colors
- Identifying art and adult content
- Recognizing celebrities and landmarks
- Handwriting and OCR
- Detecting faces and emotions
- Identifying people