Learn how to use the Cloud Vision API to predict categories for photos.
- [Instructor] The first API in the set we're going to look at is the Vision API, which is designed to work with photographs. It produces many types of labels and, optionally, a probability for each label, which is expressed as a likelihood or a confidence score. On the right side here, I've pulled up the API explorer, and you can see that there is a rich set of fields that can be included in the response. So you can have cropHints, if you want to crop the photo, faceAnnotations, so faces can be discovered, fullTextAnnotations, imageProperties, labels, landmarks, logos, safeSearch, textAnnotations, and webDetection.
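As a rough illustration of how those response fields are requested, here is a minimal sketch that builds the JSON body for a Vision `images:annotate` REST call using only the standard library. The helper name `build_annotate_request` and the short keys in `FEATURE_TYPES` are hypothetical, and the feature type strings reflect my understanding of the REST API, so check them against the current reference before relying on them. The sketch only builds the body; it does not send anything.

```python
import base64
import json

# Short, made-up keys mapped to Vision feature type names.
# The type strings are assumed from the REST API reference.
FEATURE_TYPES = {
    "labels": "LABEL_DETECTION",
    "faces": "FACE_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "logos": "LOGO_DETECTION",
    "text": "TEXT_DETECTION",
    "safe_search": "SAFE_SEARCH_DETECTION",
    "properties": "IMAGE_PROPERTIES",
    "crop_hints": "CROP_HINTS",
    "web": "WEB_DETECTION",
}


def build_annotate_request(image_bytes, wanted, max_results=10):
    """Return a request body asking only for the annotation types in `wanted`."""
    features = [
        {"type": FEATURE_TYPES[name], "maxResults": max_results}
        for name in wanted
    ]
    # Image bytes are sent base64-encoded inside the JSON body.
    content = base64.b64encode(image_bytes).decode("ascii")
    return {"requests": [{"image": {"content": content}, "features": features}]}


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo.
    body = build_annotate_request(b"placeholder bytes", ["labels", "faces"])
    print(json.dumps(body, indent=2))
```

Requesting only the annotation types your application needs keeps the response small and easier to work with.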
For your particular business scenario, you're going to want to explore this API and figure out which of these annotation types will provide value for your application. Here's an example of calling this API using a Python script and returning the faceAnnotations attribute. So you can see on the left side we have a photo, and on the right side we have a bounding box around what the API finds to be a face in the picture. And if we just go through the Python code here, we've got a main method that takes an input file, an output file, and max results, so we would call this and just pass in those values.
And then we would open the input file, and then we would detect a face, passing in the image and max results, and then we would print that we found a face. And we would simply call the highlight_faces method on the image and pass in the faces and the output file. Now, I'm kind of getting a little ahead of myself. What's great about these APIs is Google has given us test harnesses so we can explore them right from the browser. You don't even need to have an account on GCP or really anything else just to see how this works.
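The script described above draws a box around each detected face. A minimal sketch of the step in between, turning a face annotation into rectangle coordinates, might look like the following. It assumes the JSON shape of the REST response (`faceAnnotations`, each with a `boundingPoly` of `vertices`); the function name `face_boxes` and the sample values are made up for illustration and are not taken from the script shown in the video.

```python
def face_boxes(response):
    """Return (left, top, right, bottom) for each face in a response dict."""
    boxes = []
    for face in response.get("faceAnnotations", []):
        vertices = face.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            continue
        # A coordinate of 0 may be left out of the JSON, so default to 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# Hand-written sample shaped like a Vision response; the values are invented.
SAMPLE = {
    "faceAnnotations": [
        {
            "boundingPoly": {
                "vertices": [
                    {"x": 120, "y": 40},
                    {"x": 260, "y": 40},
                    {"x": 260, "y": 200},
                    {"x": 120, "y": 200},
                ]
            },
            "joyLikelihood": "VERY_LIKELY",
        }
    ]
}

if __name__ == "__main__":
    print(face_boxes(SAMPLE))
```

Note that face attributes such as `joyLikelihood` come back as named likelihood levels rather than numbers, while labels carry a numeric score. The resulting tuples are in the form an imaging library would typically accept for drawing a rectangle on the output file.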
So let's do that next. So here we are on the main GCP Vision API page. And you can see that the pages are pretty consistent. We can look at the documentation or go to the GCP console. But if we just want to learn about this, to see if this will fit for our particular use case, we can scroll down, and we have information about the API, and it talks about the categorization and the faces and so on and so forth. But then, a really neat feature is you can try the API, so let's actually do that.
So I'm going to upload a photo, and in real time, here are the results that you can get back about this photo from the Vision API. So you can see there are a number of categories across the top. We're in the Labels category here, and you can see we've got the labels and the likelihood, so 98 percent sky, 92 percent clouds, and so on. And they're sorted from highest to lowest. The next category is Web.
And you can see that we have web entities, and we have pages with matched images. Now, this is a photo I took, so I would hope that it is not on the web. Next is Document. And what's interesting here is that we have OCR, or text recognition. It's very hard to see, but this is a traffic sign, and you can see that the text in the traffic sign was partially read.
Next, in Properties, are the dominant colors and the crop hints. Then we have Safe Search, and then the JSON. So this is a great way to get started learning about the information that's returned by default in the Vision API.
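To make the Labels results from the demo concrete, here is a minimal sketch of reading labels out of the returned JSON and printing them as percentages, highest first. It assumes the response carries a `labelAnnotations` list whose entries have `description` and `score` fields; the function name `format_labels` is hypothetical. The sky and cloud scores echo the demo, and the third label is invented for illustration.

```python
def format_labels(response, min_score=0.0):
    """Return labels as 'NN% Description' strings, highest score first."""
    labels = response.get("labelAnnotations", [])
    # Keep only labels at or above the threshold.
    kept = [l for l in labels if l.get("score", 0.0) >= min_score]
    kept.sort(key=lambda l: l.get("score", 0.0), reverse=True)
    return [f"{l.get('score', 0.0):.0%} {l.get('description', '')}" for l in kept]


# Hand-written sample; only Sky and Cloud correspond to the demo.
SAMPLE = {
    "labelAnnotations": [
        {"description": "Cloud", "score": 0.92},
        {"description": "Road", "score": 0.61},
        {"description": "Sky", "score": 0.98},
    ]
}

if __name__ == "__main__":
    for line in format_labels(SAMPLE, min_score=0.9):
        print(line)
```

A threshold such as `min_score` is one simple way to decide which labels are confident enough to use in an application.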