You can already detect rectangles; now you will also classify the dominant object currently present in your camera image. For that, you need a machine learning model that helps with the classification. Use the Inception v3 model, which classifies the dominant object in an image into one of 1,000 categories, such as trees, animals, food, vehicles, and people.
- [Instructor] We have already achieved a lot. We can get a live feed from the camera, use the Vision framework to process that image and detect rectangles, and even draw these rectangles on screen and identify them, so that a user has a visualization of where a rectangle really is. In the last video, we initialized our image request handler with VNImageRequestHandler and a CVPixelBuffer, which is ideal when we are dealing with live video.
But if you are working with something like a standard still image, you can also initialize your image request handler with a CIImage, or even a CGImage, and pass along the information you need. The slightly more complicated approach we took here, with the CMSampleBuffer and so on, is only required if you are really processing video data from the camera or from another video. But now that we can process a video feed, we can do even more and start with object classification.
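As a rough sketch of these alternatives, the three initializers look like this; the helper function and parameter names are illustrative, not identifiers from the course project:

```swift
import Vision
import CoreImage

// Illustrative helpers showing the three VNImageRequestHandler initializers.
func makeHandler(for pixelBuffer: CVPixelBuffer) -> VNImageRequestHandler {
    // Ideal for live video, where each frame arrives as a CVPixelBuffer
    return VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
}

func makeHandler(for ciImage: CIImage) -> VNImageRequestHandler {
    // For a still image represented as a CIImage
    return VNImageRequestHandler(ciImage: ciImage, options: [:])
}

func makeHandler(for cgImage: CGImage) -> VNImageRequestHandler {
    // For a still image represented as a CGImage
    return VNImageRequestHandler(cgImage: cgImage, options: [:])
}
```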
On the Apple machine learning website, you will find a model called Inception v3 that you can download. I am going to drag and drop that model into our project and, after making sure that "Copy items if needed" and "Add to targets" are selected in the destination options, hit the Finish button. Now we are adding it to our project, and depending on the size, this always takes a while. Since this model is roughly 100 megabytes, it took a few seconds.
As with the Core ML model we used in an earlier section, we again get a lot of information about the machine learning model. We get the model class information, and we see that our model is not yet part of our target. So again, we open up the File inspector and make sure that the target membership for our Inceptionv3 machine learning model is activated, so that Xcode creates a class for us.
We can also see a lot of information about the model evaluation parameters. It says here, for example, that the image should have a size of 299 by 299 pixels, that we get a classLabelProbs output with the probabilities, and a classLabel output with the most likely image category. These parameters would be important if we were using Core ML by itself.
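If you did want to use Core ML on its own, you would call the generated class's prediction method yourself, roughly like this. This sketch assumes the Xcode-generated Inceptionv3 class exposes the image input and the classLabel/classLabelProbs outputs shown in the model parameters, and that you have already scaled the pixel buffer to 299 by 299 yourself:

```swift
import CoreML

// Sketch of direct Core ML usage, without Vision. You would have to
// convert and resize the camera frame to a 299 x 299 CVPixelBuffer first.
func classifyDirectly(_ pixelBuffer: CVPixelBuffer) {
    let model = Inceptionv3()
    guard let output = try? model.prediction(image: pixelBuffer) else { return }
    // classLabel: most likely category; classLabelProbs: all probabilities
    print(output.classLabel, output.classLabelProbs[output.classLabel] ?? 0)
}
```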
But since Core ML and Vision work so well together, we can do something else and do not have to worry about these parameters right now. So we open up our ViewController. If you remember, we have our setupVision function right here, and the first thing we did there was define a rectangle request, so I am going to add a "Rectangle Request" comment here. Right below, I am adding another comment, which I am going to call "Object Classification", and we are adding some more code right there.
First of all, we get our machine learning model. I am using a guard let statement here because this could fail, and I am naming the constant visionModel. I am trying to use VNCoreMLModel, a dedicated Vision type for loading a machine learning model, and this is going to be really cool. It is created for a specific model, so I am initializing my Inception v3 class and using its model property here.
If this does not work, we throw a fatal error with a message, let's just say "can't load Vision ML model". Now we have our machine learning model ready, and we can work with it.
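Put together, this model-loading step inside setupVision looks like this:

```swift
// Object Classification
guard let visionModel = try? VNCoreMLModel(for: Inceptionv3().model) else {
    fatalError("can't load Vision ML model")
}
```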
We can again create a classification request, the same way we did for the rectangle request, except this is not a rectangle request but a classification request, so I am using VNCoreMLRequest, which lets us use our Core ML model to perform a Vision request. It asks us to state a model that we would like to use and, again, a completion handler. The model we want to use is the visionModel we just defined one line above, and I am going to call the request handler handleClassification. With that, we have created our classification request, and now we can make some adjustments, as we did with the rectangle detection request.
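Continuing inside setupVision, the resulting line looks like this:

```swift
let classificationRequest = VNCoreMLRequest(model: visionModel,
                                            completionHandler: handleClassification)
```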
So I use my classificationRequest and access its properties. But let me just quickly build to see whether there is a problem. And indeed, as with our rectangle detection request, we get an error because we have not yet defined the handleClassification function. So I copy the handleClassification function name, and right below our drawVisionRequestResults function, I am going to add my new handleClassification function.
This function also takes two parameters: as we have learned, the first one is request, a VNRequest object, and the second one is an optional Error object. We are going to implement the body later; for now, we just make sure that the error in our setupVision function here in line 46 has disappeared.
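A minimal stub matching that description could look like this:

```swift
func handleClassification(request: VNRequest, error: Error?) {
    // Implemented in the next video: read the classification results here.
}
```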
Now we can continue making adjustments: I use my classificationRequest and adjust its image crop and scale option, using centerCrop here. With that, I have prepared my classification request: the image we want to load into it will be cropped around its center. Now all I need to do is put this request into my pipeline of requests, after our rectangle detection request, so I am simply appending my classification request right here.
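A sketch of these last two steps; the name of the requests array is assumed from the earlier rectangle detection setup, not confirmed by this video:

```swift
classificationRequest.imageCropAndScaleOption = .centerCrop

// Append the classification request to the existing request pipeline;
// `requests` is assumed to be the array set up for rectangle detection.
self.requests.append(classificationRequest)
```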
Vision then automatically performs both the rectangle detection and the classification request. Next, we are going to implement the handleClassification function so that we can finally see what our camera sees and get the classification strings.