Join Scott Peterson for an in-depth discussion in this video Integrating speech recognition, part of Developing UWP Apps: 10 Voice, Speech, and Cortana.
- [Narrator] Okay, we're here in Visual Studio 2017 in the Tagur Universal Windows Platform app we've been working on throughout this series. Let's go ahead and load up the Tagur app and take a look at where we are so far. Just as a reminder, we have a fully featured app at this point, and I've taken the liberty of adding a couple of enhancements just to give our app a little more flair. So you may notice now when you resize, we've got this great resize animation.
We've also got the ability to click on a caption and have it flip and show us the tags that are associated with each image; we've simply toggled that there. It doesn't have a lot to do with this session, but I decided to add it to demonstrate the use of the new Composition namespaces in the Universal Windows Platform. So, that being said, we have a fully functioning app here and we're getting ready to integrate speech recognition, and my thought was that we don't have a really easy way to continue to add new images here.
We added this add button in one of our previous sessions, but I thought it would be nice if we added speech recognition so that a user could simply say "add images" and browse for additional images to have them sent out to Cognitive Services and captioned and tagged. So let's do that first. Let's just add an app bar button down here in our app and throw a little speech recognition into it using the SpeechRecognizer API. So we'll go ahead and go to Pages, MainPage.
Right now we just have these add and refresh buttons that aren't really doing anything. So let's add that button here. Essentially we're just creating a new AppBarButton, labeling it as Speech, and giving it a little icon. And then we're going to have this StartSpeech click event on the back. So let's just go to the code; here's where all the meat will happen. We'll just bring this code in and take a look at how we'll pull up that speech recognizer. So we're just going to new up the SpeechRecognizer, and I'm going to set some UI options here.
The first one I'm going to set is IsReadBackEnabled, so the speech recognizer will use our default voice to read back what was recognized. We're going to give it some example text; we're going to say "for example, add an image". ShowConfirmation essentially gives us the ability to say yes or no and override anything in the user interface to confirm. And then we're going to compile constraints. Compiling constraints is something that has got to happen; it's mandatory, even before we've added any. We can add additional constraints here just by saying SpeechRecognizer.Constraints.Add, and we'll do that in a minute.
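The setup described above can be sketched like this; the handler name StartSpeech_Click follows the narration, and the exact wiring in the series' code may differ slightly:

```csharp
using Windows.Media.SpeechRecognition;
using Windows.UI.Xaml;

private async void StartSpeech_Click(object sender, RoutedEventArgs e)
{
    // New up the recognizer.
    var recognizer = new SpeechRecognizer();

    // UI options for the built-in recognition dialog.
    recognizer.UIOptions.IsReadBackEnabled = true;   // read the result back in the default voice
    recognizer.UIOptions.ExampleText = "For example, 'add an image'";
    recognizer.UIOptions.ShowConfirmation = true;    // let the user confirm yes/no

    // Compiling constraints is mandatory, even with none added (free-form dictation).
    await recognizer.CompileConstraintsAsync();

    // Show the system UI and await the result.
    SpeechRecognitionResult result = await recognizer.RecognizeWithUIAsync();
}
```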
But right now we're basically going to allow any content to come through, and we're going to show that UI. So let's go ahead and put a breakpoint there and see what might happen if we load this up now in our app. There we go, now we have this new button that says Speech. We'll go ahead and click on that to bring up the speech recognition UI. And I want you to notice that we immediately got an error that says access is denied.
This is sometimes a source of confusion for developers, and it's because we're trying to recognize speech while our app does not currently have permission to access any microphones. To get around that, we go down to the package manifest, go to the Capabilities tab, and simply add the Microphone capability. Now let's go ahead and run that.
Now we bring up the speech recognizer dialog. Add an image. - [Prompt] Heard you say add an image. - [Narrator] So you notice that because we had IsReadBackEnabled set, the UI dialog read back to us: heard you say add an image. And now if we look at the speech recognition result, we can look at the text and it says "add an image". And of course, without any constraints, we can say anything here. So if we pull that up again we could say: the quick brown fox jumped over the lazy fence.
- [Prompt] Heard you say the quick brown fox jumped over the lazy fence. - [Narrator] And of course it read that back without any issues. So that's pretty cool, that's not a problem. We can make some changes here: we could set IsReadBackEnabled to false. Pull that up. How now brown cow.
And notice that it came through, but we did not have the confirmation read back to us. So that's pretty cool, but maybe we don't want to have that user interface element at all. Maybe we just want to recognize without the dialog. We can simply change this to RecognizeAsync, as opposed to RecognizeWithUIAsync, and essentially have the same experience. So let's pull that up.
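The dialog-free version is just a one-call swap; a minimal sketch of that flow:

```csharp
using Windows.Media.SpeechRecognition;

var recognizer = new SpeechRecognizer();

// Constraints must still be compiled, even without the UI.
await recognizer.CompileConstraintsAsync();

// RecognizeAsync listens without showing the system dialog;
// UIOptions are not needed for this path.
SpeechRecognitionResult result = await recognizer.RecognizeAsync();

// result.Text holds whatever was recognized.
```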
I love speech in the universal Windows platform. So you notice that we didn't see a dialog at all, and now we have "I love speech in the universal Windows platform". Okay, so that's pretty cool and we can do it so easily, but it doesn't really add any functionality to our app at all. So let's just update this click event real quick. Let's add a new one in here and take a look at what we're going to do. So now we're going to spin up that recognizer again, but this time I'm going to add some constraints.
I'm changing the example text to "for example, add images, cancel"; this is just the label that's shown to the user. I don't have to do this; in fact, we don't have to set these UI options at all. Then I'm going to give a list of responses: I'm only going to listen for "add images" or "cancel". Then I'm going to create a new SpeechRecognitionListConstraint, give it a name, and have it hold those responses. Then I'm going to compile constraints, and when I do, it's going to compile everything that I've added as a constraint.
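The list-constraint setup just described can be sketched like this; the constraint tag name "commands" is illustrative:

```csharp
using Windows.Media.SpeechRecognition;

var recognizer = new SpeechRecognizer();
recognizer.UIOptions.ExampleText = "For example, 'add images' or 'cancel'";

// Only these phrases will be recognized; everything else is ignored.
var responses = new[] { "add images", "cancel" };
recognizer.Constraints.Add(
    new SpeechRecognitionListConstraint(responses, "commands"));

// Compiles every constraint we've added.
await recognizer.CompileConstraintsAsync();

SpeechRecognitionResult result = await recognizer.RecognizeWithUIAsync();
```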
So it's saying: hey, I'm only going to listen for "add images" or "cancel", and I'm not going to pay attention to anything else. So let's just go ahead and comment this out here. We'll load that up. Now we'll just pull that up in the UI and I'll say: something something something. Notice that it doesn't recognize that. Something something something. Okay, so let me try again. - [Prompt] Sorry, didn't catch that.
- [Narrator] So now we can say: add images. - [Prompt] I heard you say add images. - [Narrator] And now that we've said "add images", it's going to drop through and tell us that our status is a success without any problems. So it's going to ignore anything other than "add images" or "cancel"; for anything else we'll either get a non-success status or nothing recognized at all. And of course this is important because this can happen in any language; we're just using English for now, and that's fine. So remember, the goal is to actually add some cool functionality here.
So let me just uncomment this code, and we'll take a look at what we're doing. Now I'm saying I'm only going to listen for "add images" or "cancel". And if the recognition result is successful and it doesn't equal "cancel", which means they said "add images", I'm going to browse for more images. I've got this function on my view model to browse for more images, and this gives us the ability, simply with our voice, to add more images to Tagur. So let's go ahead and pull that up and see what that might look like.
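Acting on the result might look like the following; BrowseForImagesAsync stands in for the view-model helper mentioned above, and its exact name is an assumption:

```csharp
using Windows.Media.SpeechRecognition;

SpeechRecognitionResult result = await recognizer.RecognizeWithUIAsync();

// Success and not "cancel" means the user said "add images".
if (result.Status == SpeechRecognitionResultStatus.Success &&
    result.Text.ToLower() != "cancel")
{
    // Open the file picker and send the new images out to
    // Cognitive Services for captioning and tagging.
    await ViewModel.BrowseForImagesAsync();
}
```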
Add images. - [Prompt] Heard you say add images. - [Narrator] And there we go. Now it's pulled up the file picker dialog for us. Let's just go ahead and pick that fruit there, those apples, and add that in. And there we've got those apples that have now been added to our Tagur application. We might as well go ahead and tag that; I'm just going to save that as a favorite as well. Okay, great. So that's really easy to do; we could do this all day long.
We can just say: cancel. - [Prompt] Heard you say cancel. - [Narrator] And then of course, if we want to get rid of that read-back, we can just set IsReadBackEnabled to false. Add images. That's great. And then of course, we can also very easily just say RecognizeAsync and not have the user interface enabled at all. Pull that up.
Add images. Okay, cool, so you see how powerful that is. We might as well just keep that functionality in there; I think that's pretty handy. But I thought it might be helpful as well to add some speech recognition functionality on each one of these images. We see that, by default, Microsoft Cognitive Services is captioning and tagging our images with something, but these captions aren't always totally accurate. So for example, this image says "a close-up of a person in a red light" when it's obviously a close-up of a person with binoculars.
Since this is a person with binoculars, I thought it might be nice to have a little speech recognition element in here so that we can override the caption that was saved by Cognitive Services. So why don't we go in and add another button to the image control that we created in the advanced and custom controls session. So we'll scroll down there and stick a button in here.
In general, it's the same process that's happening; I'm going to go ahead and change that column to one so that those line up well. Now I'm going to have this button call a command, GetUpdatedCaptionCommand, but I haven't implemented that yet. So let's implement that command and take a look at what we're going to do there. First thing we need to do is go to our commands, CoreCommands; we'll open that up. We'll go right above the save-to-OneDrive command that we put in a couple of sessions ago.
Let's add this new command and take a look at what's happening. So now we're going to start this CaptureVoiceAsync. We're going to just spin up a recognizer, and I'm going to use "a man riding a skateboard up a tree" as the example text. It's going to compile the default constraints, listen for the speech recognition result, and if anything has come through, I'm basically just going to set the image info's caption title to that new caption.
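A sketch of that re-captioning flow; CaptureVoiceAsync and the CaptionData/ImageInfo/Title names are assumptions pieced together from the narration, so the series' actual types may differ:

```csharp
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

private async Task CaptureVoiceAsync(CaptionData data)
{
    var recognizer = new SpeechRecognizer();
    recognizer.UIOptions.ExampleText = "A man riding a skateboard up a tree";

    // Compile the default (dictation) constraints.
    await recognizer.CompileConstraintsAsync();

    SpeechRecognitionResult result = await recognizer.RecognizeWithUIAsync();

    // If anything came through, overwrite the Cognitive Services caption.
    if (!string.IsNullOrEmpty(result.Text))
    {
        data.ImageInfo.Title = result.Text;
    }
}
```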
So now we can easily re-caption these on the fly. Since I've created a new command, let's go ahead and add it to our resources so that we can point to it there. I've got this commands and converters section, and we'll just add that new command in there. Remember, it's called GetUpdatedCaptionCommand, and in the image control we're calling GetUpdatedCaptionCommand. So we'll go ahead and put a breakpoint on here just to watch that work.
So now we've got this new speech button that's been added. A little bit of UI goofiness there, but now we have this new speech button. If we pull that up and say nothing, then it's not going to recognize anything and it won't change anything, so let's just pull that up. We'll just cancel that; it does nothing. So the breakpoint is there now. Now let's pull that up and we'll just say: a woman looking through binoculars.
A woman looking through binoculars. - [Prompt] Heard you say a woman looking through binoculars. - [Narrator] And there we go. We see that the speech recognition result has content, and it went ahead and updated that in our UI without any problem. So this is a pretty cool feature: instead of having to type a bunch of stuff in, people can just use speech recognition to update their caption. Now, we see that this one says "a giraffe standing next to a cat"; I think that's actually a leopard.
We'll just say: a leopard looking at the camera. We'll try that. A leopard looking at a camera. - [Prompt] Heard you say a leopard looking at the camera. - [Narrator] That's great. We'll go ahead and take the breakpoints off there, and that's now been updated to "a leopard looking at the camera". And we see a pile of fruit here that we've added; why don't we change this to "a close-up of apples".
A close-up of apples. - [Prompt] Heard you say I'm closed up of apples. - [Narrator] And that's fine; I don't care that it misunderstood me, we got close on that one. "I'm closed up of apples" sounds fine to me; let's go ahead and save that. So you see how cool that is, the ability to add this sort of speech recognition functionality with virtually no code. In fact, if we go back and look at the code here, this was just extra code; we didn't need it at all if we didn't have the UI.
All we're really doing is spinning up a recognizer, compiling the constraints, and then awaiting a response; that's it. Three lines of code and we're done implementing that. So that's pretty dang cool. But what I thought might be even better is to see if we can somehow bring Cortana into the mix and have Cortana integration for voice commanding and speech recognition as well. So why don't we flip back over to the slide deck and talk a little bit about Cortana and Cortana integration.
Note: This course was created by Wintellect. We are pleased to host this training in our library.