Get an introduction to the Speech API, the possibilities it opens up for your applications, and its importance within Microsoft Cognitive Services.
- [Instructor] Next, let's talk about the Speech API. The Speech API is a core pillar of Microsoft Cognitive Services. Why? Well, from the very beginning, speech has been our most natural way to communicate. Even before written text, we learned to talk. We talk to our pets, and we've always dreamt of being able to talk to our computers. Science fiction and movies have long imagined a world where we talk to the computer and the computer responds accordingly.
So naturally, Microsoft Cognitive Services gives a lot of attention to speech. What kinds of things can you do with the Speech API? You can do speech-to-text and text-to-speech. For speech-to-text, you talk to the computer, submit the audio in one of several supported formats, and the Speech API tries to recognize it and returns the text. For text-to-speech, you submit text along with the audio format you want back, and you get a computer voice reading out the text.
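To make the speech-to-text round trip concrete, here is a minimal sketch using only the Python standard library. It assumes the current Azure "speech-to-text REST API for short audio": the regional endpoint, the `Ocp-Apim-Subscription-Key` header, the 16 kHz PCM WAV content type, and the `RecognitionStatus`/`DisplayText` response fields come from that API and may differ from the version used in this course. The function names are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_stt_request(wav_bytes: bytes, region: str, key: str,
                      language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a short-audio speech-to-text request.

    Endpoint and headers are assumptions based on Azure's REST API for short audio.
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # 16 kHz, 16-bit mono PCM WAV is one of the accepted audio formats.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def parse_stt_response(body: str) -> Optional[str]:
    """Return the recognized text, or None if recognition did not succeed."""
    result = json.loads(body)
    if result.get("RecognitionStatus") == "Success":
        return result.get("DisplayText")
    return None
```

With a real key and region you would send the request with `urllib.request.urlopen(req)` and pass the decoded response body to `parse_stt_response`.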
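Text-to-speech works in the opposite direction: you send text plus the audio format you want back. The sketch below assumes the current Azure text-to-speech REST API. The regional `tts.speech.microsoft.com` endpoint, the SSML body, the `X-Microsoft-OutputFormat` header, the `riff-24khz-16bit-mono-pcm` format, and the `en-US-JennyNeural` voice name are taken from that API, and available voices can change over time. The helper names are hypothetical. The text is XML-escaped so characters such as `&` do not break the SSML document.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for one voice."""
    return (f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
            f"xml:lang={quoteattr(lang)}>"
            f"<voice name={quoteattr(voice)}>{escape(text)}</voice></speak>")


def build_tts_request(text: str, region: str, key: str,
                      voice: str = "en-US-JennyNeural",
                      output_format: str = "riff-24khz-16bit-mono-pcm") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request; the response body is audio bytes."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # The audio format you want returned.
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "speech-api-sketch",
    }
    body = build_ssml(text, voice).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending this request with `urllib.request.urlopen` would return WAV bytes that you could write to a file and play.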
You can also recognize the speaker. Remember when you were a child and your mom or dad called you? You knew instantly who was calling, because you recognized their voice. Why shouldn't computers be able to do the same? You go through a process of enrolling your voice, and from then on the API can recognize that voice irrespective of what is said. You can also do real-time language translation. If you think about it, once you have speech-to-text and text-to-speech, real-time translation is a matter of converting speech in one language to text, passing that text through a translation engine, and then converting the translated text back to speech.
So yes, you can get real-time language translation, voice-to-voice, end-to-end. All of this is possible with Microsoft Cog Services.
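The enroll-then-identify flow for speaker recognition can be illustrated with a toy model. This is not the service's API or algorithm: it assumes each voice sample has already been turned into a numeric feature vector (a hypothetical step the real service handles internally), averages a speaker's enrollment samples into a profile, and identifies a new sample by cosine similarity against the enrolled profiles. Because it compares voice features rather than words, it mirrors the idea of recognizing a voice irrespective of what is said.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToySpeakerRegistry:
    """Hypothetical enroll/identify flow over precomputed voice feature vectors."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._profiles: Dict[str, List[float]] = {}

    def enroll(self, name: str, samples: List[Vector]) -> None:
        # Average several enrollment samples into one profile vector.
        dim = len(samples[0])
        self._profiles[name] = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

    def identify(self, features: Vector) -> Optional[str]:
        # Pick the most similar enrolled profile; reject matches below the threshold.
        best_name, best_score = None, -1.0
        for name, profile in self._profiles.items():
            score = _cosine(profile, features)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None
```

The threshold is the design choice that matters most here: without it, an unknown voice would always be matched to whichever enrolled speaker happens to be closest.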
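The voice-to-voice translation described above is a composition of three stages: speech-to-text, text translation, and text-to-speech. The sketch below wires those stages together as plain functions. The stage implementations are hypothetical offline stand-ins (a tiny phrasebook instead of a real translation engine) so the example runs without any service. In practice each stage would call a real speech or translation endpoint, and Azure also offers dedicated speech translation that performs the whole chain for you.

```python
from dataclasses import dataclass
from typing import Callable

SpeechToText = Callable[[bytes, str], str]      # (audio, language) -> text
Translate = Callable[[str, str, str], str]      # (text, source, target) -> text
TextToSpeech = Callable[[str, str], bytes]      # (text, language) -> audio


@dataclass
class SpeechTranslator:
    stt: SpeechToText
    translate: Translate
    tts: TextToSpeech

    def __call__(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.stt(audio, source_lang)                              # speech -> text
        translated = self.translate(text, source_lang, target_lang)     # text -> text
        return self.tts(translated, target_lang)                        # text -> speech


# Hypothetical stand-in stages so the sketch runs offline.
def fake_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    phrasebook = {("en", "fr"): {"hello": "bonjour"}}
    words = phrasebook.get((src, dst), {})
    return " ".join(words.get(w, w) for w in text.split())


def fake_tts(text: str, lang: str) -> bytes:
    return f"[{lang}] {text}".encode("utf-8")


translator = SpeechTranslator(fake_stt, fake_translate, fake_tts)
print(translator(b"hello world", "en", "fr"))  # prints b'[fr] bonjour world'
```

Keeping each stage behind a simple function type means you can swap a stand-in for a real service call without touching the pipeline itself.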