From the course: Artificial Intelligence for Business Leaders

Automated speech recognition

- So far, you've seen how natural language processing can read and write text, but more and more people are using NLP with speech recognition. This allows people to feel like they're having a conversation. Automated speech recognition, or ASR, is closely related to natural language processing, though they're not the same thing. But they do work closely together to improve the conversation.

Much like natural language processing, AI systems view speech recognition as a data challenge. When you speak, your vocal cords create very precise sounds. These sounds travel through the air like ripples on the surface of a pond. Computer systems listen to these sounds and convert them into an audio waveform. If you've ever worked with an audio program, then you've probably seen these audio waveforms. They look like mountains reflecting against a calm lake. You'll notice peaks and dips as the audio moves through time. It turns out that audio waveforms contain very accurate digital data, even across different voices. That means that if you hear me say something like "This is an audio waveform," it will create a pattern very similar to the one you'd create saying the same thing. It'll create very similar peaks and valleys.

Then a computer matches this audio data against existing patterns. But as you can imagine, there's a nearly infinite number of ways that people can mix and match words and phrases. So AI systems often rely on something called a hidden Markov model. This model helps computer scientists determine the probability of your next set of words based on what you've already said. That means that the system will look at the initial waveform, then try to connect it to the first few words. So in this case, the system might hear "This is an," then recognize that the word that follows will probably be a noun. People don't usually say something like "This is an eating." Since the system thinks that the next word will be a noun, it can narrow the range of possibilities. In this case, it will match the noun "audio" and then the noun "waveform." That means that the earlier the system understands what you're saying, the easier it will be to recognize your later words and phrases.

Now, remember that speech recognition is still different from natural language processing. Here, the AI system is simply trying to recognize the sounds that you make while speaking. In NLP, the AI system is actually trying to understand human language. That's why speech recognition has been around so much longer than NLP. It's much easier for a system to convert sound to text than to understand the complexities of human language.
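To make the "predict the next word" idea concrete, here is a minimal sketch in Python. It is not a real ASR engine or a full hidden Markov model; it just uses a hypothetical, hand-built table of next-word probabilities (the kind of statistics such a model would learn from large amounts of transcribed speech) to show how hearing "This is an" pushes the system toward "audio" and away from "eating."

```python
# A minimal sketch of using context to narrow down the next word.
# The probabilities below are invented for illustration only; a real
# speech recognizer would learn them from transcribed audio data.

# Hypothetical probabilities of the next word, given the last few words heard.
NEXT_WORD_PROBABILITIES = {
    ("this", "is", "an"): {
        "audio": 0.35,      # nouns are likely after "this is an"
        "example": 0.30,
        "apple": 0.20,
        "eating": 0.01,     # a verb here is very unlikely
    },
    ("is", "an", "audio"): {
        "waveform": 0.55,
        "file": 0.30,
        "signal": 0.10,
    },
}


def rank_candidates(words_so_far, candidates):
    """Rank acoustically plausible next words by how likely they are in context."""
    context = tuple(word.lower() for word in words_so_far[-3:])
    probabilities = NEXT_WORD_PROBABILITIES.get(context, {})
    # Words never seen in this context get a tiny default score.
    scored = [(word, probabilities.get(word, 0.001)) for word in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The recognizer has heard "This is an" and the waveform sounds like
    # one of these words. Context ranks "audio" well ahead of "eating".
    heard = ["This", "is", "an"]
    acoustically_similar = ["audio", "eating", "apple"]
    print(rank_candidates(heard, acoustically_similar))
    # [('audio', 0.35), ('apple', 0.2), ('eating', 0.01)]
```

In a production recognizer this language-model step is combined with the acoustic match on the waveform itself, so both "what it sounded like" and "what people tend to say next" decide the final transcript.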
