Join Deloitte Insights for an in-depth discussion in this video, Speech recognition, part of Cognitive Technologies: The Real Opportunities for Business.
- In this lecture we're going to look at speech recognition, and Eric Nyberg is back to help me with that. Speech recognition means having computers recognize the words, and even the tone or emotion, in human speech. An example many of us are familiar with is Siri, the virtual personal assistant from Apple, which allows you to pose questions with your voice. Compared to text processing, speech recognition has special challenges, including handling diverse accents, coping with background noise, and distinguishing between homophones; for instance, B-U-Y and B-Y sound exactly the same.
And these systems need to work quickly, at the speed of human speech. Eric, can you talk us through how speech recognition works at a high level? - Sure, David. Human speech is made up of individual sounds called phonemes. The first step is to analyze the recording of speech, which is typically an acoustic waveform, in order to find the boundaries between phonemes. Then those phonemes are matched to individual words, and each one of those mappings is potentially ambiguous when there's noise in the environment. Finally, we're able to recover a representation of the words and sentences in the speech signal.
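The phoneme-to-word matching step Eric describes can be sketched in a few lines of Python. This is a minimal illustration, not how any production recognizer works: it assumes the waveform has already been split into phoneme groups, one group per word, and it uses a hypothetical toy lexicon (`LEXICON`) and hypothetical word-pair counts (`BIGRAMS`) invented for the example. It shows why homophones such as "buy" and "by" are ambiguous at the sound level and how the preceding word can help choose between them.

```python
# Hypothetical toy lexicon: a phoneme sequence maps to every word that sounds like it.
# The phoneme symbols are ARPAbet-style and chosen only for illustration.
LEXICON = {
    ("AY",): ["I"],
    ("W", "AA", "N", "T"): ["want"],
    ("T", "UW"): ["to", "two", "too"],
    ("B", "AY"): ["buy", "by", "bye"],
    ("B", "R", "EH", "D"): ["bread"],
}

# Hypothetical counts of how often one word follows another ("<s>" marks sentence start).
BIGRAMS = {
    ("<s>", "I"): 5,
    ("I", "want"): 5,
    ("want", "to"): 5,
    ("to", "buy"): 4,
    ("to", "by"): 1,
    ("buy", "bread"): 3,
}


def decode(phoneme_groups):
    """Greedily map each phoneme group to the candidate word that best follows the previous word."""
    words = []
    previous = "<s>"
    for group in phoneme_groups:
        candidates = LEXICON.get(tuple(group))
        if not candidates:
            # No word in the toy lexicon matches this sound sequence.
            best = "<unk>"
        else:
            # Homophones share one phoneme sequence; context breaks the tie.
            best = max(candidates, key=lambda w: BIGRAMS.get((previous, w), 0))
        words.append(best)
        previous = best
    return words


if __name__ == "__main__":
    utterance = [["AY"], ["W", "AA", "N", "T"], ["T", "UW"], ["B", "AY"], ["B", "R", "EH", "D"]]
    print(" ".join(decode(utterance)))
```

The greedy, one-word-at-a-time choice is a simplification made to keep the sketch short; it also sidesteps the harder problems of finding phoneme and word boundaries in noisy audio.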
- [David] So breaking down the sound into chunks, then determining what features of the language they may represent, and then deducing what words they might represent? - [Eric] Exactly, that's correct. - [David] So strung together in a pipeline, similar to how text processing works? - [Eric] Right. - So the applications of speech recognition are very broad, including hands-free writing, such as medical dictation; controlling mobile devices and mobile web search; voice control of computer systems and other devices, such as car entertainment systems and household appliances; automated telephone customer service; and even surveillance in law enforcement.
Speech recognition is not perfect, but it's getting pretty good. Google recently published a paper saying that its speech recognition system had achieved an error rate as low as 8%. Baidu, a Chinese internet company, recently reported that its speech recognition system could handle background noise and has an 81% accuracy rate. With this kind of performance, is speech recognition basically a solved problem? - Well, David, I think what we're seeing is that for certain domains and certain application areas, speech recognition can be very successful today. But there are still many domains where there's a specialized vocabulary, or where the operating environment is very challenging because of noise; let's say we're operating a piece of heavy machinery at a job site. In those domains we still need to do more research on adapting speech recognition to the particular domain.
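The transcript does not say which metric lies behind the 8% and 81% figures. A common way to score a recognizer is word error rate: the number of word substitutions, insertions, and deletions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below assumes that metric; the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance between the two strings, divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    # One wrong word out of five gives an error rate of 0.2.
    print(word_error_rate("please turn the radio down", "please turn the video down"))
```

Under this metric, "accuracy" is often quoted informally as one minus the error rate, though that is only a rough reading: insertions can push word error rate above 100%, and figures measured on different test sets are not directly comparable.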
- So web search, like Google does with voice, is a special case, but there are lots of other applications that might be more challenging? - Absolutely. - And looking forward, what are some of the biggest applications of speech that you see coming commercially in the next few years? - Well, I think, David, you already hit on one that's very important, which is allowing people to access information using spoken language or even spoken dialogue. I think there's also a growing amount of data available that represents recordings of human speech, and it will be wonderful to be able to mine and process that automatically.
If you think about all of the broadcasts that are made every day, or all of the recordings that are made in surveillance contexts, where there's simply too much data for humans to analyze, we're going to need advances in speech recognition in order to do a better job of coping with that huge explosion of data. - And so you see the applications going to expand over time? - Absolutely. - Great, so let's wrap up. Speech recognition has special challenges beyond those found in text processing, including handling accents, homophones, and background noise.
Applications of speech recognition are diverse, ranging from web search to hands-free control of in-car systems to surveillance. The accuracy of speech recognition systems isn't perfect, but it's getting very good, and there are still lots of problems to tackle in the field.
- Artificial intelligence explained
- Cognitive technologies explained
- Supervised, unsupervised, and reinforcement learning
- Machine learning models and algorithms
- Language, speech, and visual processing
- Business applications of cognitive tech
- The impact of cognitive technologies at work
- Future of cognitive technologies