The Amazon Echo brand of smart speakers, also called Alexa, brought home voice interaction to the mainstream. In this video, you'll get a sense of what these devices do, and how they turn natural language into inputs for your voice-enabled apps.
- [Instructor] Hi and welcome to this course on building skills for Amazon's Alexa. Right off the bat, I want to say that I'll be using that word a lot. If you have an Alexa device in earshot, it has no doubt already sprung to life, awaiting your next word. You may want to consider using headphones for this course so as not to cause additional accidental activations. If you don't have a device nearby, or even if you don't own one at all, that's okay. It's entirely possible to build, test, and publish Alexa skills without owning a single Alexa-enabled device.
Now, before we get building, how about a little overview of just what Alexa is and how it works? Speech-to-text technology actually goes as far back as the 1950s, when researchers first experimented with translating audio input into text. Since then, there have been many different attempts at letting users speak to and control their devices, from Dragon NaturallySpeaking in the '90s to Xbox's Kinect in 2010. But the modern era of voice control isn't about translating your voice into rigid computer commands, it's about creating natural-feeling conversations. Along with speech-to-text, we need two additional ingredients: voice synthesis and artificial intelligence. These are the three technologies that enable modern intelligent voice interactions.
Interactions like these, with AI-powered voice assistants, reached an inflection point in 2011. That's the year Apple integrated Siri, which it had acquired a year earlier, into iOS. Shortly thereafter, Google followed suit with Google Now, an AI assistant that offered similar OS-integrated features for Android phones. And in 2014, Amazon joined the fray with its voice assistant named Alexa. However, Alexa was not just another phone-based AI. Rather, Amazon's entry represented another pivot point for voice interactions. Instead of being tethered to your smartphone, Alexa came in the freestanding Amazon Echo device, a smart speaker that could be placed in your home or office.
Today, we see a number of devices in the Echo line of products, devices that come with the Alexa voice assistant built right in. Alexa is built on those three pillars I mentioned earlier: speech-to-text, artificial intelligence, and voice synthesis. Let's look at a high level at how Alexa works. First, you engage the voice assistant by using the wake word. The device's microphones are always listening for this word, which starts the device actively listening and responding to you. By default, this wake word is Alexa. From there, the device records what you say until you pause long enough that it determines you're finished speaking. At this point, the device sends that audio to the cloud. The audio goes into Amazon's Alexa Service, a cloud service that performs speech-to-text, matches the speech to your skill, and parses the user's words into a format your skill can consume. There's a lot of smart technology at work in this step. That's part of what we're going to unpack in this course. Next, that parsed speech is passed along as JSON to your skill service. This can be a web service hosted anywhere on the web that you control. There are many ways to deploy a service like this, but recent developer-friendly releases by Amazon have made it incredibly easy to do. From here, it's your own code that receives the input and, using Amazon's Alexa Skills Kit SDK, crafts a response. In doing so, it may fetch information for the user, turn on a connected smart device, like a light bulb, or even make a purchase. When you return a response, Amazon's voice synthesis converts the text to natural-sounding speech and sends the audio back to the device to speak as a response. Did you notice that very little of this sequence actually involves the Echo device? As long as the Echo can listen to you, connect to Wi-Fi to reach the Alexa Service, and then produce sound, either through its own speaker or through a connected external one, it's good to go.
All the heavy lifting is done in the cloud, in the Alexa Service and your own skill service. That's one reason Amazon can make Echo devices so small and cheap, and why it's so easy to get started creating apps for them. So, let's get started.
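To make the middle of that sequence a little more concrete, here is a minimal sketch of what a skill service does with the JSON it receives. It uses plain Python dictionaries rather than the Alexa Skills Kit SDK, and it only covers the basic request and response shape. The intent name LightStatusIntent, the slot name room, and the function name handle_request are hypothetical, made up for illustration.

```python
import json


def handle_request(event: dict) -> dict:
    """Turn a parsed Alexa-style request into an Alexa-style response.

    Simplified sketch: a real skill would normally use the Alexa Skills
    Kit SDK and handle more request types than shown here.
    """
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        speech = "Welcome. Ask me for the status of a light."
        end_session = False
    elif request_type == "IntentRequest":
        intent = request.get("intent", {})
        # "LightStatusIntent" and the "room" slot are hypothetical names.
        if intent.get("name") == "LightStatusIntent":
            slots = intent.get("slots", {})
            room = slots.get("room", {}).get("value") or "that room"
            speech = f"The light in the {room} is on."
        else:
            speech = "Sorry, I don't know how to help with that."
        end_session = True
    else:
        speech = "Goodbye."
        end_session = True

    # The text below is what the voice synthesis step converts to speech.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


# A trimmed-down example of the JSON the skill service might receive.
sample_event = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "LightStatusIntent",
            "slots": {"room": {"name": "room", "value": "kitchen"}},
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(handle_request(sample_event), indent=2))
```

Notice that the skill never touches audio. It receives structured text and returns text, which is why the same code could be run and tested without owning a device.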
- Designing effective voice interactions
- Designing an interaction model
- Building voice interactions
- Building an interaction model
- Alexa response functions
- Skills software development kit (SDK)
- Mapping values