Are you new to the world of Alexa Skill development? This Alexa Skill Glossary is what you’ve been looking for.
Alexa is the voice service that powers Amazon Echo. It comes with some incredible capabilities straight out of the box, but you can also add new capabilities to it, called Skills (the equivalent of apps on a smartphone).
To develop for Alexa, you can use the Alexa Skills Kit.
The kit lets you add new capabilities by creating custom, smart home, or flash briefing skills. So that users can interact with the skills you’ve created, you also describe the VUI by mapping users’ speech input data to the intents your network service can handle.
The Alexa Skills Kit will spice up your days with some grown-up fun.
Alexa can play music, answer questions on almost any topic, order a pizza, call an Uber, set your alarm, tell you the weather, remind you of your calendar events, and more.
With its combination of self-service APIs, tools, documentation, and code samples, the Alexa Skills Kit is also well suited to building skills for your smart home.
Amazon Echo works with a wide range of smart home gadgets, and Amazon Echo Dot is a good option if you want to extend voice commands to multiple rooms in your house.
If you’d like to get started developing for Alexa, we’ve created a complete Alexa Skill Glossary to help you define how users will experience the capabilities you create.
We will start with the basics and continue with the necessary terms you have to know in order to develop a new Skill.
Check out the Alexa Skill Glossary to discover the magic behind this wonderful tool:
The practical Alexa Skill Glossary
Speech Recognition Application
Speech Recognition Applications enable a device or computer to convert spoken words into written text by finding the best-matching word sequence.
A Speech-To-Text (STT) engine lets you dictate a message that your device then sends as text.
A Text-To-Speech (TTS) engine does the reverse, converting written words into synthesized speech.
Artificial Intelligence (AI)
A modern definition describes AI as the study and design of systems that perceive their environment and perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
In 1956, John McCarthy defined it as “the science and engineering of making intelligent machines.”
Network Service (SaaS)
A cloud-based service that hosts a skill’s logic. It receives requests from Alexa that carry the user’s intent and returns text responses, which Alexa speaks back to the user.
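To make the request/response loop concrete, here is a minimal Python sketch of a network service handler, assuming the Alexa Skills Kit JSON shapes (`request.type`, `intent.name`, `outputSpeech`, `shouldEndSession`). The `HelloIntent` name and the reply texts are hypothetical, and a production skill would more likely use the official ASK SDK than raw dictionaries.

```python
import json

# Hypothetical intent name, used only for this illustration.
HELLO_INTENT = "HelloIntent"


def build_speech_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the Alexa response envelope; Alexa speaks the text aloud."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict) -> dict:
    """Route an incoming Alexa request (JSON carrying the user's intent) to a text response."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_speech_response("Welcome! Say hello to get started.", end_session=False)
    if request["type"] == "IntentRequest" and request["intent"]["name"] == HELLO_INTENT:
        return build_speech_response("Hello from your skill!")
    # Anything else gets a generic fallback.
    return build_speech_response("Sorry, I can't help with that yet.")


if __name__ == "__main__":
    sample_event = {
        "version": "1.0",
        "request": {"type": "IntentRequest", "intent": {"name": "HelloIntent", "slots": {}}},
    }
    print(json.dumps(handle_request(sample_event), indent=2))
```

Keeping response building in its own function means every branch returns the same envelope, so only the spoken text varies.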
Self-Service API (Application Programming Interface)
A software platform through which developers organize access to data and business processes, and which lets web applications interact with other applications. Self-service APIs give developers access to a wide range of features so they can evolve and customize their projects, giving them full control over their creations.
VUI (Voice User Interface)
An interface between a user and a system that recognizes spoken words or voice commands in order to start a service and respond appropriately.
Unlike traditional voice response systems, VUIs accept continuous speech rather than only short phrases.
Designing a VUI takes extra care, because users often expect some emotional engagement when they speak to a voice interface.
Skills
Capabilities of Alexa, such as playing music, reading the latest news, or telling you the weather. Some are built in and others are added by developers; a developer-built skill includes the code and the configuration provided on the developer portal. There are three types of skills that can be built:
Custom Skills
A custom skill is a flexible yet complex skill with an interaction model defined by the developer. For custom skills, the developer defines three things:
Intents: requests the skill can handle.
Ex. Order food delivery.
Sample utterances: the words users may say to invoke the intents (together with the intents, they make up the skill’s interaction model).
Ex. “Order spicy tuna roll”.
Invocation Name: The name Alexa uses to identify the skill.
Ex. Food delivery.
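The three pieces above (intents, the words users say, and the invocation name) can be sketched as the JSON interaction model a developer uploads, written here as a Python dict. The field names (`invocationName`, `intents`, `samples`, `types`) follow the developer console's JSON format as we understand it; `OrderFoodIntent` and its sample phrases are hypothetical. The lookup function is a deliberately naive exact match, only to show how spoken words map to an intent.

```python
import json
from typing import Optional

# Mirrors the JSON interaction model accepted by the Alexa developer console.
# "OrderFoodIntent" and its sample phrases are hypothetical examples.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "food delivery",  # the name Alexa uses to identify the skill
            "intents": [
                {
                    "name": "OrderFoodIntent",  # a request the skill can handle
                    "slots": [],
                    "samples": [  # words users may say to trigger the intent
                        "order spicy tuna roll",
                        "order food delivery",
                    ],
                }
            ],
            "types": [],
        }
    }
}


def find_intent(utterance: str, model: dict) -> Optional[str]:
    """Naive exact-match lookup, for illustration only.

    Alexa's natural-language understanding generalizes far beyond
    exact matches of the sample phrases.
    """
    normalized = utterance.strip().lower()
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        if normalized in intent["samples"]:
            return intent["name"]
    return None


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
    print(find_intent("Order spicy tuna roll", interaction_model))  # OrderFoodIntent
```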
Smart Home Skills
A skill that controls smart home devices such as lights and thermostats. For smart home skills, the developer should define:
Skill Adapter: The code that makes the skill respond to a particular directive.
Ex. Decreases the temperature when the user requests “decrease the temperature by 3 degrees”.
The device directives, and the words users may say to request them, are defined by the Smart Home Skill API.
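A skill adapter can be sketched as a function that dispatches on a directive's namespace and name. This assumes the v3 Smart Home directive shape (`header.namespace`, `header.name`, `endpoint.endpointId`, `payload.targetSetpointDelta`), trimmed to the fields the sketch reads; the in-memory `thermostats` dict is hypothetical, and a real adapter must answer with a full `Alexa.Response` event rather than the simplified dict returned here.

```python
# Hypothetical in-memory device state standing in for a real thermostat backend.
thermostats = {"thermostat-1": 72.0}


def handle_directive(message: dict) -> dict:
    """Dispatch a Smart Home directive to the matching device action."""
    directive = message["directive"]
    header = directive["header"]
    endpoint_id = directive["endpoint"]["endpointId"]

    if (header["namespace"], header["name"]) == (
        "Alexa.ThermostatController",
        "AdjustTargetTemperature",
    ):
        delta = directive["payload"]["targetSetpointDelta"]["value"]
        thermostats[endpoint_id] += delta
        # Simplified result; a real adapter answers with an Alexa.Response event.
        return {"endpointId": endpoint_id, "targetSetpoint": thermostats[endpoint_id]}

    raise ValueError(f"Unsupported directive: {header['namespace']}.{header['name']}")


if __name__ == "__main__":
    # "Decrease the temperature by 3 degrees" arrives as a negative delta.
    request = {
        "directive": {
            "header": {"namespace": "Alexa.ThermostatController", "name": "AdjustTargetTemperature"},
            "endpoint": {"endpointId": "thermostat-1"},
            "payload": {"targetSetpointDelta": {"value": -3.0, "scale": "FAHRENHEIT"}},
        }
    }
    print(handle_directive(request))  # {'endpointId': 'thermostat-1', 'targetSetpoint': 69.0}
```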
Flash Briefing Skills
A skill that provides content for a customer’s flash briefing. For a flash briefing skill, the developer must define:
The skill’s name, description, and images, plus one or more content feeds.
The Flash Briefing Skill API defines the words users may say to request their briefing.
Ex. “Tell me the weather.”
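A content feed can be sketched as a list of items like the one built below. The field names (`uid`, `updateDate`, `titleText`, `mainText`, `redirectionUrl`) are our reading of the flash briefing feed format and should be checked against Amazon's current reference; the ID, text, and URL are placeholders.

```python
import json
from datetime import datetime, timezone


def make_feed_item(uid: str, title: str, main_text: str, url: str, updated: datetime) -> dict:
    """Build one text item for a flash briefing content feed."""
    return {
        "uid": uid,  # unique, stable ID for this item
        # UTC timestamp, e.g. 2017-06-01T07:00:00.0Z
        "updateDate": updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": main_text,  # the text Alexa reads aloud
        "redirectionUrl": url,  # link offered for more detail
    }


if __name__ == "__main__":
    item = make_feed_item(
        uid="urn:uuid:example-item",  # hypothetical ID
        title="Sample briefing",
        main_text="This is a sample briefing item.",
        url="https://example.com/briefing",  # placeholder URL
        updated=datetime(2017, 6, 1, 7, 0, tzinfo=timezone.utc),
    )
    print(json.dumps([item], indent=2))  # a feed may contain one or more items
```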
Interaction Model
An interaction model is an abstract definition of the interaction that takes place between your API and the applications that use it, similar to the graphical user interface of a traditional app. In a voice skill, users make requests by voice instead of pressing buttons.
See Ronnie Mitra’s video for further guidance on building an interaction model and for an introduction to interaction-based design philosophy.
Intent
The action that fulfills the user’s spoken request. There are three types:
Full intent: a spoken request in which the user provides everything required to complete the request, all at once.
Ex. “Alexa, ask Elle.com for today’s horoscope for Virgo.”
Partial intent: a spoken request in which the user provides only part of the information required to complete the request.
Ex. “Alexa, ask Elle.com for the horoscope.”
No intent: a spoken request with minimal information.
Ex. “Alexa, talk to Elle Magazine.”
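The three cases above can be told apart in code, labelled here “full intent”, “partial intent” and “no intent”. This minimal sketch assumes that a bare invocation arrives as a `LaunchRequest` and that an unfilled slot arrives without a `value` key; the `REQUIRED_SLOTS` table and the `HoroscopeIntent` name are hypothetical.

```python
# Hypothetical mapping from intent name to the slots it needs to be complete.
REQUIRED_SLOTS = {"HoroscopeIntent": ["sign"]}


def classify_invocation(request: dict) -> str:
    """Label a request as a full, partial, or no-intent invocation."""
    if request["type"] == "LaunchRequest":
        # e.g. "Alexa, talk to Elle Magazine": the skill is opened with no intent.
        return "no intent"
    if request["type"] != "IntentRequest":
        raise ValueError(f"Not an invocation request: {request['type']}")
    intent = request["intent"]
    slots = intent.get("slots", {})
    required = REQUIRED_SLOTS.get(intent["name"], [])
    # Assumed: a slot the user did not fill has no "value" key.
    if all(slots.get(name, {}).get("value") for name in required):
        return "full intent"
    return "partial intent"


if __name__ == "__main__":
    full = {"type": "IntentRequest",
            "intent": {"name": "HoroscopeIntent", "slots": {"sign": {"name": "sign", "value": "virgo"}}}}
    partial = {"type": "IntentRequest",
               "intent": {"name": "HoroscopeIntent", "slots": {"sign": {"name": "sign"}}}}
    launch = {"type": "LaunchRequest"}
    for req in (full, partial, launch):
        print(classify_invocation(req))
```

For a partial intent, the skill would typically follow up with a prompt asking for the missing slot.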
Slots
The optional arguments of an intent.
Ex. “sign” is a possible slot for the “Horoscope” intent. When the intent is sent, the “sign” slot can carry a value such as Virgo, Cancer, or Pisces.
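Because slots are optional, a handler has to cope with a missing or unexpected value. This sketch, assuming the slot layout of an Alexa `IntentRequest` (`intent.slots.<name>.value`), reads the hypothetical `sign` slot and normalizes it against the twelve zodiac signs.

```python
from typing import Optional

ZODIAC_SIGNS = {
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
}


def get_sign(intent: dict) -> Optional[str]:
    """Return the normalized "sign" slot value, or None if it is missing or unknown."""
    value = intent.get("slots", {}).get("sign", {}).get("value")
    if not value:
        return None  # the user did not fill the optional slot
    normalized = value.strip().lower()
    return normalized if normalized in ZODIAC_SIGNS else None


if __name__ == "__main__":
    intent = {"name": "HoroscopeIntent", "slots": {"sign": {"name": "sign", "value": "Virgo"}}}
    print(get_sign(intent))  # virgo
```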
Speech Input Data
The set of values mapped to any custom slots your skill supports, plus a list of sample utterances (common phrases) that invoke the intents.
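Speech input data can be sketched as a custom slot type plus sample utterances that reference the slot in braces, following the developer console's JSON layout as we understand it. `ZodiacSign` and `HoroscopeIntent` are hypothetical names, and `undeclared_slots` is our own helper that catches a sample utterance referring to a slot the intent never declares.

```python
import re

# Hypothetical custom slot type listing the values users may say for {sign}.
zodiac_type = {
    "name": "ZodiacSign",
    "values": [{"name": {"value": sign}} for sign in (
        "aries", "taurus", "gemini", "cancer", "leo", "virgo",
        "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
    )],
}

# Hypothetical intent whose sample utterances reference the slot in braces.
horoscope_intent = {
    "name": "HoroscopeIntent",
    "slots": [{"name": "sign", "type": "ZodiacSign"}],
    "samples": [
        "the horoscope for {sign}",
        "what is the horoscope for {sign}",
        "the horoscope",  # the slot is optional, so a sample may omit it
    ],
}


def undeclared_slots(intent: dict) -> set:
    """Return slot names used in sample utterances but not declared on the intent."""
    declared = {slot["name"] for slot in intent["slots"]}
    used = {name for sample in intent["samples"] for name in re.findall(r"\{(\w+)\}", sample)}
    return used - declared


if __name__ == "__main__":
    print(len(zodiac_type["values"]))  # 12
    print(undeclared_slots(horoscope_intent))  # set()
```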
Prompt
A voice response that asks the user for more information.
User: “Alexa, talk to Elle.com.”
Prompt: “You can ask for today’s fashion trends or your horoscope. What will it be?”
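A prompt like the one above can be sketched as a response that keeps the session open so the user can answer. This assumes the Alexa Skills Kit response fields `outputSpeech`, `reprompt`, and `shouldEndSession`; the reprompt wording is hypothetical.

```python
import json


def build_prompt(question: str, reprompt: str) -> dict:
    """Ask the user for more information and keep the session open for the answer."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": question},
            # Spoken again if the user does not reply.
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            "shouldEndSession": False,  # keep listening for the user's reply
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_prompt(
        "You can ask for today's fashion trends or your horoscope. What will it be?",
        "Would you like fashion trends or your horoscope?",
    ), indent=2))
```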
Now to you
Are there any basic terms we left out? Please add them in the comments.
Founder; designs and architects Audioburst’s products, with a focus on the Speech-To-Text world.