Amazon Echo is by far the most popular device in the consumer IoT realm, winning the race in the screen-free product category.

Powered by Alexa, Amazon’s voice control system, the Amazon Echo speaker has sold over 3 million units, bringing the voice-control gospel to the mainstream home.

Now that the Alexa store has opened its gates to developers, they can use the voice platform to create new Alexa skills and improve the system on a regular basis.

In this post, I’ll walk you through the process of obtaining approval for the Alexa Skill we developed in the interest of helping you successfully submit your own skill.

Why Alexa?

Since our company’s ultimate goal is to create the internet’s audio repository (and we have developed the technology to do so), building for a device that interacts with the user in an audio-only format was a natural fit.

So we decided to build an Alexa skill that would make use of our tech.

Alexa is a great platform for building eyes-free + hands-free applications (Skills). As a company that creates content for exactly these use cases, it seemed like an obvious approach.

In addition, building an Alexa Skill was a great way to enter this new market.

It’s fun and there are many resources that will help you get started.

Our skill: News Feed

We built an Alexa skill that allows you to get news of high relevance, on any topic.

The information comes from top news outlets, on any subject you’re interested in. It also provides real-time updates.

Instead of having Alexa read the news for you, you get a live audio burst from the news publication.

While using our technology as a resource, we initially thought the sky was the limit, but later we found out that Alexa has its limitations.

Before you start, I recommend that you read Amazon’s guidelines on how to work with Alexa’s API.

It will save you a lot of trouble and time.

While developing a skill is rather simple, the real challenge comes in the submission phase.

The big obstacle: Getting your skill approved

Once development is finished and your skill is ready to go up on the Alexa Skill Store, the submission phase begins.

You might think that the most challenging part is building the skill itself, but for a developer, the hard part is actually getting approval from Amazon. Our skill was rejected three times before our fourth submission was finally approved.

Amazon does a really good job of documenting the development process but has trouble communicating during the submission process.

You’ll get a rejection log, but it won’t be specific or clear enough for you to really understand what went wrong.

I’ll try to save you some time by sharing the rejections we got and how we resolved them until we eventually got an approval.

[Image: a submission failure email from Amazon]

Filling in the necessary details

The first thing you need to do to submit an Alexa skill is to fill in the necessary details that will show up on your Alexa Store page. You can download our description spreadsheet for free as an example of our own approved skill.

You will have to choose a name, description, example phrases and invocation method. You will also need to create a logo for your skill.

After you’re all good on the copywriting front, it’s time to submit your skill. But before you do so, here are a couple of things to watch out for:

1. Make sure the system doesn’t hang and wait for the user to interact with it

On our first submission attempt, we got 6 errors, and most of them were related to this specific issue.

The system should “ask” the user what they want to do and tell them that they can exit the skill at any time by saying “Stop”.

If it fails to do so, the system may hang with no feedback, waiting for the user to say something without having given any instructions. This will lead to a rejection of your Skill submission.

The requirement is that you always finish with a question, to keep the interaction going.
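As a rough sketch of what this can look like in the JSON a skill sends back: the helper name build_ask_response and the wording are our own illustration, and the field names (outputSpeech, reprompt, shouldEndSession) follow the Alexa Skills Kit response format as we understand it, so check them against Amazon's current reference.

```python
def build_ask_response(speech_text, reprompt_text):
    """Build a response that ends with a question and keeps the session open."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Spoken if the user stays silent, so the skill never waits without guidance.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            # False keeps the session open so the user can answer the question.
            "shouldEndSession": False,
        },
    }


response = build_ask_response(
    "Here is the latest on the elections. What else would you like to hear about?",
    "You can ask for news on any topic, or say stop to exit.",
)
```

The point is simply that any response which leaves the session open should end with a question and carry a reprompt that tells the user what to say next, including how to exit.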

2. Example utterances vs. sample utterances

After the 2nd submission, we got only 1 submission error.

Our example utterances did not match our sample utterances.

We had a very hard time understanding what exactly the issue was, and the “request for action” in Amazon’s rejection was not clear. We couldn’t understand the difference between the two.

We resubmitted the skill after fixing what we assumed was the issue, and got another rejection.

There aren’t a lot of Alexa resources and support at the moment – so Google wasn’t much help.

Finally, we understood that the sample utterances in the Interaction model should not be, as documented:

QuestionIntent What is the latest on {question value|question}

but rather:

QuestionIntent What is the latest on {the presidential elections|question}

Which matches the example utterances in the publishing information:

“Alexa, Ask News Feed What is the latest on the presidential elections”

but without the invocation word “Alexa, Ask News Feed”.

Again: you need to use the actual value that you used in the example utterances, without adding the invocation word.

I found it very bizarre that the actual example should be in the sample (as the system should recognize any phrase following the “What’s the latest on” prefix).
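To catch this kind of mismatch before submitting, a small local check can help. The sketch below is our own helper, not an Amazon tool: it assumes the sample utterance format shown above (intent name, then the phrase, with slots written as {value|slot}) and the invocation prefix “Alexa, Ask News Feed”.

```python
import re

# Wake word plus invocation name, as used in the example phrase above.
INVOCATION_PREFIX = "Alexa, Ask News Feed"


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(text.split())


def sample_to_phrase(sample_line):
    """Turn 'IntentName words {value|slot}' into the plain spoken phrase."""
    _intent, _, utterance = sample_line.partition(" ")
    # Replace each {value|slot} with just its literal value.
    utterance = re.sub(r"\{([^|}]*)\|[^}]*\}", r"\1", utterance)
    return normalize(utterance)


def example_is_covered(example_phrase, sample_lines):
    """True if the example phrase, minus the invocation, appears in the samples."""
    phrase = normalize(example_phrase)
    prefix = normalize(INVOCATION_PREFIX)
    if phrase.startswith(prefix):
        phrase = phrase[len(prefix):].strip()
    return phrase in {sample_to_phrase(line) for line in sample_lines}


example = "Alexa, Ask News Feed What is the latest on the presidential elections"
documented = ["QuestionIntent What is the latest on {question value|question}"]
fixed = ["QuestionIntent What is the latest on {the presidential elections|question}"]

print(example_is_covered(example, documented))
print(example_is_covered(example, fixed))
```

Run against the two versions above, the check fails for the placeholder value and passes once the sample utterance contains the actual value from the example phrase.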

3. Your Skill will probably not get accepted on the first attempt, so plan ahead.

Amazon teams respond very quickly and analyze your skill within 1-2 business days.

Even after reading and learning from other people’s mistakes, you’re very likely to fail a submission or two. Instead of worrying about it, trust that you’ll figure it out eventually, and add a few extra days to your timeline for the back-and-forth with the review team.

The marketing and PR team will thank you.

Summary

Creating an Alexa skill is not a complicated process. It’s a very cool use of our product and many other products out there.

There is still not enough documentation on how to use Alexa or on the review and submission process. You will probably encounter challenges we didn’t face during the submission phase. Still, as early adopters, it’s important that we document as many cases as possible to help future developers learn from past mistakes.

Have you encountered any problems submitting your skills? Share them with us in the comments (don’t forget to include your solutions).

Founder, Design & architect Audioburst’s products, with focus on the Speech-To-Text world.

Are you new to the world of Alexa Skill development? This Alexa Skill Glossary is what you’ve been looking for.

Alexa is the voice service that powers Amazon Echo. It comes with some incredible capabilities straight out of the box, but you can also add new functions to it, called Skills (equivalent to apps on the smartphone).

To develop for Alexa, you can use the Alexa Skills Kit.

The kit helps you add new capabilities by creating custom, smart home or flash briefing skills. For users to enjoy the new skills you’ve created, you also describe the VUI by mapping the user’s speech input to the intents your network service can handle.

Alexa Skills Kit will spice up your days with some grownup fun.

Alexa can play music, answer general questions, order a pizza, call a car from Uber, set your alarm, tell you the weather, remind you of your calendar notes and more.

With the combination of self-service APIs, tools, documentation and code samples, the Alexa Skills Kit is also great for creating your Smart Home.

Sharpen your skills with the Alexa Skill Glossary Photo Credit: netguru.co/blog/voice-recognition-tools-review

Amazon Echo connects to smart home gadgets perfectly. Amazon Echo Dot is great if you want to expand on it and bring the voice commands to multiple rooms in your house.

If you’d like to get started with developing for Alexa, we’ve created a complete Alexa Skill Glossary to help you define how the users will experience the new capability you’ve created.

We will start with the basics and continue with the necessary terms you have to know in order to develop a new Skill.

Check out the Alexa Skill Glossary so you can follow up on the magic behind this wonderful tool:

The practical Alexa Skill Glossary

Speech Recognition Application

Speech Recognition Applications enable a device or computer to convert spoken words into written text by finding the best-matching word sequence.

A Speech-To-Text (STT) engine allows you to dictate a message, which your device then sends as text.

A Text-To-Speech (TTS) engine does the reverse: it turns written words into spoken audio.

Artificial Intelligence

A modern definition of AI is the study and design of systems that perceive their environment and are able to perform tasks that normally require human intelligence. These tasks include visual perception, speech recognition, decision-making, and translation between languages.

In 1956, John McCarthy defined it as “the science and engineering of making intelligent machines.”

Is this what Alexa will look like in 2056?

Network Service (SaaS)

A cloud-hosted service that implements a skill. It receives requests from Alexa that carry an intent and returns a text response, which Alexa speaks back to the user.
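A minimal sketch of such a service, assuming the JSON request shape of the Alexa Skills Kit as we understand it (a "request" object with a "type", and an "intent" with a "name" for intent requests). The function name handle_request and the replies are placeholders of our own.

```python
def handle_request(event):
    """Route an incoming Alexa request to a text reply for Alexa to speak."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The skill was opened without a specific request: ask what to do.
        text, end_session = "Welcome. What would you like to hear about?", False
    elif request["type"] == "IntentRequest":
        # A real service would look up content for the intent here.
        text, end_session = "You asked for " + request["intent"]["name"] + ".", True
    else:
        # Any other request type: nothing to say, just close the session.
        return {"version": "1.0", "response": {"shouldEndSession": True}}
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


reply = handle_request(
    {"request": {"type": "IntentRequest", "intent": {"name": "QuestionIntent"}}}
)
```

In practice this function would sit behind an HTTPS endpoint or a cloud function; the routing idea stays the same.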

Self-Service API (Application Program Interface)

A software platform through which developers organize access to data and business processes, and enable web applications to interact with other applications. Self-service APIs give developers access to a large range of features so they can evolve and customize their projects. They are great for developers who want total control of their creations.

VUI (Voice User Interface)

An interface between a user and a system which utilizes the recognition of spoken words or voice commands to initiate a service and enable automated suitable responses.

Unlike voice response systems, VUIs accept continuous speech rather than short phrases only.

There are some important factors to be taken into account since users usually expect some emotional involvement when speaking to a voice user interface.

Skill

An ability of Alexa, such as playing music, reporting the latest news or telling you the weather. A skill added by a developer includes the code and the configuration provided on the developer portal. There are three types of skills that can be built:

Custom Skills

A custom skill is a flexible, yet complex skill with a custom interaction model provided by developers. For custom skills, the developer should define three things:

Intents: requests the skill can handle.

Ex. Order food delivery.

Interaction Model: the words users may say to invoke those intents.

Ex. “Order spicy tuna roll”.

Invocation Name: The name Alexa uses to identify the skill.

Ex. Food delivery.
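To make the relationship between the three pieces concrete, here is a small sketch using the food delivery example. The CustomSkill class and the intent name OrderFoodIntent are purely illustrative and are not Amazon's configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class CustomSkill:
    # The name Alexa uses to identify the skill.
    invocation_name: str
    # Maps each intent name to the phrases users may say to trigger it.
    sample_utterances: dict = field(default_factory=dict)

    def intents(self):
        """List the requests this skill can handle."""
        return sorted(self.sample_utterances)

    def spoken_examples(self, intent):
        """Compose full spoken requests: wake word, invocation name, utterance."""
        return [
            "Alexa, ask {} to {}".format(self.invocation_name, phrase)
            for phrase in self.sample_utterances.get(intent, [])
        ]


skill = CustomSkill(
    invocation_name="food delivery",
    sample_utterances={"OrderFoodIntent": ["order spicy tuna roll"]},
)
print(skill.spoken_examples("OrderFoodIntent"))
```

The invocation name gets the user into the skill, the utterance selects the intent, and the intent is what the code actually handles.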

Smart Home Skills

A skill that controls smart home devices such as lights and thermostats. For smart home skills, the developer should define:

Skill Adapter: The code that makes the skill respond to a particular directive.

Ex. Decreases the temperature when the user requests “decrease the temperature by 3 degrees”.

The device directives, and the words a user may use to request them, are defined by the API.

Flash Briefing Skills

A skill that provides content for a customer’s flash briefing. For a Flash Briefing Skill, the developer must define:

The name, description and images for the skill, and one or more content feeds.

The API defines the words users may use to make those requests.

Ex. “Tell me the weather.”
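As an illustration of what a content feed might contain, the sketch below builds a single text feed item as JSON. The field names (uid, updateDate, titleText, mainText) and the date format are given as we recall them from Amazon's Flash Briefing feed reference and should be verified there; the helper make_feed_item and the sample values are our own.

```python
import json
from datetime import datetime, timezone


def make_feed_item(uid, title, text, when):
    """Build one text item for a flash briefing content feed."""
    return {
        "uid": uid,
        # UTC timestamp; the trailing ".0Z" follows the format we recall from the docs.
        "updateDate": when.strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": text,
    }


item = make_feed_item(
    "item-001",
    "Morning headlines",
    "Here are today's top stories.",
    datetime(2016, 11, 1, 8, 0, 0, tzinfo=timezone.utc),
)
feed_json = json.dumps([item], indent=2)
print(feed_json)
```

The skill itself contains no voice logic: Alexa reads (or plays) whatever the feed returns when the user asks for their flash briefing.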

Interaction Model

An interaction model is an abstract definition of the interaction that will take place between your API and the applications that use it (similar to a graphical user interface in a traditional app). Instead of pressing buttons, users make requests by voice.

See the video by Ronnie Mitra for further guidance on how to build an Interaction Model and for an introduction to an interaction-based design philosophy.

Intent

The action that fulfills the user’s spoken request.

Ex. Horoscope

A spoken request for an intent can take one of three forms:

Full Intent

A spoken request in which the user expresses everything that is required to complete their request, all at once.

Ex. “Alexa ask Elle.com for today’s horoscope for virgo.”

Partial Intent

A spoken request in which the user expresses just partial information of what is required to complete their request.

Ex. “Alexa ask Elle.com for the horoscope.”

No Intent

A spoken request with minimal information.

Ex. “Alexa talk to Elle Magazine.”

Slot

The optional arguments in an intent.

Ex. A “sign” is a possible slot for the “Horoscope” intent. When the intent is sent, the slot “sign” can carry the value Virgo, Cancer, Pisces, etc.
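On the service side, reading a slot comes down to a dictionary lookup. The sketch below assumes the intent request shape of the Alexa Skills Kit as we understand it (slots keyed by name, each with a "value"); the intent name HoroscopeIntent and the helper get_slot_value are hypothetical.

```python
def get_slot_value(request, slot_name):
    """Return the spoken value of a slot, or None if it was not provided."""
    slots = request.get("intent", {}).get("slots") or {}
    return slots.get(slot_name, {}).get("value")


request = {
    "type": "IntentRequest",
    "intent": {
        "name": "HoroscopeIntent",
        "slots": {"sign": {"name": "sign", "value": "Virgo"}},
    },
}

print(get_slot_value(request, "sign"))
```

Because slots are optional, the helper returns None rather than raising an error when the user did not say a value.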

Speech Input Data

The set of mapped values added for any custom slots supported by the developer’s skill and a list of sample utterances or common statements that invoke the intents.

Prompt

A voice response that should be directed to the user to ask for more information.

Ex.

User: “Alexa, talk to Elle.com”.

Prompt: “You can ask for today’s fashion trends or your horoscope. What will it be?”
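Putting the full, partial and no intent cases together with prompts, a service might decide what to ask next along these lines. This is a sketch under our own assumptions: a launch request stands in for “no intent”, an intent request missing its “sign” slot stands in for “partial intent”, and the follow-up question for the missing sign is our own wording.

```python
def classify(request, required_slot="sign"):
    """Label a request as a full, partial or no intent."""
    if request["type"] != "IntentRequest":
        return "no intent"
    slot = request["intent"].get("slots", {}).get(required_slot, {})
    return "full intent" if slot.get("value") else "partial intent"


def next_prompt(request):
    """Return the question to ask the user, or None if nothing is missing."""
    kind = classify(request)
    if kind == "no intent":
        return "You can ask for today's fashion trends or your horoscope. What will it be?"
    if kind == "partial intent":
        return "Which sign would you like the horoscope for?"
    return None


launch = {"type": "LaunchRequest"}
partial = {"type": "IntentRequest", "intent": {"name": "HoroscopeIntent", "slots": {"sign": {"name": "sign"}}}}
full = {"type": "IntentRequest", "intent": {"name": "HoroscopeIntent", "slots": {"sign": {"name": "sign", "value": "Virgo"}}}}

print(classify(launch), classify(partial), classify(full))
```

Each prompt ends with a question, so the user always knows what to say next.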

Now to you

Are there any basic terms we left out? Please add them in the comments.

How did we end up discussing Siri vs Google? It seems like within months we went from trying to understand how Siri works to asking Alexa to turn on the lights when we get home.

Alexa, Siri, Google Now and Cortana all walked elegantly into our lives, changing our daily habits. And there is more yet to come.

Similar yet very different, each of these virtual assistants (VAs) has its own perks and drawbacks.

They all do sorta different things, but kinda the same way and they all have a similar purpose but perform with a slightly different approach… Confused yet?

It’s time to make some sense of this mess and put things in order.

We decided to evaluate and compare the 4 popular assistants across different needs and categories so you can decide which of the four leading personal assistants of the 2016 AI world performs best: Siri vs Google Now vs Cortana vs Alexa.

Game On!

Siri VS Google Now VS Cortana VS Alexa

Speech Recognition

All voice-recognition systems are designed to recognize your speech patterns and personal accent as you use them on a daily basis.

Siri

Siri is great at understanding natural language. It responds to different ways of phrasing the same question. Beyond individual words, Siri identifies meaning, and its “personality” makes it the most natural assistant for everyday language.

On the other hand, you need to be very close to your phone when you talk to Siri and it has to be your voice. Siri will have trouble understanding you otherwise.

Google Now

Google Now was the most successful of the four at coping with a wide range of accents, but it still lacks Siri’s “personality” and its ability to cope with natural language.

Cortana

Besides the fact that you can also type to Cortana, its voice recognition is similar to Siri’s, as it identifies meaning and not just words. It approaches some of the same questions differently than Siri does, and even though it hasn’t been around for as long, it is a viable competitor and works surprisingly well.

Alexa

Alexa is always listening for its name. The Echo has the most impressive microphone (and speaker) technology of the four and can pick up phrases and commands spoken in a normal voice from across the room.

This can sometimes be a drawback as it may react to other voices coming from the television.

Range of Capabilities

Siri

Siri is great for on-the-go purposes: it gives you accurate directions while you’re in the car, and it is useful when you don’t want to pick up the device but need to send a message from your phone. Other capabilities include searching the web, making calls, opening apps, setting up calendar events and alarms, looking up directions in Apple Maps and so on.

If you ask Siri for the news, it will give you a bunch of links but will not read the news out loud. Same goes if you want to buy something – Siri will give you the link where you can make your purchase but will not buy it for you.

In iOS 9, Apple improved how Siri uses your personal data to offer a more personalized service, but there is still room for improvement in this area.

Google Now

Google Now is the star when it comes to handling your personal life. It collects personal data that makes your daily commute, your interests and your schedule easy to manage. This information is used for the “predictive cards” feature, which shows content matching your interests before you even ask. It is the best system at anticipating your needs and proactively assisting you.

Google Now will tell you the weather, give you directions to common places you visit, remind you about scheduled events, give you updates depending on your interests, play music and so on… You can also ask for information from your Google accounts.

Cortana

Cortana is great with location-based reminders like “Remind me to call Dad when I get home.” It is also quick when it comes to searching the internet for information – however, when it comes to simple questions like “What day is it?”, it might get confused.

You can also make calls, send text messages, search the web, schedule calendar events and ask for the weather, and Cortana can give you information about places you frequently visit.

Like Google Now, Cortana keeps your information and the things you are interested in in its “notebook” feature in order to anticipate and provide you with information you might find useful.

Cortana is a mixture of Siri and Google Now: you can either ask it for something you need or train it to anticipate information about your interests and life. You can also set hours when you don’t want to be disturbed.

Alexa

Alexa’s strength is creating your Smart Home. It is excellent at handling home devices and at buying things online, such as adding items to your Amazon Shopping Cart or ordering your weekly supermarket shopping list when you run out of food. It is also useful for setting reminders, playing music, adjusting the thermostat or turning the lights on and off. Also, if you ask Alexa for the news, it will read its “flash” news briefings out loud.

To send a personal message, you must have a specific calling app; otherwise, Alexa is unable to do so. Since it is designed mainly for home devices, Alexa can give you travel estimates, but you need to enter your current location in the mobile app.

Working With Third-Party Apps

Siri

Siri cannot interact with most third-party apps. It plays Apple Music but it can’t ask for an Uber or play Spotify. It is limited to working with Apple’s own apps, although it does so effectively.

Apple integrated seven types of apps (Ride booking, Messaging, Photo and video, Payment apps, VoIP calling, Workouts and CarPlay) in iOS 10, and says more will be coming in the future.

Google Now

Google Now also works with a variety of third-party apps: you can add to-do lists on Trello or play music on YouTube or Pandora. Depending on the type of phone you have, you may have to activate Google Now from within the main Google search app.

The iOS version of the Google app is limited as it is primarily for web searching rather than being an all-round digital assistant.

Cortana

Cortana is great when working with other apps on Windows devices and efficiently creates emails, reminders or notes.

Currently, only a few third-party apps integrate successfully with Cortana, but as time passes, we should see more development in this area.

Alexa

Alexa is great with third-party apps, with over 130 available. You can ask for a car from Uber, play music on Spotify and even buy things online.

As mentioned in TechCrunch’s article, “After all, the Alexa app doesn’t even refer to its list of third-party software as “apps”, rather referring to these add-ons as “skills” instead – meaning, things that expand the functionality of the Alexa-powered Echo speaker.”

Siri VS Google Now VS Cortana VS Alexa – Who’s the winner?

Switching On When Needed

Siri

To activate Siri, you can press and hold the Home button of your iOS device, or leave the always-on option activated so that the system treats “Hey, Siri” as the start of a new command. When your phone battery drops below 20%, Siri turns off automatically.

There is an option that allows you to train Siri to respond only to your voice. This way, the system will not activate itself with other voices like the television.

Google Now

To activate Google Now, you tap a microphone on the screen. As with the other systems, you can also activate an always-on option, which tells your digital assistant to listen for the phrase “OK Google” as the start of a new command.

“OK Google” can also be trained to react to a certain trusted voice.

Cortana

To activate Cortana, you tap a microphone icon on the screen. Like Siri, when your phone battery drops down below 20%, it is turned off automatically.

In addition to a personal voice training option, Cortana also includes an option that allows it to respond “to anyone” in case you are sharing the device with others.

Alexa

Amazon Echo/Echo Dot will listen for “Alexa” as long as it’s turned on. There is also an option to turn its microphone off.

Alexa sometimes reacts to voices it shouldn’t (like the television) because of the advanced Echo microphone technology.

Smart Home vs On-The-Go

Siri

Siri is great for on-the-go as it is available on multiple portable devices. Apple is still working on making the use of Siri at home more effective.

Google Now

Google Now is convenient for on-the-go use. Like Siri, it offers great hands-free access to your mobile device. However, Google Assistant, which is built into Google Home, is expected to merge with Google Now in the near future.

Cortana

As a combination of both Google Now and Siri, Cortana is great for on-the-go purposes. Microsoft also aims to reach smart homes, as Windows 10 will work with a wider range of devices and appliances in 2017.

Alexa

Alexa is not something you purchase for “on-the-go” services. The software is at its most useful when creating your Smart Home.

Device Itself

Siri

Siri is available on Apple devices, which makes it accessible everywhere: on your tablet, on your phone and even on your wrist. Apple is making an effort to make it work better as a smart home device.

Google Now

Google Now works on Android, iOS, and PCs, though its hardware-controlling capabilities are limited on iOS.

Cortana

Cortana is available on the Windows 10 desktop OS as well as Windows 10 Mobile, which Microsoft expects to be on 1 billion devices in two to three years. It is also expected to become available on Android and iOS in the future.

Alexa

Alexa is available on Amazon Echo and Echo Dot, which is great if you want to set it up for your home and home devices and later expand it to other rooms with the Echo Dot.

In conclusion…

So in the Siri VS Google Now VS Cortana VS Alexa wars, who wins?

We’ve compared all four digital assistants: Siri, which has been around for a while; Cortana, which is new to the game yet a viable competitor; Google Now, currently in a significant development stage; and Alexa, a prominent technology for smart homes.

At the moment, even though they rely on similar technology, Alexa is playing a different game. As a top performer for the smart home, it works great with third-party applications, and Amazon Echo’s speaker/microphone technology is exceptional.

Siri, Google Now and Cortana, on the other hand, share a similar purpose, leaning towards “on-the-go” use.

Siri’s “personality” and ability to understand natural language make its experience the smoothest and most distinctive of the three. However, Google Now’s personalized assistance automatically keeps your daily schedule updated with amazing proactiveness. Cortana is the most flexible, as it mixes Siri’s “personality” with Google Now’s proactiveness, but it is at its earliest stage and has a lot of room for improvement.

All three of them are working on improvements for smart home purposes and third-party application services, while Alexa is working on its “on-the-go” capabilities to deliver better assistance outside of your home.
