Thanks for picking the Build a Bot workshop! Slides are here. Here's what you'll need to complete every piece of the workshop:
Google Chrome
It's currently the only desktop browser that supports speech recognition (see the kinda-sorta-spec). Really.
If you'd rather use Chromium, you can try these instructions. Chrome uses Google Cloud Speech for its recognition rather than handling things locally, hence the API key requirement; that also means you probably can't make this work in Opera, Brave, etc.
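For a sense of what the browser side looks like, here's a minimal, browser-only sketch using Chrome's prefixed Web Speech API (run it from an HTTPS page for the smoothest microphone-permission experience; the `lang` value is just an example):

```javascript
// Browser-only sketch: webkitSpeechRecognition is Chrome's prefixed
// implementation of the Web Speech API's SpeechRecognition interface.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();
recognition.lang = 'en-US';
recognition.interimResults = false; // only fire when a final result is ready

recognition.onresult = (event) => {
  // results[0][0] is the top alternative for the first recognized phrase
  console.log('Heard:', event.results[0][0].transcript);
};
recognition.onerror = (event) => console.error('Recognition error:', event.error);

recognition.start(); // prompts for mic permission on first use
```

Since this depends on the browser environment, it won't run under Node; paste it into the console on an HTTPS page in Chrome to try it out.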
An api.ai account
We'll use this to allow our bot to keep context, and to allow us to communicate with the bot more conversationally, rather than via a limited set of commands.
When we start getting into API integrations, you'll want to have a look at the /query response docs on the client side and the fulfillment request docs on the webhook side.
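To make the webhook side concrete, here's a sketch of building the response body your webhook returns, using the field names from api.ai's v1-era fulfillment docs (the `source` label and the sample text are placeholders, not values from the workshop repo):

```javascript
// Sketch of an api.ai (v1-era) fulfillment response body.
// speech/displayText/contextOut/source follow the v1 webhook docs.
function buildFulfillmentResponse(speech, contextOut = []) {
  return {
    speech,              // what the bot says back
    displayText: speech, // what text-based surfaces display
    contextOut,          // contexts to carry into the next turn
    source: 'my-webhook', // placeholder source label
  };
}

const body = buildFulfillmentResponse('The weather in Austin is sunny.');
console.log(JSON.stringify(body));
```

The workshop code itself is PHP, but the JSON shape is the same regardless of what builds it.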
Docker
The workshop code we'll be using is containerized to avoid spending too much time getting y'all set up with a consistent PHP environment. For the curious, we'll be building a runit-managed nginx + fpm combo container straight from an Alpine Linux base. The container is quite small, and the build steps don't take long or pull much data, but if you're paranoid, build the container (will link here when it's available) before arriving at the workshop. The changes we'll make during the workshop will only affect the last few layers, if anything at all, so rebuilds won't take long.
If you'd rather not install Docker, you can use php -S to get the same effect, provided you have PHP 7.1 plus the proper extensions (if you have the curl extension and can run Guzzle, you'll be fine). To see the exact requirements, look at the Dockerfile in the workshop repo.
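As a sketch of both paths (the image name, port mapping, and docroot below are assumptions; check the workshop repo for the real values):

```shell
# With Docker: build the image from the repo's Dockerfile, then run it.
# "build-a-bot" and the 8080:80 mapping are placeholders.
docker build -t build-a-bot .
docker run --rm -p 8080:80 build-a-bot

# Without Docker: PHP's built-in server (PHP 7.1+ with the curl extension).
# The -t docroot is an assumption; point it at wherever index.php lives.
php -S localhost:8080 -t public/
```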
An ngrok account
Our application needs to expose a webhook to the outside world, and ngrok lets us do that, with HTTPS support included. The free tier is fine for what we'll be doing. HTTPS is doubly useful because running speech recognition from an HTTPS page goes a lot more smoothly than from plain old HTTP.
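Once your local server is up, a single command exposes it (8080 here is an assumption; use whatever port your app is actually listening on):

```shell
# Tunnel local port 8080 to a public URL; ngrok prints both the
# http:// and https:// addresses it assigned.
ngrok http 8080
```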
We don't have time to cover everything in this tutorial, but if you want to swap some of the components I've mentioned for others, here are some good places to start:
First, you'll need to get the audio. In-browser, see https://www.html5rocks.com/en/tutorials/getusermedia/intro/. From there, you can pass it to Google or IBM Watson to get recognized speech back. Both Google's and IBM's APIs let you stream audio in and receive text back as it's recognized, or you can push an audio file all at once.
Amazon Lex is another service with responsibilities similar to api.ai's, as is Facebook's wit.ai. Lex also builds in speech recognition, though that costs more than just using text-based NLU. Local options for this component are available, but I haven't used them.
If you're targeting a non-browser, or otherwise don't want to synthesize speech locally, Amazon Polly is a solid option with a generous free tier for a year (AWS account required).
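For a quick taste of Polly, the AWS CLI can synthesize a clip in one call (this assumes a configured CLI with credentials; the voice and text are just examples):

```shell
# Synthesize "Hello from the bot" to an MP3 using the Joanna voice.
aws polly synthesize-speech \
  --output-format mp3 \
  --voice-id Joanna \
  --text "Hello from the bot" \
  hello.mp3
```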
You'll build the NLU piece of your skill inside Alexa's own system, then send queries either to Lambda or to a webhook of your own. This tutorial will guide you through building a skill (you'll need Amazon and AWS accounts). You can use the Reverb app (Android / iOS) to test your new skill "in the wild" if you don't have a device with Alexa built in (oddly enough, the official Alexa app currently doesn't include the voice assistant itself).
I had to pick what to cram into this (relatively short) workshop, but api.ai has a Messenger integration, so you can reuse the bot you've created here over there.
Each provider has their own set of APIs, and their own UI restrictions. For Google in particular, you're looking for Voice Actions to tie into their equivalent of Alexa skills.
Sure! I'm not tagging a release until after the workshop, so I don't end up tagging a buggy one, but after that, definitely. If I forget, tweet at me (@iansltx).