@ompugao
Forked from alotaiba/google_speech2text.md
Created June 24, 2013 10:39

Google Speech To Text API

Base URL: https://www.google.com/speech-api/v1/recognize
It accepts POST requests whose body is a FLAC-encoded voice recording. Recognition is controlled through query parameters.

Query Parameters

client
The name of the client you are connecting from. To spoof a browser client, these examples use chromium.

lang
The speech language, for example ar-QA for Qatari Arabic or en-US for U.S. English.

maxresults
The maximum number of recognition hypotheses to return for the utterance.
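A minimal Python 3 sketch of assembling the request URL from the base URL and the three query parameters above. It uses only the standard `urllib.parse.urlencode`; the helper name `build_recognize_url` and its defaults (`client="chromium"`, `maxresults=10`, taken from the wget and curl examples) are illustrative assumptions, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/speech-api/v1/recognize"


def build_recognize_url(lang, maxresults=10, client="chromium"):
    """Return BASE_URL with client, lang and maxresults as query parameters.

    build_recognize_url is a hypothetical helper; only the parameter
    names come from the list above.
    """
    # dicts keep insertion order, so the parameters appear as written here
    query = urlencode({"client": client, "lang": lang, "maxresults": maxresults})
    return BASE_URL + "?" + query
```

For example, `build_recognize_url("ar-QA")` returns the same URL that the wget and curl examples post to.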

POST

body
Should contain the raw binary FLAC audio data.

HTTP Header

Content-Type
Should be audio/x-flac; rate=16000;, that is, the MIME type followed by the sample rate of the FLAC file.

User-Agent
Can be the client's user agent string. To spoof a browser, these examples use Chrome's.
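Since the rate= value in Content-Type should match the file's real sample rate, it can be read from the FLAC header instead of hard-coded. A minimal sketch, assuming a spec-compliant FLAC file that begins with the `fLaC` marker immediately followed by its mandatory STREAMINFO block (files with a leading ID3 tag are not handled). `flac_stream_info` and `request_headers` are hypothetical helper names; the User-Agent default is the Chrome string from the wget and curl examples.

```python
CHROME_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 "
             "(KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7")


def flac_stream_info(data):
    """Return (sample_rate, channels, bits_per_sample) from FLAC bytes.

    Assumption: no ID3 tag or other data precedes the b"fLaC" marker,
    and the first metadata block is STREAMINFO (type 0, 34 bytes).
    """
    if len(data) < 4 + 4 + 34:
        raise ValueError("too short to hold a STREAMINFO block")
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")
    block_type = data[4] & 0x7F                 # top bit is the last-block flag
    block_len = int.from_bytes(data[5:8], "big")
    if block_type != 0 or block_len != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    info = data[8:8 + 34]
    # After four block/frame-size fields (10 bytes) come: sample rate (20 bits),
    # channels - 1 (3 bits), bits per sample - 1 (5 bits), total samples (36 bits).
    sample_rate = int.from_bytes(info[10:13], "big") >> 4
    channels = ((info[12] >> 1) & 0x07) + 1
    bits_per_sample = (((info[12] & 0x01) << 4) | (info[13] >> 4)) + 1
    return sample_rate, channels, bits_per_sample


def request_headers(data, user_agent=CHROME_UA):
    """Build the Content-Type and User-Agent headers for a FLAC payload."""
    sample_rate, _, _ = flac_stream_info(data)
    return {
        "Content-Type": "audio/x-flac; rate=%d;" % sample_rate,
        "User-Agent": user_agent,
    }
```

For a file produced at 16 kHz, `request_headers(open("alsalam-alikum.flac", "rb").read())` should give the Content-Type value shown above.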

Examples

These examples assume you have a FLAC-encoded voice file called alsalam-alikum.flac.

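Create a FLAC file from WAV

Install sox, then convert the recording to a 16 kHz mono FLAC file so that it matches rate=16000 in the Content-Type header:

sudo aptitude install sox
sox input.wav output.flac rate 16k channels 1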
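Before converting, it may help to check whether a WAV file is already 16 kHz mono. A small sketch using Python's standard `wave` module; `needs_conversion` and `sox_command` are hypothetical helpers, and `sox_command` only builds the argument list for the sox invocation above without running it.

```python
import wave


def needs_conversion(wav_file):
    """Return True unless the WAV file is already 16 kHz mono.

    wav_file may be a path or a binary file object.
    """
    with wave.open(wav_file, "rb") as w:
        return not (w.getframerate() == 16000 and w.getnchannels() == 1)


def sox_command(src, dst, rate="16k", channels=1):
    """Argument list for: sox SRC DST rate 16k channels 1 (not executed here)."""
    return ["sox", src, dst, "rate", rate, "channels", str(channels)]
```

To actually convert, pass the list to the standard library, e.g. `subprocess.run(sox_command("input.wav", "output.flac"), check=True)`, which requires sox to be installed.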

wget

This saves the JSON response to a file called recognized.json.

wget --post-file='alsalam-alikum.flac' \
--user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header='Content-Type: audio/x-flac; rate=16000;' \
-O 'recognized.json' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=ar-QA&maxresults=10'
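The saved response can then be read back. A minimal sketch, assuming the JSON shape of the sample response quoted in the python example (an object with a `hypotheses` list whose entries carry an `utterance` and possibly a `confidence`); `best_utterance` is a hypothetical helper name.

```python
import json


def best_utterance(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Assumes the response shape of the sample output: a JSON object with a
    "hypotheses" list. confidence is None if an entry lacks that field.
    """
    result = json.loads(response_text)
    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None                      # nothing was recognized
    first = hypotheses[0]
    return first.get("utterance"), first.get("confidence")
```

Usage: `best_utterance(open("recognized.json", encoding="utf-8").read())`.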

curl

curl -X POST \
--data-binary @alsalam-alikum.flac \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header 'Content-Type: audio/x-flac; rate=16000;' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=ar-QA&maxresults=10'

python

  • Quoted from https://gist.github.com/alotaiba/1730160/#comment-841611:

      $ cat speech.py
      import urllib2
      url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"
      audio = open('rainspain.flac','rb').read()
      headers={'Content-Type': 'audio/x-flac; rate=16000', 'User-Agent':'Mozilla/5.0'}
      request = urllib2.Request(url, data=audio, headers=headers)
      response = urllib2.urlopen(request)
      print response.read()
    
      $ python speech.py
      {"status":0,"id":"57d2d1a7e7f1fa12d200026dde946c34-1","hypotheses":[{"utterance":"the rain in Spain falls mainly on the plains","confidence":0.8385102}]}
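The quoted script is Python 2 (`urllib2`, `print` statement). A rough Python 3 equivalent using the standard `urllib.request` module keeps the quoted URL, file name and headers, and separates building the request from sending it so the request can be inspected offline. The names `make_request` and `recognize` are hypothetical, and decoding the body as UTF-8 is an assumption.

```python
import urllib.request

URL = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"


def make_request(audio, url=URL, rate=16000):
    """Build (but do not send) a POST request carrying FLAC bytes."""
    headers = {
        "Content-Type": "audio/x-flac; rate=%d" % rate,
        "User-Agent": "Mozilla/5.0",
    }
    # Supplying data= makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=audio, headers=headers)


def recognize(path, url=URL):
    """Send the FLAC file at path and return the response body as text."""
    with open(path, "rb") as f:
        audio = f.read()
    with urllib.request.urlopen(make_request(audio, url)) as response:
        # Assumption: the JSON response body is UTF-8 encoded.
        return response.read().decode("utf-8")
```

Usage, mirroring the transcript: `print(recognize("rainspain.flac"))`.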
    