Freepbx Voicemail Transcription Script: Google Speech API
# sendmail-gcloud
# Original source created by N. Bernaerts:
# modified per:
# modified per:
# current version:
# Notes: This is a script modified from the original to work with a FreePBX Distro PBX so that email notifications sent from
# Asterisk voicemail contain a speech-to-text transcription provided by the Google Cloud Speech API
# License: There are no explicit license terms on the original script or on the blog post with modifications
# I'm assuming GNU/GPL2+ unless notified otherwise by copyright holder(s)
# Usage: copy this file to /usr/sbin/, set ownership to asterisk:asterisk, and make it executable.
# In the [general] section of /etc/asterisk/voicemail.conf set mailcmd=/usr/sbin/sendmail-gcloud
# or in FreePBX 14 GUI go to Settings->Voicemail->Email Config->Mail Command to set this.
# This script also uses dos2unix; ensure it is executable by the asterisk user (chmod 777)
# Requirements:
# Google Cloud services account (paid--free trial available) with Google Speech to Text API enabled
# gcloud cli utility
# gcloud utility must be authenticated. Before doing this, 'su asterisk' so authentication happens in the correct user account
# authenticate with: gcloud auth login
# select project: gcloud config set project <Project ID>
# jq (yum install jq)
# sox with flac
# Version History:
# 2019-02-27 Initial commit by tony722
# set PATH
# save the current directory
pushd .
# create a temporary directory and cd to it
TMPDIR=$(mktemp -d)
cd "$TMPDIR"
# dump the stream to a temporary file
cat >>
# get the boundary
BOUNDARY=$(grep "boundary=" | cut -d'"' -f 2)
# if mail has no boundaries, assume no attachment
if [ "$BOUNDARY" = "" ]
# send the original stream
# cut the original stream into parts
# stream.part - header before the boundary
# stream.part1 - header after the boundary
# stream.part2 - body of the message
# stream.part3 - attachment in base64 (WAV file)
# stream.part4 - footer of the message
awk '/'$BOUNDARY'/{i++}{print > "stream.part"i}'
# cut the attachment into parts
# stream.part3.wav.head - header of attachment
# stream.part3.wav.base64 - wav file of attachment (encoded base64)
sed '7,$d' stream.part3 > stream.part3.wav.head
sed '1,6d' stream.part3 > stream.part3.wav.base64
# convert the base64 file to a wav file
dos2unix -o stream.part3.wav.base64
base64 -di stream.part3.wav.base64 > stream.part3.wav
# convert the wav file to FLAC
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac
# convert to MP3
sox stream.part3.wav stream.part3-pcm.wav
lame -m m -b 24 stream.part3-pcm.wav stream.part3.mp3
base64 stream.part3.mp3 > stream.part3.mp3.base64
#create mp3 mail part
sed 's/x-[wW][aA][vV]/mpeg/g' stream.part3.wav.head | sed 's/.[wW][aA][vV]/.mp3/g' >
dos2unix -o
unix2dos -o stream.part3.mp3.base64
cat stream.part3.mp3.base64 >>
RESULT=$(gcloud ml speech recognize stream.part3.flac --language-code='en-US')
FILTERED=$(echo "$RESULT" | jq -r '.results[].alternatives[].transcript')
# generate first part of mail body, converting it to LF only
mv stream.part
cat stream.part1 >>
sed '$d' < stream.part2 >>
# beginning of transcription section
echo "" >>
echo "--- Google transcription result ---" >>
# append result of transcription
if [ -z "$FILTERED" ]
then
echo "(Google was unable to recognize any speech in audio data.)" >>
else
echo "$FILTERED" >>
fi
# end of message body
tail -1 stream.part2 >>
# add converted attachment
cat >>
# append end of mail body, converting it to LF only
echo "" >> stream.tmp
echo "" >> stream.tmp
cat stream.part4 >> stream.tmp
dos2unix -o stream.tmp
cat stream.tmp >>
# send the mail thru sendmail
cat | sendmail -t
# go back to original directory
popd
# remove all temporary files and temporary directory
rm -Rf "$TMPDIR"
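
The five-way split described in the script's comments (stream.part through stream.part4) can be tried out in isolation. The following is a minimal sketch, not the script itself: the sample message, the boundary string demo_boundary_123, and the file name sample.eml are all made up for illustration. It also matches the boundary literally with awk's index() instead of the regex match the script uses, so a boundary containing regex metacharacters would not confuse it, and it parenthesizes the output file name, which some awk implementations require.

```shell
#!/bin/sh
# Sketch only: the sample message and boundary string below are made up.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# A tiny stand-in for the mail Asterisk pipes to the mail command.
printf '%s\n' \
  'From: pbx@example.invalid' \
  'Subject: New voicemail' \
  'Content-Type: multipart/mixed; boundary="demo_boundary_123"' \
  '' \
  '--demo_boundary_123' \
  'Content-Type: text/plain' \
  '' \
  'You have a new message.' \
  '--demo_boundary_123' \
  'Content-Type: audio/x-wav; name="msg0001.wav"' \
  '' \
  'UklGRg==' \
  '--demo_boundary_123--' > sample.eml

# Same extraction as the script: take the second double-quote-delimited field.
BOUNDARY=$(grep "boundary=" sample.eml | cut -d'"' -f 2)

# Start a new output file each time a line contains the boundary string.
# The counter i is empty before the first match, so the leading headers
# land in "stream.part" and later sections in stream.part1 .. stream.part4.
awk -v b="$BOUNDARY" 'index($0, b) { i++ } { print > ("stream.part" i) }' sample.eml

# Show how many lines ended up in each part.
wc -l stream.part*
```

Note that the Content-Type header line itself contains the boundary string, which is why the headers are divided between stream.part and stream.part1 before the first body part begins.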

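The attachment step relies on stream.part3 having exactly six lines before the base64 payload (the boundary line, the MIME headers, and a blank line), which is what the `sed '7,$d'` and `sed '1,6d'` commands assume. Below is a minimal sketch of that step under that assumption. The header lines and the stand-in payload are hypothetical, and `tr -d '\r'` is used in place of dos2unix so the example runs without extra packages.

```shell
#!/bin/sh
# Sketch only: header lines and payload are hypothetical stand-ins.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Stand-in for the WAV bytes, encoded the way a mail attachment would be.
printf 'demo audio bytes' > original.bin
PAYLOAD=$(base64 < original.bin)

# Build an attachment part with CRLF line endings: six header lines
# (boundary line, four MIME headers, blank line), then the payload.
printf '%s\r\n' \
  '--demo_boundary_123' \
  'Content-Type: audio/x-wav; name="msg0001.wav"' \
  'Content-Transfer-Encoding: base64' \
  'Content-Description: Voicemail sound attachment.' \
  'Content-Disposition: attachment; filename="msg0001.wav"' \
  '' \
  "$PAYLOAD" > stream.part3

# Same split as the script: lines 1-6 are the header, line 7 onward the data.
sed '7,$d' stream.part3 > stream.part3.wav.head
sed '1,6d' stream.part3 > stream.part3.wav.base64

# Strip carriage returns (the script uses dos2unix for this), then decode.
tr -d '\r' < stream.part3.wav.base64 > payload.lf
base64 -d < payload.lf > stream.part3.wav

# Confirm the decoded bytes match what was encoded.
cmp -s original.bin stream.part3.wav && echo "payload round-trips"
```

If a mail's attachment part carries a different number of header lines, the fixed line numbers in the two sed commands would cut in the wrong place, so that is worth checking against a real captured message.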
mantisae commented Nov 9, 2019

I have been attempting to make this work for the last 4 hours and am getting nowhere. I know I have Google Cloud set up correctly because I can change TMPDIR to an actual location, have the system write the files there, then copy the "gcloud ml speech recognize stream.part3.flac --language-code='en-US'" command, run it in that directory as the asterisk user, and get the JSON. However, it is not running as part of the script.


Owner Author

tony722 commented Nov 10, 2019

This is the code that I'm using on a production FreePBX server (just diffed it to be sure). Keep trying: check logs, messages, permissions, and whatever else. Hopefully you can get it. 😄 Good luck!



mantisae commented Nov 10, 2019

I forgot to post back. I decided to try something stupid this morning and ran "reboot -h now". I left a voicemail after the system restarted and it transcribed properly.
