Freepbx Voicemail Transcription Script: Google Speech API
#!/bin/bash
# sendmail-gcloud
#
# Installation instructions
# Copy the content of this file to /usr/sbin/sendmail-gcloud
#
# Google Account
# ---------------
# Create a Google Cloud account if you don't have one yet. Free trial is available at https://console.cloud.google.com/freetrial
# Within console.cloud.google.com search for Cloud Speech-to-Text API and enable it
#
# From the Linux command line on the FreePBX machine
# -------------------------------------------
# Follow steps 1 and 2 of the instructions on Google Cloud https://cloud.google.com/sdk/docs/downloads-yum
# Run the following commands on FreePBX:
# cd /usr/sbin/
# chown asterisk:asterisk sendmail-gcloud
# chmod 744 sendmail-gcloud
# chmod 755 /usr/bin/dos2unix
#
# Verify that you have the following (by simply running each command) and, if not, install it with yum:
# jq
# sox
# flac
# lame
# dos2unix -V
# Ensure dos2unix is executable by the asterisk user (chmod 755 /usr/bin/dos2unix)
#
# Connect FreePBX to Google Cloud
# su asterisk
# gcloud auth login
# CLI will provide you a url. Copy that and paste it into your browser. Google will give you a verification code to copy. Paste it into the cli waiting for a verification code.
#
# Open FreePBX web interface
# Go to Settings > Voicemail Admin > Settings > Email Config
# Change Mail Command to: /usr/sbin/sendmail-gcloud
# Submit and apply changes
#
# Original source created by N. Bernaerts: https://github.com/NicolasBernaerts/debian-scripts/tree/master/asterisk
# modified per: https://jrklein.com/2015/08/17/asterisk-voicemail-transcription-via-ibm-bluemix-speech-to-text-api/
# modified per: https://gist.github.com/lgaetz/2cd9c54fb1714e0d509f5f8215b3f5e6
# current version: https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea
#
# Notes: This is a script modified from the original to work with FreePBX so that email notifications sent from
# Asterisk voicemail contain a speech to text transcription provided by Google Cloud Speech API
#
# License: There are no explicit license terms on the original script or on the blog post with modifications
# I'm assuming GNU/GPL2+ unless notified otherwise by copyright holder(s)
#
# Version History:
# 2021-05-06 Add fix by dcat127: trim flac file to 59 seconds
# 2020-08-27 Add fix by chrisduncansn
# Minor edit in instruction wording
# 2020-05-27 Add instructions from sr10952
# Add export fix by levishores
# 2019-02-27 Initial commit by tony722
# set PATH
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# save the current directory
pushd .
# create a temporary directory and cd to it
TMPDIR=$(mktemp -d)
cd "$TMPDIR"
# dump the stream to a temporary file
cat >> stream.org
# get the boundary
BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)
# if mail has no boundaries, assume no attachment
if [ "$BOUNDARY" = "" ]
then
# send the original stream
mv stream.org stream.new
else
# cut the original stream into parts
# stream.part - header before the boundary
# stream.part1 - header after the boundary
# stream.part2 - body of the message
# stream.part3 - attachment in base64 (WAV file)
# stream.part4 - footer of the message
awk '/'$BOUNDARY'/{i++}{print > "stream.part"i}' stream.org
# cut the attachment into parts
# stream.part3.wav.head - header of attachment
# stream.part3.wav.base64 - wav file of attachment (encoded base64)
sed '7,$d' stream.part3 > stream.part3.wav.head
sed '1,6d' stream.part3 > stream.part3.wav.base64
# convert the base64 file to a wav file
dos2unix -o stream.part3.wav.base64
base64 -di stream.part3.wav.base64 > stream.part3.wav
# convert the wav file to FLAC
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac trim 0 59
# convert to MP3
sox stream.part3.wav stream.part3-pcm.wav
lame -m m -b 24 stream.part3-pcm.wav stream.part3.mp3
base64 stream.part3.mp3 > stream.part3.mp3.base64
# create mp3 mail part
sed 's/x-[wW][aA][vV]/mpeg/g' stream.part3.wav.head | sed 's/.[wW][aA][vV]/.mp3/g' > stream.part3.new
dos2unix -o stream.part3.new
unix2dos -o stream.part3.mp3.base64
cat stream.part3.mp3.base64 >> stream.part3.new
# save voicemail in tmp folder in case of trouble
# TMPMP3=$(mktemp -u /tmp/msg_XXXXXXXX.mp3)
# cp "stream.part3.mp3" "$TMPMP3"
export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud
RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`
FILTERED=`echo "$RESULT" | jq -r '.results[].alternatives[].transcript'`
# generate first part of mail body, converting it to LF only
mv stream.part stream.new
cat stream.part1 >> stream.new
sed '$d' < stream.part2 >> stream.new
# beginning of transcription section
echo "" >> stream.new
echo "--- Google transcription result ---" >> stream.new
# append result of transcription
if [ -z "$FILTERED" ]
then
echo "(Google was unable to recognize any speech in audio data.)" >> stream.new
else
echo "$FILTERED" >> stream.new
fi
# end of message body
tail -1 stream.part2 >> stream.new
# add converted attachment
cat stream.part3.new >> stream.new
# append end of mail body, converting it to LF only
echo "" >> stream.tmp
echo "" >> stream.tmp
cat stream.part4 >> stream.tmp
dos2unix -o stream.tmp
cat stream.tmp >> stream.new
fi
# send the mail through sendmail
cat stream.new | sendmail -t
# go back to original directory
popd
# remove all temporary files and temporary directory
rm -Rf "$TMPDIR"

Acpek23 commented Sep 23, 2020

Now I'm getting this error: ERROR: (gcloud.ml.speech.recognize) Invalid audio source [stream.part3.flac]. The source must either be a local path or a Google Cloud Storage URL (such as gs://bucket/object).

Any suggestions?

Acpek23 commented Sep 23, 2020

RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`

You're right, and I'm getting this:
[asterisk@pbx sbin]$ RESULT= gcloud ml speech recognize stream.part3.flac --language-code='en-US'
ERROR: (gcloud.ml.speech.recognize) Invalid audio source [stream.part3.flac]. The source must either be a local path or a Google Cloud Storage URL (such as gs://bucket/object).
[asterisk@pbx sbin]$

chrisduncansn commented Sep 23, 2020

Comment out this line at the bottom of the script to keep the temp directory for analysis after the script runs:
rm -Rf $TMPDIR

Run the script again, then cd into the temp directory and re-run:

RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`
echo $RESULT
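
The same idea can be made a switch instead of a hand edit. This is only a minimal sketch: `KEEP_TMP` and `cleanup_tmpdir` are hypothetical names that do not exist in the original script.

```shell
# Hypothetical helper (not part of the original script): remove the working
# directory unless KEEP_TMP=1 is set.
cleanup_tmpdir() {
    dir="$1"
    if [ "${KEEP_TMP:-0}" = "1" ]; then
        # Keep the files and report where they are. The message goes to
        # stderr so it cannot end up in the mail stream.
        echo "sendmail-gcloud: keeping $dir for inspection" >&2
    else
        rm -rf "$dir"
    fi
}
```

To try it, the final `rm -Rf $TMPDIR` line would become `cleanup_tmpdir "$TMPDIR"`, with `KEEP_TMP=1` set near the top of the script while debugging.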

Acpek23 commented Sep 23, 2020

Same result:
[asterisk@pbx tmp]$ cd tmp.t6QQWwfhbN/
[asterisk@pbx tmp.t6QQWwfhbN]$ ls
stream.new
[asterisk@pbx tmp.t6QQWwfhbN]$ RESULT= gcloud ml speech recognize stream.part3.flac --language-code='en-US'
ERROR: (gcloud.ml.speech.recognize) Invalid audio source [stream.part3.flac]. The source must either be a local path or a Google Cloud Storage URL (such as gs://bucket/object).
[asterisk@pbx tmp.t6QQWwfhbN]$ echo $RESULT

[asterisk@pbx tmp.t6QQWwfhbN]$

In stream.new I can see the "normal message"; this is the one that I am currently sending.
Or do I need to disable this in the PBX configuration?
I have this config: Mail Command: /usr/sbin/sendmail-gcloud

Alejandro Cardenas,

There is a new voicemail in mailbox ext:

From:	"Gustavo Martinez" <ext>
Message length:	0:19 seconds
Date:	Wednesday, September 23, 2020 at 05:24:35 PM

Dial *98 to access your voicemail by phone.
Visit url to check your voicemail with a web browser.

skippy1976 commented Oct 14, 2020

You may wish to consider using the phone_call model. This will improve the transcription.

pete1019 commented Oct 14, 2020

> You may wish to consider using the phone_call model. This will improve the transcription.

Could you please be more specific? For example, what should be changed? Thanks

kevinrossen commented Oct 28, 2020

> You may wish to consider using the phone_call model. This will improve the transcription.

> Could you please be more specific? Example what to change? Thanks

Looks like skippy1976 is referring to the speech model options available in Speech-to-Text. But I don't see an option to set a model using gcloud from the terminal.
Here's model documentation
Here's the gcloud documentation
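
One possible way to select a model without depending on a gcloud flag is to call the Speech-to-Text REST API directly, since the v1 request config accepts a `model` field. The sketch below is hedged and untested against the live API. `build_request` is a hypothetical helper, the 8000 Hz FLAC settings are taken from the sox line in the script, and depending on the credentials in use the request may also need a project header.

```shell
# Hypothetical helper: print a JSON request body for the v1 "speech:recognize"
# REST method that asks for the phone_call model.
build_request() {
    flac_file="$1"
    # Base64-encode the audio onto a single line so it fits in a JSON string.
    audio=$(base64 < "$flac_file" | tr -d '\n')
    printf '{"config":{"encoding":"FLAC","sampleRateHertz":8000,"languageCode":"en-US","model":"phone_call"},"audio":{"content":"%s"}}\n' "$audio"
}

# Usage sketch, in place of the gcloud call in the script (assumption: the
# asterisk user is already logged in with gcloud):
# build_request stream.part3.flac > request.json
# RESULT=$(curl -s -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json; charset=utf-8" \
#     --data-binary @request.json \
#     https://speech.googleapis.com/v1/speech:recognize)
```

The response should have the same `results[].alternatives[].transcript` shape that the script's jq filter expects, but that is worth confirming before relying on it.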

kevinrossen commented Oct 28, 2020

Looks like there's a 60 second limit for the transcriptions using "gcloud ml speech recognize". But there would be no limit to the length using "gcloud ml speech recognize-long-running". I know the length of the message is stored somewhere as that ends up in the body of the email. Anyone have any ideas on how to modify this to use an "if then" option for longer voicemails?
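
A rough sketch of such an "if then" switch, assuming the duration is read from the audio file with `soxi -D` (part of sox) rather than from the email body. `pick_recognize_command` is a hypothetical helper name. Note that for audio longer than one minute the API may require a Cloud Storage `gs://` URL rather than a local file, so the long-running branch probably also needs an upload step that is not shown here.

```shell
# Hypothetical helper: choose the gcloud subcommand from a duration in seconds.
# The duration may be fractional, for example "19.430000" from `soxi -D`.
pick_recognize_command() {
    seconds="${1%.*}"    # drop the fractional part: "61.2" becomes "61"
    if [ "${seconds:-0}" -lt 60 ]; then
        echo "recognize"
    else
        echo "recognize-long-running"
    fi
}

# Usage sketch (untested against the API):
# DURATION=$(soxi -D stream.part3.wav)
# RESULT=$(gcloud ml speech "$(pick_recognize_command "$DURATION")" \
#     stream.part3.flac --language-code='en-US')
```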

kevinrossen commented Oct 28, 2020

I found a couple options that I like while digging into the documentation. The options I like are on the alpha channel, so there's a good chance they won't work long-term, but I'm okay with that on my setup. Here's what I changed:

ORIGINAL: RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`

NEW: RESULT=`gcloud alpha ml speech recognize stream.part3.flac --language-code='en-US' --interaction-type='voicemail' --include-word-time-offsets --filter-profanity --enable-automatic-punctuation`

CadillacRick commented Oct 31, 2020

Hey guys, new to this topic... Trying to get this to work on my Asterisk box... I followed all the steps as indicated. Didn't get any errors along the way, but I don't seem to get any results... The voicemail still answers, records the file... and I still get the audio file in my email.. but at the bottom I see

--Google transcription result --
(Google was unable to recognize any speech in audio data.)

Also noticed that I can't play the MP3 file attached with the email.... says it's unsupported or corrupt.

Did some more testing... when I leave a voicemail... and I go into the /tmp/tmp.xxxxxxx folder... I can run the command manually

RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`

and with echo $RESULT I get the transcription like so...

{  "results": [  {  "alternatives": [  {  "confidence": 0.7456564, "transcript": "This is yet another test of the voicemail system Richard testing Richard testing."  } ] } ] }

But still unable to get it in the email from Asterisk....

In the error file in the /tmp/tmp.xxxxxx I see this --

ERROR: (gcloud.ml.speech.recognize) Your current active account [xxxxxxxxxxx@gmail.com] does not have any valid credentials
Please run:

$ gcloud auth login

to obtain new credentials.

For service account, please activate it first:

$ gcloud auth activate-service-account ACCOUNT

Which is weird because the command runs manually....

Thanks!

Richard

CadillacRick commented Oct 31, 2020

Ok, well it turns out that this line --- export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud

was a problem for my setup... Now I get the transcription....
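
For setups where that hard-coded path is the problem, one hedged option is to override `CLOUDSDK_CONFIG` only when a known directory actually exists. `first_existing_dir` is a hypothetical helper, and the second path in the usage sketch is only a guess at where the asterisk user's home might be on some systems.

```shell
# Hypothetical helper: print the first candidate directory that exists,
# or return non-zero if none of them do.
first_existing_dir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage sketch, in place of the unconditional export in the script:
# if GCLOUD_DIR=$(first_existing_dir /home/asterisk/.config/gcloud \
#                                    /var/lib/asterisk/.config/gcloud); then
#     export CLOUDSDK_CONFIG="$GCLOUD_DIR"
# fi
```

If neither directory exists, gcloud is left to use its default configuration location for the user running the script.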

But!!!!! the audio file is still a problem. the mp3 file doesn't work... can't listen to it..

@tony722 Any ideas ??? Anyone ???

@kevinrossen have the MP3 attachments been working for you ?

dcat127 commented Jan 26, 2021

This script fails for any voicemail longer than 1 minute, with the following error:
ERROR: (gcloud.ml.speech.recognize) INVALID_ARGUMENT: Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.

I have fixed it by replacing
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac

with
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac trim 0 59

This does not "fix" the issue of voicemails that are too long, but it means only the first 59 seconds are transcribed, which in my case is good enough.

chrisduncansn commented Feb 9, 2021

> I found a couple options that I like while digging into the documentation. The options I like are on the alpha channel, so there's a good chance they won't work long-term, but I'm okay with that on my setup. Here's what I changed:
>
> ORIGINAL: RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`
>
> NEW: RESULT=`gcloud alpha ml speech recognize stream.part3.flac --language-code='en-US' --interaction-type='voicemail' --include-word-time-offsets --filter-profanity --enable-automatic-punctuation`

@kevinrossen how are those alpha options working for you? I checked and it looks like they are still in Alpha status, which is a bummer.

msc1 commented May 6, 2021

> was a problem for my setup... Now I get the transcription....

@CadillacRick did you run it as asterisk or root?

rr10 commented Jan 13, 2022

Thanks @tony722 @chrisduncansn. I preferred to modify the APIs used to support multiple languages and punctuation. Now I no longer have to worry about the caller's language (at least up to three additional languages).
I have noticed that if the sentence starts with a word in a different language from the rest of the message, there can be problems. It might be interesting to use different APIs for multiple languages in succession.

https://gist.github.com/rr10/472f88b41d7383ba6e04f982c0f8a7c2
RESULT=`gcloud alpha ml speech recognize-long-running stream.part3.flac --language-code='ro-RO' --additional-language-codes='it-IT','en-US' --enable-automatic-punctuation --interaction-type=voicemail`
