@tony722
Last active February 25, 2024 19:09
Freepbx Voicemail Transcription Script: Google Speech API
#!/bin/bash
# sendmail-gcloud
#
# Installation instructions
# Copy the content of this file to /usr/sbin/sendmail-gcloud
#
# Google Account
# ---------------
# Create a Google Cloud account if you don't have one yet. Free trial is available at https://console.cloud.google.com/freetrial
# Within console.cloud.google.com search for Cloud Speech-to-Text API and enable it
# Some users report you need to have configured a service account: See creating a service account in Google Cloud. https://cloud.google.com/iam/docs/keys-create-delete#creating
#
# From the Linux command line on the FreePBX machine
# -------------------------------------------
# Follow steps 1 and 3 of the instructions on Google Cloud https://cloud.google.com/sdk/docs/downloads-yum
# Step 1 Note: since FreePBX is CentOS 7, follow the instructions to replace el8 with el7 in the base url:
# sudo tee -a /etc/yum.repos.d/google-cloud-sdk.repo << EOM
# [google-cloud-cli]
# name=Google Cloud CLI
# baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
# enabled=1
# gpgcheck=1
# repo_gpgcheck=0
# gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
# EOM
#
# Step 3 Note: use yum instead of dnf:
# yum install google-cloud-cli
#
# Run the following commands on FreePBX;
# cd /usr/sbin/
# chown asterisk:asterisk sendmail-gcloud
# chmod 744 sendmail-gcloud
# chmod 755 /usr/bin/dos2unix
#
# Verify that you have the following (by simply running the command) and if not use yum install:
# jq
# sox
# flac
# lame
# dos2unix -V
# Ensure dos2unix is executable by the asterisk user (mode 755 is sufficient; do not use chmod 777 /usr/bin/dos2unix)
#
# Connect FreePBX to Google Cloud
# su asterisk
# gcloud auth login
# CLI will provide you a url. Copy that and paste it into your browser. Google will give you a verification code to copy.
# Paste it into the cli waiting for a verification code.
#
# Some users report that you need to run the following at this point:
# gcloud config set project "Your Project ID"
#
# Open FreePBX web interface
# Go to Settings > Voicemail Admin > Settings > Email Config
# Change Mail Command to: /usr/sbin/sendmail-gcloud
# Submit and apply changes
#
# Original source created by N. Bernaerts: https://github.com/NicolasBernaerts/debian-scripts/tree/master/asterisk
# modified per: https://jrklein.com/2015/08/17/asterisk-voicemail-transcription-via-ibm-bluemix-speech-to-text-api/
# modified per: https://gist.github.com/lgaetz/2cd9c54fb1714e0d509f5f8215b3f5e6
# current version: https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea
#
# Notes: This is a script modified from the original to work with FreePBX so that email notifications sent from
# Asterisk voicemail contain a speech to text transcription provided by Google Cloud Speech API
#
# License: There are no explicit license terms on the original script or on the blog post with modifications
# I'm assuming GNU/GPL2+ unless notified otherwise by copyright holder(s)
#
# Version History:
# 2023-12-11 Add gcloud cli parameters by grintor to enhance gcloud ml telephony transcription
# 2023-09-01 Update instructions for installing google-cloud-cli
# 2023-08-24 Add fix by EagleTalonSystems: gcloud config set project "Your Project ID"
# 2021-05-06 Add fix by dcat127: trim flac file to 59 seconds
# 2020-08-27 Add fix by chrisduncansn
# Minor edit in instruction wording
# 2020-05-27 Add instructions from sr10952
# Add export fix by levishores
# 2019-02-27 Initial commit by tony722
# set PATH
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# save the current directory
pushd .
# create a temporary directory and cd to it
TMPDIR=$(mktemp -d)
cd "$TMPDIR" || exit 1
# dump the stream to a temporary file
cat >> stream.org
# get the boundary
BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)
# if mail has no boundaries, assume no attachment
if [ "$BOUNDARY" = "" ]
then
# send the original stream
mv stream.org stream.new
else
# cut the original stream into parts
# stream.part - header before the boundary
# stream.part1 - header after the boundary
# stream.part2 - body of the message
# stream.part3 - attachment in base64 (WAV file)
# stream.part4 - footer of the message
awk '/'$BOUNDARY'/{i++}{print > "stream.part"i}' stream.org
# cut the attachment into parts
# stream.part3.wav.head - header of attachment
# stream.part3.wav.base64 - wav file of attachment (encoded base64)
sed '7,$d' stream.part3 > stream.part3.wav.head
sed '1,6d' stream.part3 > stream.part3.wav.base64
# convert the base64 file to a wav file
dos2unix -o stream.part3.wav.base64
base64 -di stream.part3.wav.base64 > stream.part3.wav
# convert the wav file to FLAC
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac trim 0 59
# convert to MP3
sox stream.part3.wav stream.part3-pcm.wav
lame -m m -b 24 stream.part3-pcm.wav stream.part3.mp3
base64 stream.part3.mp3 > stream.part3.mp3.base64
# create mp3 mail part
sed 's/x-[wW][aA][vV]/mpeg/g' stream.part3.wav.head | sed 's/.[wW][aA][vV]/.mp3/g' > stream.part3.new
dos2unix -o stream.part3.new
unix2dos -o stream.part3.mp3.base64
cat stream.part3.mp3.base64 >> stream.part3.new
# save voicemail in tmp folder in case of trouble
# TMPMP3=$(mktemp -u /tmp/msg_XXXXXXXX.mp3)
# cp "stream.part3.mp3" "$TMPMP3"
export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud
RESULT=$(gcloud ml speech recognize stream.part3.flac --language-code='en-US' --model=phone_call --filter-profanity --enable-automatic-punctuation)
FILTERED=$(echo "$RESULT" | jq -r '.results[].alternatives[].transcript')
# generate first part of mail body, converting it to LF only
mv stream.part stream.new
cat stream.part1 >> stream.new
sed '$d' < stream.part2 >> stream.new
# beginning of transcription section
echo "" >> stream.new
echo "--- Google transcription result ---" >> stream.new
# append result of transcription
if [ -z "$FILTERED" ]
then
echo "(Google was unable to recognize any speech in audio data.)" >> stream.new
else
echo "$FILTERED" >> stream.new
fi
# end of message body
tail -1 stream.part2 >> stream.new
# add converted attachment
cat stream.part3.new >> stream.new
# append end of mail body, converting it to LF only
echo "" >> stream.tmp
echo "" >> stream.tmp
cat stream.part4 >> stream.tmp
dos2unix -o stream.tmp
cat stream.tmp >> stream.new
fi
# send the mail thru sendmail
cat stream.new | sendmail -t
# go back to original directory
popd
# remove all temporary files and temporary directory
rm -Rf "$TMPDIR"
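
The boundary-splitting step of the script can be tried in isolation. What follows is a minimal sketch, not the script's exact code: it assumes a made-up message with the hypothetical boundary string `voicemail-demo`, and it passes the boundary to awk with `-v` and matches it with `index()`, so the boundary is treated as a literal string rather than a regular expression.

```shell
#!/bin/bash
# Split a message into stream.part, stream.part1, stream.part2, ... files,
# starting a new file at every line that contains the boundary string.
split_on_boundary() {
  # $1 = input file, $2 = boundary string (matched literally, not as a regex)
  awk -v b="$2" 'index($0, b) { i++ } { print > ("stream.part" i) }' "$1"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample message; real Asterisk mail has more headers and parts.
printf '%s\n' \
  'Subject: demo' \
  'Content-Type: multipart/mixed; boundary="voicemail-demo"' \
  '' \
  '--voicemail-demo' \
  'body text' \
  '--voicemail-demo--' > stream.org

# Same extraction the script uses to find the boundary
BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)
split_on_boundary stream.org "$BOUNDARY"
ls stream.part*
```

As in the script, the header line that declares the boundary also matches, which is why `stream.part1` starts at the `Content-Type` line and the body only begins in `stream.part2`.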
@CadillacRick

CadillacRick commented Oct 31, 2020

Ok, well it turns out that this line --- export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud

was a problem for my setup... Now I get the transcription....

But!!!!! the audio file is still a problem. the mp3 file doesn't work... can't listen to it..

@tony722 Any ideas ??? Anyone ???

@kevinrossen have the MP3 attachments been working for you ?

@dcat127

dcat127 commented Jan 26, 2021

This script fails for any voicemail longer than 1 minute, with the following error:
ERROR: (gcloud.ml.speech.recognize) INVALID_ARGUMENT: Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.

I have fixed it by replacing
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac

with
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac trim 0 59

this does not "fix" the issue of too long voicemails, but it changes it so it only transcribes the first 59 seconds, which in my case is good enough.
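
Whether a given voicemail will be cut off by the 59-second trim can be checked before sending it. This is a minimal sketch under assumptions: the 60-second limit is taken from the error text above, `exceeds_sync_limit` is a hypothetical helper name, and the durations are fixed sample numbers rather than values read from a real file.

```shell
#!/bin/bash
# Return success (0) if a duration in seconds is over the synchronous limit.
# LIMIT_SECONDS is an assumption taken from the error text ("longer than 1 min").
LIMIT_SECONDS=60

exceeds_sync_limit() {
  # awk handles fractional durations such as 61.25
  awk -v d="$1" -v limit="$LIMIT_SECONDS" 'BEGIN { exit !(d > limit) }'
}

# Example with fixed numbers; on the PBX the duration could come from
# `soxi -D stream.part3.wav`, which prints the length in seconds.
for d in 12.5 61.25; do
  if exceeds_sync_limit "$d"; then
    echo "$d: too long, transcript would cover only the first 59 seconds"
  else
    echo "$d: fits in a single synchronous request"
  fi
done
```

A check like this could be used to add a note to the email body when the transcript is known to be partial.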

@chrisduncansn

I found a couple options that I like while digging into the documentation. The options I like are on the alpha channel, so there's a good chance they won't work long-term, but I'm okay with that on my setup. Here's what I changed:

ORIGINAL: RESULT=`gcloud ml speech recognize stream.part3.flac --language-code='en-US'`

NEW: RESULT=`gcloud alpha ml speech recognize stream.part3.flac --language-code='en-US' --interaction-type='voicemail' --include-word-time-offsets --filter-profanity --enable-automatic-punctuation`

@kevinrossen how are those alpha options working for you? I checked and it looks like they are still in Alpha status, which is a bummer.

@Identity9165

Identity9165 commented May 6, 2021

was a problem for my setup... Now I get the transcription....

@CadillacRick did you run it as asterisk or root?

@rr10

rr10 commented Jan 13, 2022

Thanks @tony722 @chrisduncansn. I preferred to modify the APIs used to support multiple languages and punctuation. Now I no longer have to worry about the caller's language (at least up to three additional languages).
I have noticed that if the sentence starts with a word in a different language from the rest of the message there can be problems. It might be interesting to use different APIs for multiple languages in succession.

https://gist.github.com/rr10/472f88b41d7383ba6e04f982c0f8a7c2
RESULT=`gcloud alpha ml speech recognize-long-running stream.part3.flac --language-code='ro-RO' --additional-language-codes='it-IT','en-US' --enable-automatic-punctuation --interaction-type=voicemail`

@irn-eric

irn-eric commented Dec 8, 2022

I'm a noob with gcloud. For me the fix was to su to asterisk, run gcloud auth login, and then do the same to set the project ID. I'm sure there is a more clever way, but it didn't put the JSON key in the asterisk directory until I did it that way, and when I copied it the permissions were a mess (for me anyway).

@omegahacker

In no circumstance should anybody ever chmod 777 /usr/bin/dos2unix. The dos2unix command as installed by the operating system is normally set to 755, and is thus already executable by all users.

Adding write bits for "everyone" to this program means that if an attacker gains even the lowest-privilege account on the machine, they can completely replace the contents of dos2unix. If at any point in the future that program is run by 'root' to perform any action at all, the attacker now owns your entire machine. I've used this technique myself several times when doing embedded security assessments.
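
To see whether a file has the "everyone can write" bit described above, the permission can be tested directly. This is a minimal sketch; `check_world_writable` is a hypothetical helper name, and the demonstration runs on a throwaway temp file rather than on a real system binary.

```shell
#!/bin/bash
# Print "world-writable" if anyone on the system may modify the file,
# otherwise print "ok". Uses find's -perm test, so no parsing of ls output.
check_world_writable() {
  if [ -n "$(find "$1" -maxdepth 0 -perm -0002 2>/dev/null)" ]; then
    echo "world-writable"
  else
    echo "ok"
  fi
}

# Demonstrate on a throwaway file rather than on a real system binary.
demo=$(mktemp)
chmod 755 "$demo"; check_world_writable "$demo"
chmod 777 "$demo"; check_world_writable "$demo"
rm -f "$demo"
```

On the PBX, the same function could be pointed at `/usr/bin/dos2unix` to confirm that an earlier `chmod 777` has been undone.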

@levishores

I just had to rebuild our PBX and re-did this install - this was the one part I wondered about. I just chmod 744'd it back to as it was. Thank you for coming here and saying this.

@BryanKoehn

BryanKoehn commented Apr 9, 2023

Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.

@pete1019

pete1019 commented Apr 9, 2023

Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.

Can you please specify the need of a "service account"?
Thanks

@BryanKoehn

BryanKoehn commented Apr 10, 2023

Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.

Can you please specify the need of a "service account"? Thanks

When I used 'gcloud auth login' with my own credentials, it failed because Speech-to-Text wants you to use a service account. That can be set up with the following:

gcloud auth activate-service-account "ServiceAccountName"@cobalt-bliss-383201.iam.gserviceaccount.com --key-file=./"KeyFileName".json

See creating a service account in Google Cloud.
https://cloud.google.com/iam/docs/keys-create-delete#creating
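
The pieces of the service-account command above fit together roughly as follows. This is a sketch under assumptions: the account name `voicemail-stt`, the project ID `my-project-id`, and the key file path are all hypothetical placeholders, and the commands are only printed so that nothing is changed on the machine.

```shell
#!/bin/bash
# Build the e-mail style identifier of a service account from its name and
# the project ID. Both arguments below are hypothetical placeholders.
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

SA_EMAIL=$(service_account_email "voicemail-stt" "my-project-id")
KEY_FILE="/home/asterisk/voicemail-stt.json"   # hypothetical key location

# Print the commands instead of running them, so the sketch is safe to try.
echo "gcloud auth activate-service-account $SA_EMAIL --key-file=$KEY_FILE"
echo "gcloud config set project my-project-id"
```

The printed commands would need to be run as the asterisk user, since that is the account the mail command runs under.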

@EagleTalonSystems

EagleTalonSystems commented Aug 24, 2023

Missing a step,

After you:
"gcloud auth login
CLI will provide you a url. Copy that and paste it into your browser. Google will give you a verification code to copy. Paste it"

You will also need to run:

gcloud config set project "Your Project ID"

Else you will get this error:
--- Google transcription result ---
(Google was unable to recognize any speech in audio data.)
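
The misleading message arises because the script only looks at whether the transcript is empty, so a failed gcloud call looks the same as silence. One possible way to tell the two apart is sketched below; `transcription_text` is a hypothetical helper, and the failure wording is made up for illustration.

```shell
#!/bin/bash
# Choose the text for the transcription section of the mail.
# $1 = exit status of the gcloud call, $2 = transcript extracted by jq.
transcription_text() {
  if [ "$1" -ne 0 ]; then
    echo "(Transcription failed: gcloud exited with status $1.)"
  elif [ -z "$2" ]; then
    echo "(Google was unable to recognize any speech in audio data.)"
  else
    echo "$2"
  fi
}

# Simulated outcomes; in the script the status would be "$?" read
# immediately after the RESULT=... assignment.
transcription_text 1 ""
transcription_text 0 ""
transcription_text 0 "Hi, please call me back."
```

With this split, a missing project ID or an expired login would show up in the email as a failure rather than as unrecognized speech.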

@tony722
Author

tony722 commented Aug 24, 2023

@EagleTalonSystems wrote
You will also need to run:
gcloud config set project "Your Project ID"

Thanks, added that. --Tony

@BryanKoehn

BryanKoehn commented Aug 25, 2023 via email

@EagleTalonSystems

EagleTalonSystems commented Aug 29, 2023

@tony722 Thanks for adding that.

One other thing that might confuse people: Google might have changed the webpage, but for:

"# Follow steps 1 and 2 of the instructions on Google Cloud https://cloud.google.com/sdk/docs/downloads-yum"

It should be 1 and 3, since the FreePBX distro is Centos version 7.

Step 1
sudo tee -a /etc/yum.repos.d/google-cloud-sdk.repo << EOM
[google-cloud-cli]
name=Google Cloud CLI
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el8-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOM

Step 3
yum install google-cloud-cli

Everything else is perfect
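
The repo definition from Step 1 can be generated for either release with a small function. This is a sketch, assuming the only difference between releases is the `el7`/`el8` part of the base URL; `write_gcloud_repo` is a hypothetical helper, and it writes to a scratch file so it can be inspected without root.

```shell
#!/bin/bash
# Write a Google Cloud CLI repo definition for a given EL release.
# $1 = target file, $2 = release suffix such as el7 or el8.
write_gcloud_repo() {
  cat > "$1" << EOM
[google-cloud-cli]
name=Google Cloud CLI
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-$2-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOM
}

# Write to a scratch file for inspection; the real target would be
# /etc/yum.repos.d/google-cloud-sdk.repo (needs root).
repo_file=$(mktemp)
write_gcloud_repo "$repo_file" el7
grep '^baseurl=' "$repo_file"
rm -f "$repo_file"
```

On a CentOS 7 based FreePBX the `el7` variant is the one to use, followed by `yum install google-cloud-cli`.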

@xptpa2020

I see that Google transcribe can add punctuation. Can the script be modified to pass that variable? https://cloud.google.com/speech-to-text/docs/automatic-punctuation

@meretrout

meretrout commented Oct 27, 2023

I just installed this and it works a treat. A genius idea with very clear instructions. Thank you.

EDIT: changing the code line 130 as below improved the transcript quality:

FROM:
gcloud ml speech recognize stream.part3.flac --language-code='en-US'
TO:
gcloud alpha ml speech recognize-long-running stream.part3.flac --language-code='en-US' --enable-automatic-punctuation --interaction-type='voicemail' --model='phone_call_enhanced'

@grintor

grintor commented Dec 7, 2023

For better dictation, you should change the command from
gcloud ml speech recognize stream.part3.flac --language-code='en-US'
to
gcloud ml speech recognize stream.part3.flac --language-code='en-US' --enable-automatic-punctuation --model=phone_call_enhanced
