@tony722
Last active February 25, 2024 19:09
Freepbx Voicemail Transcription Script: Google Speech API
#!/bin/bash
# sendmail-gcloud
#
# Installation instructions
# Copy the content of this file to /usr/sbin/sendmail-gcloud
#
# Google Account
# ---------------
# Create a Google Cloud account if you don't have one yet. Free trial is available at https://console.cloud.google.com/freetrial
# Within console.cloud.google.com search for Cloud Speech-to-Text API and enable it
# Some users report you need to have configured a service account: See creating a service account in Google Cloud. https://cloud.google.com/iam/docs/keys-create-delete#creating
#
# From the Linux command line on the FreePBX machine
# -------------------------------------------
# Follow steps 1 and 3 of the instructions on Google Cloud https://cloud.google.com/sdk/docs/downloads-yum
# Step 1 Note: since FreePBX is CentOS 7, follow the instructions to replace el8 with el7 in the base URL:
# sudo tee -a /etc/yum.repos.d/google-cloud-sdk.repo << EOM
# [google-cloud-cli]
# name=Google Cloud CLI
# baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
# enabled=1
# gpgcheck=1
# repo_gpgcheck=0
# gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
# EOM
#
# Step 3 Note: use yum instead of dnf:
# yum install google-cloud-cli
#
# Run the following commands on FreePBX:
# cd /usr/sbin/
# chown asterisk:asterisk sendmail-gcloud
# chmod 744 sendmail-gcloud
# chmod 755 /usr/bin/dos2unix
#
# Verify that the following tools are installed (by simply running each command); install any that are missing with yum:
# jq
# sox
# flac
# lame
# dos2unix -V
# Ensure dos2unix is executable by the asterisk user (chmod 755 /usr/bin/dos2unix)
#
# Connect FreePBX to Google Cloud
# su asterisk
# gcloud auth login
# CLI will provide you a url. Copy that and paste it into your browser. Google will give you a verification code to copy.
# Paste it into the cli waiting for a verification code.
#
# Some users report that you need to run the following at this point:
# gcloud config set project "Your Project ID"
#
# Open FreePBX web interface
# Go to Settings > Voicemail Admin > Settings > Email Config
# Change Mail Command to: /usr/sbin/sendmail-gcloud
# Submit and apply changes
#
# Original source created by N. Bernaerts: https://github.com/NicolasBernaerts/debian-scripts/tree/master/asterisk
# modified per: https://jrklein.com/2015/08/17/asterisk-voicemail-transcription-via-ibm-bluemix-speech-to-text-api/
# modified per: https://gist.github.com/lgaetz/2cd9c54fb1714e0d509f5f8215b3f5e6
# current version: https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea
#
# Notes: This is a script modified from the original to work with FreePBX so that email notifications sent from
# Asterisk voicemail contain a speech to text transcription provided by Google Cloud Speech API
#
# License: There are no explicit license terms on the original script or on the blog post with modifications
# I'm assuming GNU/GPL2+ unless notified otherwise by copyright holder(s)
#
# Version History:
# 2023-12-11 Add gcloud cli parameters by grintor to enhance gcloud ml telephony transcription
# 2023-09-01 Update instructions for installing google-cloud-cli
# 2023-08-24 Add fix by EagleTalonSystems: gcloud config set project "Your Project ID"
# 2021-05-06 Add fix by dcat127: trim flac file to 59 seconds
# 2020-08-27 Add fix by chrisduncansn
# Minor edit in instruction wording
# 2020-05-27 Add instructions from sr10952
# Add export fix by levishores
# 2019-02-27 Initial commit by tony722
# set PATH
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# save the current directory
pushd .
# create a temporary directory and cd to it
TMPDIR=$(mktemp -d)
cd "$TMPDIR"
# dump the stream to a temporary file
cat >> stream.org
# get the boundary
BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)
# if mail has no boundaries, assume no attachment
if [ "$BOUNDARY" = "" ]
then
# send the original stream
mv stream.org stream.new
else
# cut the original stream into parts
# stream.part - header before the boundary
# stream.part1 - header after the boundary
# stream.part2 - body of the message
# stream.part3 - attachment in base64 (WAV file)
# stream.part4 - footer of the message
awk '/'"$BOUNDARY"'/{i++}{print > ("stream.part" i)}' stream.org
# cut the attachment into parts
# stream.part3.wav.head - header of attachment
# stream.part3.wav.base64 - wav file of attachment (encoded base64)
sed '7,$d' stream.part3 > stream.part3.wav.head
sed '1,6d' stream.part3 > stream.part3.wav.base64
# convert the base64 file to a wav file
dos2unix -o stream.part3.wav.base64
base64 -di stream.part3.wav.base64 > stream.part3.wav
# convert the wav file to FLAC
sox -G stream.part3.wav --channels=1 --bits=16 --rate=8000 stream.part3.flac trim 0 59
# convert to MP3
sox stream.part3.wav stream.part3-pcm.wav
lame -m m -b 24 stream.part3-pcm.wav stream.part3.mp3
base64 stream.part3.mp3 > stream.part3.mp3.base64
# create mp3 mail part
sed 's/x-[wW][aA][vV]/mpeg/g' stream.part3.wav.head | sed 's/.[wW][aA][vV]/.mp3/g' > stream.part3.new
dos2unix -o stream.part3.new
unix2dos -o stream.part3.mp3.base64
cat stream.part3.mp3.base64 >> stream.part3.new
# save voicemail in tmp folder in case of trouble
# TMPMP3=$(mktemp -u /tmp/msg_XXXXXXXX.mp3)
# cp "stream.part3.mp3" "$TMPMP3"
export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud
RESULT=$(gcloud ml speech recognize stream.part3.flac --language-code='en-US' --model=phone_call --filter-profanity --enable-automatic-punctuation)
FILTERED=$(echo "$RESULT" | jq -r '.results[].alternatives[].transcript')
# generate first part of mail body, converting it to LF only
mv stream.part stream.new
cat stream.part1 >> stream.new
sed '$d' < stream.part2 >> stream.new
# beginning of transcription section
echo "" >> stream.new
echo "--- Google transcription result ---" >> stream.new
# append result of transcription
if [ -z "$FILTERED" ]
then
echo "(Google was unable to recognize any speech in audio data.)" >> stream.new
else
echo "$FILTERED" >> stream.new
fi
# end of message body
tail -1 stream.part2 >> stream.new
# add converted attachment
cat stream.part3.new >> stream.new
# append end of mail body, converting it to LF only
echo "" >> stream.tmp
echo "" >> stream.tmp
cat stream.part4 >> stream.tmp
dos2unix -o stream.tmp
cat stream.tmp >> stream.new
fi
# send the mail through sendmail
sendmail -t < stream.new
# go back to original directory
popd
# remove all temporary files and temporary directory
rm -Rf "$TMPDIR"
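
The way the script cuts the mail into stream.part files can be tried in isolation. The sketch below replays the split on an invented message; the boundary string, headers, and payload are made up for illustration. It also differs from the script in one respect: it matches the boundary literally with awk's index() rather than as a regular expression, which is an assumption about what is safer, not what the gist does.

```shell
#!/bin/sh
# Sketch: replay the script's boundary split on an invented message.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical voicemail mail: headers, a text part, and a fake attachment.
cat > stream.org <<'EOF'
To: user@example.com
Content-Type: multipart/mixed; boundary="----voicemail_1234"

This is a multi-part message in MIME format.
------voicemail_1234
Content-Type: text/plain

You have a new voicemail.
------voicemail_1234
Content-Type: audio/x-wav; name="msg0001.wav"

UklGRg==
------voicemail_1234--
EOF

# Same extraction as the script: the second double-quote-delimited field.
BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)

# Start a new output file at every line that contains the boundary string.
# index() matches the boundary literally; the script uses a regex instead.
awk -v b="$BOUNDARY" 'index($0, b) { i++ } { print > ("stream.part" i) }' stream.org

# Record what was produced before cleaning up.
part_count=$(ls stream.part* | wc -l | tr -d ' ')
attachment_header=$(sed -n '2p' stream.part3)
echo "files written: $part_count"
echo "attachment header: $attachment_header"

cd / && rm -rf "$workdir"
```

Note that the header line carrying boundary= itself contains the boundary string, which is why the text before it lands in stream.part and the rest of the header in stream.part1.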

BryanKoehn commented Apr 9, 2023

Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.

pete1019 commented Apr 9, 2023

> Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.

Can you please specify the need of a "service account"?
Thanks

BryanKoehn commented Apr 10, 2023

>> Thanks for the awesome script, I did have to troubleshoot it a little. It appears this now needs to be done with a service account.
>
> Can you please specify the need of a "service account"? Thanks

When I used 'gcloud auth login' with my own credentials, it would fail because Speech-to-Text wants you to use a service account, which can be set up with the following:

gcloud auth activate-service-account "ServiceAccountName"@cobalt-bliss-383201.iam.gserviceaccount.com --key-file=./"KeyFileName".json

See creating a service account in Google Cloud.
https://cloud.google.com/iam/docs/keys-create-delete#creating
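
A minimal sketch of wrapping that activation step so it fails early when the key file is missing. The function name and the default KEY_FILE path are hypothetical; CLOUDSDK_CONFIG is the value the script itself exports. The account name is left out here on the assumption that gcloud reads it from the key file.

```shell
#!/bin/sh
# Sketch: activate a service account for the asterisk user.
# KEY_FILE is a hypothetical path; point it at your downloaded JSON key.
KEY_FILE="${KEY_FILE:-/home/asterisk/gcloud-key.json}"

# Same config directory the voicemail script exports before calling gcloud.
export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud

# Hypothetical helper: refuse to call gcloud without a readable key file.
activate_service_account() {
    if [ ! -r "$KEY_FILE" ]; then
        echo "key file not readable: $KEY_FILE" >&2
        return 1
    fi
    gcloud auth activate-service-account --key-file="$KEY_FILE"
}
```

Run activate_service_account once as the asterisk user (after su asterisk) so the credentials land in the config directory the script reads.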

EagleTalonSystems commented Aug 24, 2023

Missing a step,

After you:
"gcloud auth login
CLI will provide you a url. Copy that and paste it into your browser. Google will give you a verification code to copy. Paste it"

You will also need to run:

gcloud config set project "Your Project ID"

Otherwise you will get this error:
--- Google transcription result ---
(Google was unable to recognize any speech in audio data.)
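
That message is misleading because the script never checks whether gcloud itself succeeded: an empty reply and a failed call both leave the transcript empty. A sketch of telling the two apart, assuming gcloud exits non-zero when the project is unset; transcribe_or_explain is a hypothetical helper, not part of the gist.

```shell
#!/bin/sh
# Sketch: report a gcloud failure separately from an empty transcript.
# transcribe_or_explain is a hypothetical helper name.
transcribe_or_explain() {
    flac_file=$1
    err_file=$(mktemp)

    # Capture stdout in $result and stderr in a scratch file.
    if ! result=$(gcloud ml speech recognize "$flac_file" --language-code='en-US' 2>"$err_file"); then
        # Surface the first line of gcloud's own error in the mail body.
        echo "(gcloud failed: $(head -n 1 "$err_file"))"
        rm -f "$err_file"
        return 0
    fi
    rm -f "$err_file"

    # The trailing ? keeps jq quiet when the reply has no results array.
    transcript=$(printf '%s\n' "$result" | jq -r '.results[]?.alternatives[]?.transcript')
    if [ -z "$transcript" ]; then
        echo "(Google was unable to recognize any speech in audio data.)"
    else
        echo "$transcript"
    fi
}
```

In the script this would stand in for the RESULT/FILTERED lines, for example FILTERED=$(transcribe_or_explain stream.part3.flac).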

tony722 commented Aug 24, 2023

@EagleTalonSystems wrote
You will also need to run:
gcloud config set project "Your Project ID"

Thanks, added that. --Tony

BryanKoehn commented Aug 25, 2023 via email

EagleTalonSystems commented Aug 29, 2023

@tony722 Thanks for adding that.

One other thing that might confuse people, since Google may have changed the web page. For:

"# Follow steps 1 and 2 of the instructions on Google Cloud https://cloud.google.com/sdk/docs/downloads-yum"

It should be 1 and 3, since the FreePBX distro is CentOS version 7.

Step 1
'
sudo tee -a /etc/yum.repos.d/google-cloud-sdk.repo << EOM
[google-cloud-cli]
name=Google Cloud CLI
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el8-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOM
'

Step 3
yum install google-cloud-cli

Everything else is perfect
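
The stanza quoted in Step 1 still names el8, while the gist header says to use el7 on CentOS 7. A sketch of that substitution, run here against a scratch copy so it can be tried without root. REPO_FILE is a placeholder; on a real system the file would be /etc/yum.repos.d/google-cloud-sdk.repo.

```shell
#!/bin/sh
# Sketch: write the repo stanza to a scratch file, then switch el8 to el7.
# REPO_FILE is a placeholder path used only for this demonstration.
REPO_FILE="${REPO_FILE:-$(mktemp)}"

cat > "$REPO_FILE" <<'EOM'
[google-cloud-cli]
name=Google Cloud CLI
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el8-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOM

# Rewrite only the distro tag in the base URL, via a temp file for portability.
sed 's/cloud-sdk-el8-/cloud-sdk-el7-/' "$REPO_FILE" > "$REPO_FILE.tmp" &&
    mv "$REPO_FILE.tmp" "$REPO_FILE"

# Show the resulting base URL line.
grep '^baseurl=' "$REPO_FILE"
```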

@xptpa2020

I see that Google transcribe can add punctuation. Can the script be modified to pass that variable? https://cloud.google.com/speech-to-text/docs/automatic-punctuation

meretrout commented Oct 27, 2023

I just installed this and it works a treat. A genius idea with very clear instructions. Thank you.

EDIT: changing line 130 of the code as shown below improved the transcript quality:

FROM:
gcloud ml speech recognize stream.part3.flac --language-code='en-US'
TO:
gcloud alpha ml speech recognize-long-running stream.part3.flac --language-code='en-US' --enable-automatic-punctuation --interaction-type='voicemail' --model='phone_call_enhanced'

grintor commented Dec 7, 2023

For better dictation, you should change the command from
gcloud ml speech recognize stream.part3.flac --language-code='en-US'
to
gcloud ml speech recognize stream.part3.flac --language-code='en-US' --enable-automatic-punctuation --model=phone_call_enhanced
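
Both forms of the command print a JSON reply, which the script then reduces to plain text with jq. A sketch of that step on an invented reply; the transcripts and confidence values are made up, and only the jq filter is taken from the script.

```shell
#!/bin/sh
# Sketch: run the script's jq filter on an invented Speech-to-Text reply.
sample_reply='{"results":[{"alternatives":[{"confidence":0.92,"transcript":"Hi, this is Pat."}]},{"alternatives":[{"confidence":0.88,"transcript":"Please call me back."}]}]}'

if command -v jq >/dev/null 2>&1; then
    # One line of output per transcript found in the reply.
    transcript=$(printf '%s\n' "$sample_reply" | jq -r '.results[].alternatives[].transcript')
    printf '%s\n' "$transcript"
else
    echo "jq is not installed; install it with yum to try this" >&2
fi
```

Each entry in results covers one stretch of audio, so a longer voicemail comes back as several transcript lines in the mail body.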
