@Psychokiller1888
Last active June 7, 2023 11:21
One TTS to rule them all
#!/usr/bin/env bash
# By Psycho
# Shell script to handle different TTS and online / offline connectivity
# This bash script can be set as a custom TTS for snips but also called directly from your skills
# a great way to give more than one personality to your assistant
# Original script: https://gist.github.com/Psychokiller1888/cf10af3220b5cd6d9c92c709c6af92c2
####### COMMON #######
#------------------------------------
# Set your cache path
cache="/home/pi/snipsSuperTTS/cache"
#------------------------------------
# - Install mpg123
# - The cache path is created by the script itself at first run! If the cache directories already exist, make sure they are owned by "_snips" => sudo chown _snips /home/pi/snipsSuperTTS/cache
# - Edit /etc/snips.toml
# - Set "customtts" as snips-tts provider
# - Add as customtts: command = ["/home/pi/snipsSuperTTS/snipsSuperTTS.sh", "%%OUTPUT_FILE%%", "google", "%%LANG%%", "US", "Wavenet-C", "FEMALE", "%%TEXT%%", "22050"]
# Change "US" to another country code, for example "GB" for a British voice
# You can change "Wavenet-C" to another voice of your choice. https://cloud.google.com/text-to-speech/docs/voices / https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
# Offline voice options are: picotts or mycroft
# Set "FEMALE" to the voice gender you want. Note that this only applies to Google voices
# You can change the sample rate, the last argument, to your needs
# Restart snips: systemctl restart snips-*
# If you want a fully customized experience, you can enable both Google and Amazon and pick a different one for each call. Don't be shy, there's no limit
# Note that not all parameters do something depending on the voice you choose. Those are marked with "--" in the provided examples
####### MycroftAI - Mimic #######
# https://github.com/MycroftAI/mimic
# This one takes a while to install, but hey, the quality compared to pico is worth it!
# Set the following option to true to use Mycroft instead of pico as the offline fallback. Don't forget to set the path too
#
# Example: command = ["/home/pi/snipsSuperTTS/snipsSuperTTS.sh", "%%OUTPUT_FILE%%", "mycroft", "%%LANG%%", "--", "slt_hts", "--", "%%TEXT%%", "22050"]
#
# Available voices: aew ahw aup awb axb bdl clb eey fem gka jmk ksp ljm rms rxr slt slt_hts
# Or you can use an external voice on http.
#
# sudo apt-get install gcc make pkg-config automake libtool libasound2-dev
# git clone https://github.com/MycroftAI/mimic.git
# cd mimic
# ./dependencies.sh --prefix="/usr/local"
# ./autogen.sh
# ./configure --prefix="/usr/local"
# make
# sudo /sbin/ldconfig
# make check
useMycroft=false
mycroftPath="/home/pi/mimic"
####### GOOGLE #######
# Enable the Cloud Text-to-Speech API for your project: https://cloud.google.com/text-to-speech/docs/quickstart-protocol
# The script authenticates with an API key, so the SDK setup in point 6 of that page is not used by the request below
# Get your API key from the console https://console.developers.google.com
# Uncomment the following and set it accordingly:
#
#googleWavenetAPIKey=""
#
# Example: command = ["/home/pi/snipsSuperTTS/snipsSuperTTS.sh", "%%OUTPUT_FILE%%", "google", "%%LANG%%", "US", "Wavenet-C", "FEMALE", "%%TEXT%%", "22050"]
####### AMAZON #######
# Install Amazon sdk
# curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
# unzip awscli-bundle.zip
# ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
# Uncomment the following lines and set them accordingly:
#
#export AWS_ACCESS_KEY_ID=""
#export AWS_SECRET_ACCESS_KEY=""
#export AWS_DEFAULT_REGION="eu-central-1"
#awscli='/usr/local/bin/aws'
#
# Example: command = ["/home/pi/snipsSuperTTS/snipsSuperTTS.sh", "%%OUTPUT_FILE%%", "amazon", "%%LANG%%", "US", "Joanna", "--", "%%TEXT%%", "22050"]
#
####################################################################################################
outfile="$1"
service="$2"
lang="$3"
country="$4"
voice="$5"
gender="$6"
text="$7"
sampleRate="$8"
if [ "$service" = "mycroft" ] || [ "$service" = "picotts" ]; then
    status="offline"
else
    echo -e "GET http://google.com HTTP/1.0\n\n" | nc google.com 80 > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        status="online"
    else
        status="offline"
    fi
    # Alternative for some people having problem pinging google.com. Comment the above and uncomment the following
    #wget -q --tries=1 --timeout=1 --spider http://google.com
    #if [[ $? -eq 0 ]]; then
    #    status="online"
    #else
    #    status="offline"
    #fi
fi
function picotts() {
    case "$lang" in
        *en*)
            lang="en-US";;
        *de*)
            lang="de-DE";;
        *es*)
            lang="es-ES";;
        *fr*)
            lang="fr-FR";;
        *it*)
            lang="it-IT";;
        *)
            lang="en-US";;
    esac
    text=$(sed 's/<[^>]*>//g' <<< "$text")
    pico2wave -w "$outfile" -l "$lang" "$text"
}
function mycroft() {
    text=$(sed 's/<[^>]*>//g' <<< "$text")
    "$mycroftPath/mimic" -t "$text" -o "$outfile" -voice "$mycroftPath/voices/cmu_us_$voice.flitevox"  # no leading dot: a dot makes the absolute path relative to the current directory
}
if [ "$service" = "google" ]; then
    cache="$cache/google/"
    mkdir -p "$cache"
    text=${text//\'/\\\'}
    languageCode="$lang"-"$country"
    googleVoice="$languageCode"-"$voice"
    md5string="$text"_"$googleVoice"_"$sampleRate"_"$lang"
    hash="$(echo -n "$md5string" | md5sum | sed 's/ .*$//')"
    cachefile="$cache""$hash".wav
    downloadFile="/tmp/""$hash"
    if [[ -f "$cachefile" ]]; then
        cp "$cachefile" "$outfile"
    else
        if [ "$status" != "online" ]; then
            if [[ "$useMycroft" = true ]]; then
                mycroft
            else
                picotts
            fi
        else
            if [[ "$text" != *"<speak>"* ]]; then  # pattern matching needs [[ ]]; [ ] compares literally
                text="<speak>""$text""</speak>"
            fi
            curl -H "Content-Type: application/json; charset=utf-8" \
                --data "{
                    'input':{
                        'ssml':'$text'
                    },
                    'voice':{
                        'languageCode':'$languageCode',
                        'name':'$googleVoice',
                        'ssmlGender':'$gender'
                    },
                    'audioConfig':{
                        'audioEncoding':'MP3',
                        'sampleRateHertz':'$sampleRate'
                    }
                }" "https://texttospeech.googleapis.com/v1/text:synthesize?key=$googleWavenetAPIKey" > "$downloadFile"
            sed -i 's/audioContent//' "$downloadFile" && \
            tr -d '\n ":{}' < "$downloadFile" > "$downloadFile".tmp && \
            base64 "$downloadFile".tmp --decode > "$downloadFile".mp3
            mpg123 --quiet --wav "$cachefile" "$downloadFile".mp3
            rm "$downloadFile" && \
            rm "$downloadFile".tmp && \
            rm "$downloadFile".mp3
            cp "$cachefile" "$outfile"
        fi
    fi
elif [ "$service" = "amazon" ]; then
    cache="$cache/amazon/"
    mkdir -p "$cache"
    amazonVoice=$voice
    md5string="$text""_""$amazonVoice"_"$sampleRate"_"$lang"
    hash="$(echo -n "$md5string" | md5sum | sed 's/ .*$//')"
    cachefile="$cache""$hash".mp3
    if [ -f "$cachefile" ]; then
        mpg123 -q -w "$outfile" "$cachefile"
    else
        if [ "$status" != "online" ]; then
            if [[ "$useMycroft" = true ]]; then
                mycroft
            else
                picotts
            fi
        else
            if [[ "$text" != *"<speak>"* ]]; then  # pattern matching needs [[ ]]; [ ] compares literally
                text="<speak>""$text""</speak>"
            fi
            "$awscli" polly synthesize-speech \
                --output-format mp3 \
                --voice-id "$voice" \
                --sample-rate "$sampleRate" \
                --text-type ssml \
                --text "$text" \
                "$cachefile"
            mpg123 -q -w "$outfile" "$cachefile"
        fi
    fi
elif [ "$service" = "picotts" ]; then
    picotts
elif [ "$service" = "mycroft" ]; then
    mycroft
else
    if [[ "$useMycroft" = true ]]; then
        mycroft
    else
        picotts
    fi
fi
@tbearman

Thanks for this great script. I've only tried it with Mycroft for now, but I did have to add the executable in line 125:

from:
."$mycroftPath" -t "$text" -o "$outfile" -voice "$mycroftPath""/voices/cmu_us_""$voice"".flitevox"

to:
."$mycroftPath/mimic" -t "$text" -o "$outfile" -voice "$mycroftPath""/voices/cmu_us_""$voice"".flitevox"
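A minimal sketch of that call as a standalone helper, assuming `mycroftPath` is the directory that contains the `mimic` binary and its `voices` folder. The `run_mimic` name is hypothetical; the `-t`, `-o`, and `-voice` flags are the ones the script already passes. The sketch calls the binary by its full path without a leading dot, so it does not depend on the current working directory.

```bash
# Hypothetical helper: call mimic by its full path and fail loudly if it is missing.
mycroftPath="${mycroftPath:-/home/pi/mimic}"

run_mimic() {
    # $1 = text to speak, $2 = output wav file, $3 = voice name (e.g. slt)
    if [ ! -x "$mycroftPath/mimic" ]; then
        echo "mimic executable not found at $mycroftPath/mimic" >&2
        return 1
    fi
    "$mycroftPath/mimic" -t "$1" -o "$2" \
        -voice "$mycroftPath/voices/cmu_us_$3.flitevox"
}
```

Usage would look like `run_mimic "Hello" /tmp/hello.wav slt`.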

@a1higgins-oss

@Psychokiller1888
You could also add the account-free google-translate TTS API call to cover all bases. I created the following bash script for my simple needs, but feel free to integrate it into your super script. Enjoy.

#!/bin/bash

if [ "$1" = "-h" ]
then
  echo "Usage: $0 outfile language text"
  exit 0
fi

cache_dir="/var/cache/google-translate-tts"
tmpfile=$(mktemp /tmp/google-translate-tts.XXXXXXXX)
# Include the language in the cache key so the same text in two languages does not collide
filename_md5=$(echo -n "${2}_${3}" | md5sum)
filename="$cache_dir/${filename_md5%% *}.wav"

echo "$tmpfile"
echo "$filename"

if [ ! -d "$cache_dir" ]
then
  echo "$cache_dir does not exist. Please create with needed permissions."
  rm -f "$tmpfile"
  exit 1
fi

if [ -s "$filename" ]
then
  echo "Using cached audio file $filename"
else
  echo "Caching new audio file $filename"
  wget -q -U Mozilla -O "$tmpfile" "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&tl=$2&q=$3"
  mpg123 --quiet --wav "$filename" "$tmpfile"
  rm -f "$tmpfile"
fi

cp "$filename" "$1"
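One thing to watch in the request above: the text is placed into the `q=` parameter as-is, so characters such as `&` or `=` inside the sentence would be read as part of the query string. A small percent-encoding helper can guard against that. This is a sketch only; the `urlencode` name is hypothetical and it has only been reasoned through for plain ASCII input.

```bash
# Hypothetical helper: percent-encode a string for use in a URL query parameter.
urlencode() {
    input=$1
    out=""
    while [ -n "$input" ]; do
        rest=${input#?}            # everything after the first character
        c=${input%"$rest"}         # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        input=$rest
    done
    printf '%s\n' "$out"
}
```

With it, the query part of the URL could be built as `q=$(urlencode "$3")`.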

@hokascha

The gcloud auth application-default print-access-token call is terribly slow on my Raspi2B+ and is also intended to be used for debugging only. Would you consider switching to OAuth or another auth mechanism?

@hokascha

hokascha commented Jun 5, 2019

To speed up Google TTS, just create an API key in Google cloud console and add "?key=XXX" to the Google URL while removing the Authorization-Header. Way faster than generating access tokens at each call.
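A sketch of what that request could look like, assuming the key is stored in `googleWavenetAPIKey` as in the script. The `synthesize_url` and `synthesize` helper names are hypothetical; the endpoint is the one the script already calls, and no Authorization header is sent.

```bash
googleWavenetAPIKey="${googleWavenetAPIKey:-}"   # set this to your own API key

# Build the endpoint URL with the key appended as a query parameter.
synthesize_url() {
    printf 'https://texttospeech.googleapis.com/v1/text:synthesize?key=%s\n' "$1"
}

# $1 = JSON request body, $2 = file that receives the raw JSON response
synthesize() {
    curl --silent --fail \
        -H "Content-Type: application/json; charset=utf-8" \
        --data "$1" \
        "$(synthesize_url "$googleWavenetAPIKey")" > "$2"
}
```

Because of `--fail`, `synthesize` returns a non-zero status on an HTTP error, which makes it easier to fall back to an offline voice.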

@Psychokiller1888
Author

@hokascha Thx! I updated to use the api key!

@Psychokiller1888
Author

@a1higgins-oss

Thank you also! Will add it. I have a few others coming also, such as IBM

@adelapole

How would you go about changing Google TTS mp3 to LINEAR16?

'audioEncoding':'mp3', to 'audioEncoding':'LINEAR16',

When I do this the cache file is created but playback is a short 'glitch/chirp' sound.

Google TTS API states that a .wav header is included when requesting LINEAR16.

}" "https://texttospeech.googleapis.com/v1/text:synthesize?key="$googleWavenetAPIKey > "$downloadFile"

            sed -i 's/audioContent//' "$downloadFile" && \
            tr -d '\n ":{}' < "$downloadFile" > "$downloadFile".tmp && \
            base64 "$downloadFile".tmp --decode > "$downloadFile".wav

@Psychokiller1888
Author

Is the audio file itself OK? I mean, if you open it with Audacity, for example, and change the rates and so on, is it a normal sound file?

@adelapole

Is the audio file itself OK? I mean, if you open it with Audacity, for example, and change the rates and so on, is it a normal sound file?

@Psychokiller1888 Thanks for the reply.
My mistake: I was passing the decoded output through mpg123.
mpg123 --quiet --wav "$cachefile" "$downloadFile".wav

This worked for me for LINEAR16 with aplay.

'audioConfig':{
                'audioEncoding':'LINEAR16',
                'sampleRateHertz':'$sampleRate',
                'effectsProfileId': ['large-home-entertainment-class-device']
              }
            }" "https://texttospeech.googleapis.com/v1/text:synthesize?key="$googleWavenetAPIKey > "$downloadFile"

            sed -i 's/audioContent//' "$downloadFile" && \
            tr -d '\n ":{}' < "$downloadFile" > "$downloadFile".tmp && \
            base64 "$downloadFile".tmp --decode > "$downloadFile".wav

            aplay "$downloadFile".wav

            cp "$downloadFile".wav "$cachefile"

            rm "$downloadFile" && \
            rm "$downloadFile".tmp && \
            rm "$downloadFile".wav
            cp "$cachefile" "$outfile"
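As a sketch of the decode step on its own: with LINEAR16 the decoded `audioContent` is already a WAV file (per the note above about the included header), so it can be written straight to disk without going through mpg123. The `decode_audio_content` helper name is hypothetical, and the sketch assumes GNU `base64` and a response where `audioContent` is a plain JSON string field.

```bash
# Hypothetical helper: pull the base64 audioContent field out of the JSON
# response and decode it to an audio file.
decode_audio_content() {
    # $1 = JSON response file, $2 = output audio file
    tr -d '\n' < "$1" \
        | sed -n 's/.*"audioContent"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | base64 --decode > "$2"
    # Succeed only if something was actually decoded
    [ -s "$2" ]
}
```

If the response is an error object without `audioContent`, the output file stays empty and the function returns a non-zero status.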

@Sohn123

Sohn123 commented Aug 11, 2019

Hey, I'm trying to set up snips super TTS with Google. It works if I execute the script from the command line with Google, but not if I set it up via /etc/snips.toml. With Mycroft it works perfectly. I don't have a clue how to get it working. Any suggestions?

@Psychokiller1888
Author

Psychokiller1888 commented Aug 12, 2019

Did you edit the script to add your credentials? Also check /var/log/syslog for any errors returned.

@Sohn123

Sohn123 commented Aug 12, 2019

Yes I edited it and put my credentials in the scripts. And if I execute the script from the command line it works with Google so the API key is correct. But if I try to call the script from snips it doesn't work with Google but it works with Mycroft so the path and everything should be also correct.

@Psychokiller1888
Author

So it's most probably a permission issue. Check your syslog to see if it fails accessing the cache directory

@Sohn123

Sohn123 commented Aug 12, 2019

I thought I already changed the owner of the cache directory but it seems like I forgot about that... Now it works perfectly! Thanks for the fast help
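A quick way to catch this kind of problem early is to check the cache directory before synthesizing. The sketch below is an illustration only; `check_cache_dir` is a hypothetical helper, and it tests writability for whichever user runs it.

```bash
# Hypothetical helper: verify that the cache directory exists and is writable
# by the user running the script.
check_cache_dir() {
    # $1 = cache directory
    if [ ! -d "$1" ]; then
        echo "cache directory $1 does not exist" >&2
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "cache directory $1 is not writable by $(id -un)" >&2
        return 1
    fi
}
```

To reproduce what Snips sees, the script can be run as the service user from the command line, for example `sudo -u _snips /home/pi/snipsSuperTTS/snipsSuperTTS.sh /tmp/test.wav google en US Wavenet-C FEMALE "hello" 22050`.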

@LazzaAU

LazzaAU commented Aug 15, 2019

Hi Psycho, the first time I installed this a few weeks ago everything worked fine. Now that I've started with a fresh copy of Stretch and Snips, I've tried setting this up again but get the following error when trying to install mimic

pi@snips:~ $ cd mimic
pi@snips:~/mimic $ ./dependencies.sh --prefix="/usr/local"
/usr/bin/pkg-config
Have PCRE2? [no]
--2019-08-16 00:39:41-- ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.23.zip
=> ‘pcre2-10.23.zip’
Resolving ftp.csx.cam.ac.uk (ftp.csx.cam.ac.uk)... 131.111.8.115
Connecting to ftp.csx.cam.ac.uk (ftp.csx.cam.ac.uk)|131.111.8.115|:21... failed: Connection timed out.
Retrying.

--2019-08-16 00:41:58-- ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.23.zip
(try: 2) => ‘pcre2-10.23.zip’
Connecting to ftp.csx.cam.ac.uk (ftp.csx.cam.ac.uk)|131.111.8.115|:21... failed: Connection timed out.
Retrying.

It's been that way for three days now. I also can't ping ftp.csx.cam.ac.uk, so I'm assuming the site is down?
Is there an alternative for this part of the install process?

cheers

@LazzaAU

LazzaAU commented Aug 16, 2019

FYI.
I edited the wget line in the dependencies.sh file to be
wget "https://ftp.pcre.org/pub/pcre/pcre2-10.23.zip"

and that seemed to be a good alternative so far.

PCRE2 installation succeeded
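The same edit can be scripted. This is a sketch under the assumption that the FTP URL from the log above appears literally in dependencies.sh; if the script builds the URL from variables, the substitution will simply not match. `swap_pcre_url` is a hypothetical helper, and whether any given mirror still serves the archive needs to be checked before relying on it.

```bash
# Hypothetical helper: replace the unreachable FTP download URL in
# dependencies.sh with another URL, keeping a .bak copy of the original.
swap_pcre_url() {
    # $1 = path to dependencies.sh, $2 = replacement URL (must not contain "|" or "&")
    sed -i.bak "s|ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.23.zip|$2|" "$1"
}
```

For the change described above, that would be `swap_pcre_url dependencies.sh "https://ftp.pcre.org/pub/pcre/pcre2-10.23.zip"`.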

@DanBmh

DanBmh commented Sep 3, 2019

@Psychokiller1888
I created a fork of your script with some additional functionality (mainly more wavenet params and translate-tts). Feel free to copy code back to your gist.

@HorizonKane

Hi,

I'm having trouble getting this running. I installed the AWS CLI bundle successfully.

I added this to snips.toml after uncommenting customtts provider:

customtts = { command = ["/home/pi/snipsSuperTTS/snipsSuperTTS.sh", "%%OUTPUT_FILE%%", "amazon", "%%LANG%%", "DE", "Vicky", "--", "%%TEXT%%", "22050"] }

I created an additional IAM account in AWS, giving it the Polly access profile. The keys I got there I put here:

export AWS_ACCESS_KEY_ID="myID"
export AWS_SECRET_ACCESS_KEY="mykey"
export AWS_DEFAULT_REGION="eu-central-1"
awscli='/usr/local/bin/aws'

When I now give a command to Snips, everything works fine but I hear no voice and sam watch says:

[11:20:28] [AudioServer] was asked to play a wav of 0.0 kB with id 'a5365dfa-27d1-4a91-b383-506beb1c4482' on site default

Of course there is no voice when it is asked to play 0kb ... the question is how I find out why the file did not get created?

@HorizonKane

HorizonKane commented Sep 27, 2019

I connected to the AWS console and can see that the key was used today...so it must have connected to AWS successfully.

I also ran aws configure and used Polly through the AWS CLI, which successfully created an mp3 for me. But with Snips still no success.

@HorizonKane

HorizonKane commented Sep 27, 2019

Okay, I found the problem in the syslog. The girl's name is Vicki, not Vicky. She doesn't like being called Vicky... :)
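The "0.0 kB" wav reported above is consistent with the synthesis step failing (here because of an unknown voice name) while the script carried on anyway. A small guard after the synthesis call makes that failure visible. This is a sketch; `require_audio` is a hypothetical helper, not part of the gist.

```bash
# Hypothetical helper: fail if the synthesized file is missing or empty,
# and make sure an empty file is not left behind in the cache.
require_audio() {
    # $1 = file that should contain synthesized audio
    if [ ! -s "$1" ]; then
        echo "TTS produced no audio in $1; check the voice name and credentials" >&2
        rm -f "$1"
        return 1
    fi
}
```

In the amazon branch it could be used as `require_audio "$cachefile" || picotts`, so that a failed request falls back to an offline voice instead of handing Snips an empty file.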

@Psychokiller1888
Author

I think it's more of a permission problem. You can check the syslog while it tries to answer: tail -f /var/log/syslog

@HorizonKane

Yeah, that's how I found it: I had spelled the name of the voice wrong. Now it's working :)

Installed mimic first, then realised it can't speak German :/

@Psychokiller1888
Author

Yeah, it's sad mimic only speaks English

@HorizonKane

Anyway, thanks for your work! Polly is quite nice. However, a local solution that speaks German would be great.

@Andergraw

Andergraw commented Dec 3, 2019

Hi @Psychokiller1888,
thanks for the script, it's been very useful. I have it working with Polly.
But now, I would like to test Mycroft's Mimic as default offline service (instead of pico).
When I follow your instructions (the same as Mycroft's) I get an error. I asked them directly, but do you have any clue what this is related to?
Thanks!
MycroftAI/mimic1#186

@Pittermaennchen

Hi @Psychokiller1888,
thank you very much for that awesome script!

Unfortunately, I can't get it running.

Here is what I did so far:
I installed mpg123 on my Raspberry Pi.
I edited the snips.toml file in /etc/: I uncommented the "provider: customtts" line and the "customtts = { command ..."-line with the command for Google's Wavenet TTS.
I copied the snipsSuperTTS.sh file to /home/pi
In the google console (after I was forced to enter my credit card number -.- ) I activated "Cloud Text-to-Speech API" and then I created an API-Key
Then I edited the snipsSuperTTS file: uncommented the line "googleWavenetAPIKey="MY API-KEY CONSISTING OF A LOT OF LETTERS AND NUMBERS"" and entered the API key I created in the console.cloud.google.
As advised in the Google Paragraph in snipsSuperTTS.sh-file, I installed Google SDK on my RaspberryPi, following the Linux-instructions here in Point 6 (https://cloud.google.com/text-to-speech/docs/quickstart-protocol)
Then, I tried to run the snipsSuperTTS.sh-file trying the command "sh snipsSuperTTS.sh" resulting in the error: "snipsSuperTTS.sh: 104: snipsSuperTTS.sh: Syntax error: "(" unexpected"
I don't know how to solve this. The cache path "/home/pi/snipsSuperTTS/cache" was not created

Any clues are very welcome. Thank you very much!

@Psychokiller1888
Author

Hi! As the error states, there's a syntax error on line 104 of your shell script

@Pittermaennchen

Thank you for the quick reply!

I thought so, but I did not change anything in that line.

Lines 104-107 are:
"function mycroft() {
text=$(sed 's/<[^>]*>//g' <<< "$text")
."$mycroftPath/mimic" -t "$text" -o "$outfile" -voice "$mycroftPath""/voices/cmu_us_""$voice"".flitevox"
}"

@hokascha

hokascha commented Mar 1, 2020

@Pittermaennchen check the lines before 103. Might have lost a } ?

@Pittermaennchen

@Pittermaennchen check the lines before 103. Might have lost a } ?

hm. I can't find the source of the error. :(

The lines before are:

outfile="$1"
service="$2"
lang="$3"
country="$4"
voice="$5"
gender="$6"
text="$7"
sampleRate="$8"

if [ "$service" = "mycroft" ] || [ "$service" = "picotts" ]; then
status="offline"
else
echo -e "GET http://google.com HTTP/1.0\n\n" | nc google.com 80 > /dev/null 2>&1
if [ $? -eq 0 ]; then
status="online"
else
status="offline"
fi
# Alternative for some people having problem pinging google.com. Comment the above and uncomment the following
#wget -q --tries=1 --timeout=1 --spider http://google.com
#if [[ $? -eq 0 ]]; then
# status="online"
#else
# status="offline"
#fi
fi
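One likely cause that fits the reported command: the script was started with `sh snipsSuperTTS.sh`. On Raspbian and other Debian-based systems `sh` is usually dash, which does not accept the `function name() {` form the script uses and reports exactly `Syntax error: "(" unexpected` at that line. Running the file with `bash snipsSuperTTS.sh`, or making it executable and calling it directly so the shebang applies, avoids this. As a sketch, a guard like the following at the very top of a script re-executes it under bash when it was started from a plain `sh`; the `greet` function is only there to illustrate bash-only syntax after the guard.

```bash
#!/usr/bin/env bash
# Re-exec under bash if the script was started with a plain POSIX sh (e.g. "sh script.sh").
if [ -z "${BASH_VERSION:-}" ]; then
    exec bash "$0" "$@"
fi

# From here on, bash-only syntax such as the "function" keyword is safe.
function greet() {
    echo "hello from bash: $1"
}
greet "${1:-world}"
```

The guard works because the shell reads and runs one command at a time, so the `exec` happens before dash ever reaches the line it cannot parse.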
