@darrenjrobinson
Last active August 23, 2019 04:22
Convert Text to Speech using Azure Cognitive Services
# Text to Speech with Azure Cognitive Services. Trial keys are issued for the 'westus' location.
$location = "westus"
$txt2SpeechTokenURI = "https://$($location).api.cognitive.microsoft.com/sts/v1.0/issueToken"
$key1 = "your api key"
# Output Settings
Add-Type -AssemblyName presentationCore
$mediaPlayer = New-Object system.windows.media.mediaplayer
# Output Path
$audioPath = "C:\temp\"
# Output File
$audiofile = "audiooutexample.mp3"
# Generate Request Auth Headers
$TokenHeaders = @{
    "Ocp-Apim-Subscription-Key" = $key1
    "Content-Length"            = "0"
    "Content-Type"              = "application/x-www-form-urlencoded"
}
# Get OAuth Token
$OAuthToken = Invoke-RestMethod -Method POST -Uri $txt2SpeechTokenURI -Headers $TokenHeaders
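# The Speech service documentation states that an issued token is valid for 10 minutes.
# The helper below is a minimal sketch for long-running scripts, not part of the original
# gist. Test-TokenFresh, $tokenIssuedAt and the 9-minute margin are assumptions
# introduced here for illustration.
$tokenIssuedAt = Get-Date
function Test-TokenFresh {
    param(
        [Parameter(Mandatory = $true)][datetime]$IssuedAt,
        [datetime]$Now = (Get-Date),
        [int]$LifetimeMinutes = 9   # assumption: refresh one minute before the documented expiry
    )
    # True while the token is younger than the chosen lifetime
    return (($Now - $IssuedAt).TotalMinutes -lt $LifetimeMinutes)
}
# Usage: if (-not (Test-TokenFresh -IssuedAt $tokenIssuedAt)) { request a new token before the next call }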
# Text to Speech Endpoint
$URI = "https://$($location).tts.speech.microsoft.com/cognitiveservices/v1"
# Output formats
#ssml-16khz-16bit-mono-tts
#raw-16khz-16bit-mono-pcm
#audio-16khz-16kbps-mono-siren
#riff-16khz-16kbps-mono-siren
#riff-16khz-16bit-mono-pcm
#audio-16khz-128kbitrate-mono-mp3
#audio-16khz-64kbitrate-mono-mp3
#audio-16khz-32kbitrate-mono-mp3
$headers = @{
    "Ocp-Apim-Subscription-Key" = $key1
    "Content-Type"              = "application/ssml+xml"
    "X-Microsoft-OutputFormat"  = "audio-16khz-32kbitrate-mono-mp3"
    "User-Agent"                = "MIMText2Speech"
    # The token is sent as a bearer token
    "Authorization"             = "Bearer $OAuthToken"
}
# Voices https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoiceoutput#SupLocales
#Microsoft Server Speech Text to Speech Voice (en-US, JessaRUS)
#Microsoft Server Speech Text to Speech Voice (en-GB, Susan, Apollo)
#Microsoft Server Speech Text to Speech Voice (en-AU, HayleyRUS)
[xml]$Voice = @'
<speak version='1.0' xmlns="http://www.w3.org/2001/10/synthesis" xml:lang='en-US'>
<voice name='Microsoft Server Speech Text to Speech Voice (en-AU, HayleyRUS)'>
TEXTTOCONVERT
</voice>
</speak>
'@
# Inject text to convert
$Voice.speak.voice.'#text' = "I just converted this string to speech using Azure"
$Voice.speak.voice.'#text'
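# A minimal sketch, not part of the original gist, that wraps the template and the text
# injection above in a reusable function. New-SsmlDocument and its parameters are
# hypothetical names. Because the text is assigned through the XML DOM, characters such
# as & and < are escaped when the document is serialized.
function New-SsmlDocument {
    param(
        [Parameter(Mandatory = $true)][string]$Text,
        [string]$VoiceName = 'Microsoft Server Speech Text to Speech Voice (en-AU, HayleyRUS)'
    )
    # Same structure as the here-string template; placeholder values are overwritten below
    [xml]$ssml = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='x'>x</voice></speak>"
    $ssml.speak.voice.SetAttribute('name', $VoiceName)
    $ssml.speak.voice.InnerText = $Text
    return $ssml
}
# Usage: $Voice = New-SsmlDocument -Text "Fish & chips are ready"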
# Send request for conversion
Invoke-RestMethod -Method POST -Uri $URI -Headers $headers -Body $voice -ContentType "application/ssml+xml" -OutFile "$($audioPath)$($audiofile)"
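# Short pause, then open the file. MediaPlayer loads media asynchronously, so wait before reading the duration
Start-Sleep -Seconds 1
$mediaPlayer.Open($audioPath + $audiofile)
Start-Sleep -Seconds 1
# NaturalDuration only has a TimeSpan once the media has loaded
if ($mediaPlayer.NaturalDuration.HasTimeSpan) {
    $responseDuration = $mediaPlayer.NaturalDuration.TimeSpan.TotalSeconds
    $mediaPlayer.Play()
    # Keep the script alive until playback finishes
    Start-Sleep -Seconds $responseDuration
}
$mediaPlayer.Close()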