Azure supported SSML
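<!-- Note, not from the original snippet: the voice elements below form a fragment. To send them to the Azure
speech service they would normally sit inside a speak root element that declares the mstts namespace prefix,
roughly like this (the xml:lang value is an assumption and should match your default language):
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  ... voice elements go here ...
</speak>
-->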
<voice name="en-US-AriaNeural">
<!-- Some voices support expressing sentences with an emotion, like below. -->
<!-- Styles relevant for moods in the current application -->
<mstts:express-as style="angry">I'm so angry</mstts:express-as>
<mstts:express-as style="cheerful">I'm so cheerful</mstts:express-as>
<mstts:express-as style="unfriendly">I'm so unfriendly</mstts:express-as>
<mstts:express-as style="hopeful">I'm so hopeful</mstts:express-as>
<!-- Styles possibly relevant for future moods -->
<mstts:express-as style="chat">I'm so chatty</mstts:express-as>
<mstts:express-as style="customerservice">I'm so customer service oriented</mstts:express-as>
<mstts:express-as style="empathetic">I'm so empathetic</mstts:express-as>
<mstts:express-as style="excited">I'm so excited</mstts:express-as>
<mstts:express-as style="friendly">I'm so friendly</mstts:express-as>
<mstts:express-as style="narration-professional">I'm so professional</mstts:express-as>
<mstts:express-as style="newscast-casual">I'm so newscast casual</mstts:express-as>
<mstts:express-as style="newscast-formal">I'm so newscast formal</mstts:express-as>
<mstts:express-as style="sad">I'm so sad</mstts:express-as>
<mstts:express-as style="shouting">I'm so shouting</mstts:express-as>
<mstts:express-as style="terrified">I'm so terrified</mstts:express-as>
<mstts:express-as style="whispering">I'm so whispering</mstts:express-as>
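<!-- Sketch, not from the original notes: Azure also documents an optional styledegree attribute on express-as
(0.01 to 2, default 1) that scales how strongly the style is applied. Whether it has an audible effect depends on
the voice, so treat support for this voice as an assumption to verify. -->
<mstts:express-as style="sad" styledegree="2">I'm so very sad</mstts:express-as>
<mstts:express-as style="sad" styledegree="0.5">I'm only a little sad</mstts:express-as>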
<!-- We can also structure the text for better rhythm. The <s> element does not have much effect. -->
<p>
<s>Introducing the sentence element.</s>
<s>Used to mark individual sentences.</s>
</p>
<p>
Another simple paragraph.
Sentence structure in this paragraph is not explicitly marked.
</p>
<!-- Or add breaks-->
<p>Let's try some breaks. First a <break strength="x-weak" />extra weak, then a <break strength="weak" />weak,
then a <break strength="medium" />medium and then we do a <break strength="strong" />strong one and finally an <break strength="x-strong" />
extra strong.</p>
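<!-- Sketch: besides strength, the break element also accepts a time attribute that sets an explicit pause length
in milliseconds or seconds. The durations below are arbitrary illustrations, not tuned values. -->
<p>Now a timed pause of half a second<break time="500ms" />and then one of two seconds<break time="2s" />before we continue.</p>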
<!-- sub is used to say something differently than it is written, for example to read out an acronym. It can also be used to change pronunciation
by setting the alias to a string that would be pronounced differently, but it's tricky and not always possible to get right. -->
<sub alias="World Wide Web Consortium">W3C</sub>
<!-- We can control the exact pronunciation of specific words, but we need to use phonetic alphabets.
IPA is the most accurate but tricky to write. -->
<phoneme alphabet="ipa" ph="təˈmeɪtoʊ"> tomato </phoneme>
<!-- Future: it's possible to change language at the word level, so loan words can be pronounced correctly. Currently only supported by
en-US-JennyMultilingualNeural and not for Swedish. -->
<!-- Emphasis tags can be added to make certain words stand out, but this only works on three US voices. -->
<!-- audio: we can play audio, but for us the only purpose would be to make the voice more realistic, such as coughing.
It would be hard to match the audio with the synthesized voice and the animation. Not something we need. -->
</voice>
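<!-- Sketch, not from the original notes: what word-level language switching could look like with the multilingual
voice mentioned above. The lang element and its xml:lang attribute come from the Azure SSML documentation; the
sentence and the choice of fr-FR are illustrative assumptions. -->
<voice name="en-US-JennyMultilingualNeural">
We ordered <lang xml:lang="fr-FR">crème brûlée</lang> for dessert.
</voice>
<!-- Sketch: word-level emphasis. The level attribute takes reduced, none, moderate or strong. The voice name here
is an assumption; check which voices currently support emphasis before relying on it. -->
<voice name="en-US-GuyNeural">
I can help you join your <emphasis level="moderate">meetings</emphasis> fast.
</voice>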
<voice name="sv-SE-HilleviNeural">
<!-- Prosody contains settings for how the voice is generated and for its rhythm.
pitch: can be shifted up or down by at most +/- 10%, which makes the voice sound different, so we can get more voices this way.
contour: lets us control the pitch of individual parts of the speech, e.g. to go up toward the end of a question.
rate: how fast or slow the speech is. This can perhaps be changed to fit mood or character.
volume: rarely needs changing, but we can make filler words quieter or the angry voice a bit louder.
-->
<prosody pitch="-10.00%">
<phoneme alphabet="ipa" ph="²ị:te:">IT </phoneme><sub alias="chäfen">chefen</sub> älskade Best CRM-systemet med Xbox.
</prosody>
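<!-- Sketch, not from the original notes: the other prosody attributes described above, one per line.
All values are illustrative assumptions and would need tuning by ear.
rate: "Now I speak a little slower." volume: "Now I am angry!" contour: "Are you coming tomorrow?" with the pitch
rising over the last part of the sentence. -->
<prosody rate="-15.00%">Nu pratar jag lite långsammare.</prosody>
<prosody volume="+20.00%">Nu är jag arg!</prosody>
<prosody contour="(60%,-2%) (100%,+10%)">Kommer du i morgon?</prosody>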
Stockholmare säger chefen och dörr, andra säger <sub alias="chäfen">chefen</sub>
<p>Låt oss pröva några pauser. Först en<break strength="x-weak" />extra kort, sedan en <break strength="weak" />kort,
sedan en <break strength="medium" />mellan och sen gör vi en <break strength="strong" />lång en och slutligen en <break strength="x-strong" />
extra lång.</p>
</voice>