@NTT123
Last active April 26, 2022 03:37
Text to phonemes.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Text to phonemes.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPxt7g1EwF5xXLqbF28l2ha",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/NTT123/ec538440bb5ab5a4209d10d1cd25e105/text-to-phonemes.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"- We use the `phonemizer==3.1.1` library with the `espeak-ng` 1.51 backend.\n",
"  + Both versions are pinned so that the output is reproducible across runs.\n",
"- The Python example reuses a single espeak backend instance to speed up conversion."
],
"metadata": {
"id": "YDowu3Hy57Ms"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "RoC5hBxiUad0"
},
"outputs": [],
"source": [
"%%capture\n",
"# Build espeak-ng 1.51 from source into ./espeak/usr, then install phonemizer 3.1.1.\n",
"# The %%capture cell magic stays on the first line so that IPython recognizes it.\n",
"!rm -rf espeak\n",
"!mkdir -p espeak\n",
"!cd espeak; wget https://github.com/espeak-ng/espeak-ng/archive/refs/tags/1.51.zip\n",
"!cd espeak; unzip -qq 1.51.zip\n",
"!sudo apt install -y make autoconf automake libtool pkg-config gcc\n",
"!cd espeak/espeak-ng-1.51; ./autogen.sh; ./configure --prefix=`pwd`/../usr\n",
"!cd espeak/espeak-ng-1.51; make; make install\n",
"!pip install -U pip\n",
"!pip install -U phonemizer==3.1.1"
]
},
{
"cell_type": "code",
"source": [
"# Tell phonemizer which espeak-ng shared library to load (the one built under ./espeak/usr).\n",
"import os\n",
"os.environ[\"PHONEMIZER_ESPEAK_LIBRARY\"] = \"./espeak/usr/lib/libespeak-ng.so.1.1.51\""
],
"metadata": {
"id": "nhaU-G054sQk"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# command-line example\n",
"!echo \"Modern text-to-speech synthesis pipelines typically involve multiple processing stages.\" | phonemize -l en-us -b espeak --with-stress --strip --preserve-punctuation"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KKZKoGK-U6w2",
"outputId": "cc1e5410-5ae2-4e83-d3a5-5a317919ebeb"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"mˈɑːdɚn tˈɛksttəspˈiːtʃ sˈɪnθəsˌɪs pˈaɪplaɪnz tˈɪpɪkli ɪnvˈɑːlv mˌʌltɪpəl pɹˈɑːsɛsɪŋ stˈeɪdʒᵻz.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%%time\n",
"# Python example: create the espeak backend once and reuse it for every call.\n",
"from phonemizer.backend import EspeakBackend\n",
"backend = EspeakBackend('en-us', preserve_punctuation=True, with_stress=True)\n",
"s = \"Modern text-to-speech synthesis pipelines typically involve multiple processing stages.\"\n",
"a = []\n",
"# Phonemize the same sentence 10,000 times to time the reused backend.\n",
"for i in range(10000):\n",
"    p = backend.phonemize([s], strip=True)\n",
"    a.append(p[0])\n",
"print(a[0])"
],
"metadata": {
"id": "5cN6l0VLVPJF",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9b9cd956-eb20-4890-fabe-e0d11d637ea7"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"mˈɑːdɚn tˈɛksttəspˈiːtʃ sˈɪnθəsˌɪs pˈaɪplaɪnz tˈɪpɪkli ɪnvˈɑːlv mˌʌltɪpəl pɹˈɑːsɛsɪŋ stˈeɪdʒᵻz.\n",
"CPU times: user 6.82 s, sys: 55.9 ms, total: 6.87 s\n",
"Wall time: 7.1 s\n"
]
}
]
}
]
}
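
The Python example in the notebook reuses one backend but still calls `phonemize` once per sentence. A possible variation, sketched here and not part of the original notebook, is to pass sentences in batches. The helper names `chunked` and `phonemize_in_batches` and the stand-in class `EchoBackend` are hypothetical. The only assumption about the real backend is that `phonemize(list_of_str, strip=...)` returns one string per input, as the notebook's usage suggests. Whether batching is actually faster is not measured here.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def phonemize_in_batches(backend, texts: List[str], batch_size: int = 256) -> List[str]:
    """Phonemize `texts` with one reused backend, making one call per batch.

    Assumption: `backend.phonemize(list_of_str, strip=True)` returns a list
    with one output string per input string, in the same order.
    """
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(backend.phonemize(batch, strip=True))
    return results


class EchoBackend:
    """Hypothetical stand-in for trying the helper without espeak-ng installed."""

    def __init__(self) -> None:
        self.calls = 0  # number of phonemize() calls received

    def phonemize(self, text: List[str], strip: bool = False) -> List[str]:
        self.calls += 1
        # Upper-casing is a placeholder for real phonemization.
        return [t.strip().upper() if strip else t.upper() for t in text]


if __name__ == "__main__":
    fake = EchoBackend()
    print(phonemize_in_batches(fake, ["a ", "b", "c"], batch_size=2), fake.calls)
```

With the real library, the same helper would be called as `phonemize_in_batches(backend, sentences)`, where `backend` is the `EspeakBackend` instance created in the notebook and `sentences` is a list of strings.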