@r9y9
Created November 7, 2018 03:48
Text processing (ja) for DNN TTS.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"source": [
"%pylab inline\n",
"import pyopenjtalk\n",
"from nnmnkwii.io import hts\n",
"from nnmnkwii.frontend import merlin as fe\n",
"from os.path import join, expanduser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Output of the MeCab bundled with OpenJTalk\n",
"\n",
"Some time ago I wrote a library that makes OpenJTalk's text-processing frontend easier to use.\n",
"https://github.com/r9y9/pyopenjtalk"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['こんにちは,感動詞,*,*,*,*,*,こんにちは,コンニチハ,コンニチワ,0/5,-1,-1']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pyopenjtalk.run_frontend(\"こんにちは\")[0]"
]
},
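{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Output of the OpenJTalk text-processing frontend (full-context labels w/o time info)\n",
"\n",
"Note: the OpenJTalk command-line tool also outputs phoneme durations, but these come from HTS_engine, not from the text-processing stage.\n",
"In other words, they have nothing to do with the text-processing frontend."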
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['xx^xx-sil+k=o/A:xx+xx+xx/B:xx-xx_xx/C:xx_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:5_5%0_xx_xx/H:xx_xx/I:xx-xx@xx+xx&xx-xx|xx+xx/J:1_5/K:1+1-5',\n",
" 'xx^sil-k+o=N/A:-4+1+5/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'sil^k-o+N=n/A:-4+1+5/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'k^o-N+n=i/A:-3+2+4/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'o^N-n+i=ch/A:-2+3+3/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'N^n-i+ch=i/A:-2+3+3/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'n^i-ch+i=w/A:-1+4+2/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'i^ch-i+w=a/A:-1+4+2/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'ch^i-w+a=sil/A:0+5+1/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'i^w-a+sil=xx/A:0+5+1/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',\n",
" 'w^a-sil+xx=xx/A:xx+xx+xx/B:xx-xx_xx/C:xx_xx+xx/D:xx+xx_xx/E:5_5!0_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:xx_xx%xx_xx_xx/H:1_5/I:xx-xx@xx+xx&xx-xx|xx+xx/J:xx_xx/K:1+1-5']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pyopenjtalk.run_frontend(\"こんにちは\")[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extracting phoneme-level linguistic features\n",
"\n",
"These are needed to train a duration model."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Load the question file used to extract linguistic feature vectors\n",
"question = join(expanduser(\"~\"), \"sp/nnmnkwii_gallery/data/questions_jp.hed\")\n",
"binary_dict, continuous_dict = hts.load_question_set(question)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Convert to nnmnkwii's HTS label object for convenience (so that nnmnkwii's merlin frontend can be used)\n",
"labels = hts.load(lines=pyopenjtalk.run_frontend(\"音声合成は難しいですね〜。大変です\")[1])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# *: To extract linguistic features for training an acoustic model, set add_frame_features=True so that frame-level features are extracted\n",
"# https://r9y9.github.io/nnmnkwii/latest/references/generated/nnmnkwii.frontend.merlin.linguistic_features.html\n",
"# In that case, however, the HTS labels must carry time-alignment information (adding it with Julius is the easy way)\n",
"features = fe.linguistic_features(labels, binary_dict, continuous_dict)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(41, 531)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"assert features.shape[0] == len(labels)\n",
"features.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"figure(figsize=(15, 4))\n",
"imshow(features.T, aspect=\"auto\")\n",
"xlabel(\"phone-index\")\n",
"ylabel(\"feature index\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting frame-level linguistic features"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 3125000 xx^xx-sil+m=i/A:xx+xx+xx/B:xx-xx_xx/C:xx_xx+xx/D:02+xx_xx/E:xx_xx!xx_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:3_3%0_xx_xx/H:xx_xx/I:xx-xx@xx+xx&xx-xx|xx+xx/J:5_23/K:1+5-23\n",
"3125000 3525000 xx^sil-m+i=z/A:-2+1+3/B:xx-xx_xx/C:02_xx+xx/D:13+xx_xx/E:xx_xx!xx_xx-xx/F:3_3#0_xx@1_5|1_23/G:7_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"3525000 4325000 sil^m-i+z=u/A:-2+1+3/B:xx-xx_xx/C:02_xx+xx/D:13+xx_xx/E:xx_xx!xx_xx-xx/F:3_3#0_xx@1_5|1_23/G:7_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"4325000 5225000 m^i-z+u=o/A:-1+2+2/B:xx-xx_xx/C:02_xx+xx/D:13+xx_xx/E:xx_xx!xx_xx-xx/F:3_3#0_xx@1_5|1_23/G:7_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"5225000 5525000 i^z-u+o=m/A:-1+2+2/B:xx-xx_xx/C:02_xx+xx/D:13+xx_xx/E:xx_xx!xx_xx-xx/F:3_3#0_xx@1_5|1_23/G:7_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"5525000 6525000 z^u-o+m=a/A:0+3+1/B:02-xx_xx/C:13_xx+xx/D:18+xx_xx/E:xx_xx!xx_xx-xx/F:3_3#0_xx@1_5|1_23/G:7_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"6525000 7524999 u^o-m+a=r/A:-1+1+7/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"7524999 8225000 o^m-a+r=e/A:-1+1+7/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"8225000 8725000 m^a-r+e=e/A:0+2+6/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"8725000 9125000 a^r-e+e=sh/A:0+2+6/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"9125000 9725000 r^e-e+sh=i/A:1+3+5/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"9725000 10925000 e^e-sh+i=a/A:2+4+4/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"10925000 11225000 e^sh-i+a=k/A:2+4+4/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"11225000 12325000 sh^i-a+k=a/A:3+5+3/B:13-xx_xx/C:18_xx+xx/D:13+xx_xx/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"12325000 12925000 i^a-k+a=r/A:4+6+2/B:18-xx_xx/C:13_xx+xx/D:20+1_0/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"12925000 13325000 a^k-a+r=a/A:4+6+2/B:18-xx_xx/C:13_xx+xx/D:20+1_0/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"13325000 13825000 k^a-r+a=k/A:5+7+1/B:18-xx_xx/C:13_xx+xx/D:20+1_0/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"13825000 14325000 a^r-a+k=a/A:5+7+1/B:18-xx_xx/C:13_xx+xx/D:20+1_0/E:3_3!0_xx-1/F:7_2#0_xx@2_4|4_20/G:6_6%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"14325000 15325000 r^a-k+a=w/A:-5+1+6/B:13-xx_xx/C:20_1+0/D:10+7_1/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"15325000 15925000 a^k-a+w=a/A:-5+1+6/B:13-xx_xx/C:20_1+0/D:10+7_1/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"15925000 16825000 k^a-w+a=n/A:-4+2+5/B:13-xx_xx/C:20_1+0/D:10+7_1/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"16825000 17225000 a^w-a+n=a/A:-4+2+5/B:13-xx_xx/C:20_1+0/D:10+7_1/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"17225000 17725000 w^a-n+a=k/A:-3+3+4/B:20-1_0/C:10_7+1/D:12+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"17725000 18425000 a^n-a+k=U/A:-3+3+4/B:20-1_0/C:10_7+1/D:12+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"18425000 18925000 n^a-k+U=t/A:-2+4+3/B:20-1_0/C:10_7+1/D:12+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"18925000 19225000 a^k-U+t=e/A:-2+4+3/B:20-1_0/C:10_7+1/D:12+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"19225000 19725000 k^U-t+e=w/A:-1+5+2/B:10-7_1/C:12_xx+xx/D:24+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"19725000 20025000 U^t-e+w=a/A:-1+5+2/B:10-7_1/C:12_xx+xx/D:24+xx_xx/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"20025000 20724999 t^e-w+a=n/A:0+6+1/B:12-xx_xx/C:24_xx+xx/D:17+1_0/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"20724999 21125000 e^w-a+n=a/A:0+6+1/B:12-xx_xx/C:24_xx+xx/D:17+1_0/E:7_2!0_xx-1/F:6_6#0_xx@3_3|11_13/G:4_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"21125000 21825000 w^a-n+a=r/A:-1+1+4/B:24-xx_xx/C:17_1+0/D:10+7_2/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"21825000 22525000 a^n-a+r=a/A:-1+1+4/B:24-xx_xx/C:17_1+0/D:10+7_2/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"22525000 23025000 n^a-r+a=n/A:0+2+3/B:24-xx_xx/C:17_1+0/D:10+7_2/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"23025000 23424999 a^r-a+n=a/A:0+2+3/B:24-xx_xx/C:17_1+0/D:10+7_2/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"23424999 24225000 r^a-n+a=i/A:1+3+2/B:17-1_0/C:10_7+2/D:22+xx_xx/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"24225000 24725000 a^n-a+i=n/A:1+3+2/B:17-1_0/C:10_7+2/D:22+xx_xx/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"24725000 25025000 n^a-i+n=o/A:2+4+1/B:17-1_0/C:10_7+2/D:22+xx_xx/E:6_6!0_xx-1/F:4_2#0_xx@4_2|17_7/G:3_2%0_xx_1/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"25025000 25724999 a^i-n+o=d/A:-1+1+3/B:10-7_2/C:22_xx+xx/D:10+7_2/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"25724999 26125000 i^n-o+d=e/A:-1+1+3/B:10-7_2/C:22_xx+xx/D:10+7_2/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"26125000 26525000 n^o-d+e=s/A:0+2+2/B:22-xx_xx/C:10_7+2/D:xx+xx_xx/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"26525000 27325000 o^d-e+s=U/A:0+2+2/B:22-xx_xx/C:10_7+2/D:xx+xx_xx/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"27325000 29625000 d^e-s+U=sil/A:1+3+1/B:22-xx_xx/C:10_7+2/D:xx+xx_xx/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"29625000 30025000 e^s-U+sil=xx/A:1+3+1/B:22-xx_xx/C:10_7+2/D:xx+xx_xx/E:4_2!0_xx-1/F:3_2#0_xx@5_1|21_3/G:xx_xx%xx_xx_xx/H:xx_xx/I:5-23@1+1&1-5|1+23/J:xx_xx/K:1+5-23\n",
"30025000 31825000 s^U-sil+xx=xx/A:xx+xx+xx/B:10-7_2/C:xx_xx+xx/D:xx+xx_xx/E:3_2!0_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:xx_xx%xx_xx_xx/H:5_23/I:xx-xx@xx+xx&xx-xx|xx+xx/J:xx_xx/K:1+5-23"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Full-context labels with alignment information added by Julius\n",
"# A frame shift of 5 ms is assumed (the default)\n",
"labels = hts.load(join(expanduser(\"~\"), \"data/jsut_ver1.1/basic5000/lab/BASIC5000_0001.lab\"));\n",
"labels"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"features = fe.linguistic_features(labels, binary_dict, continuous_dict,\n",
" subphone_features=\"coarse_coding\",\n",
" add_frame_features=True)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(632, 535)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"features.shape"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"figure(figsize=(15, 4))\n",
"imshow(features.T, aspect=\"auto\")\n",
"xlabel(\"frame index\")\n",
"ylabel(\"feature index\");"
]
},
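{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"gantts and segmentation-kit contain code for building Japanese speech synthesis with JSUT. Together they include all the code needed to prepare the full-context labels required for Japanese speech synthesis (including the step that obtains phoneme alignments with Julius).\n",
"Please note, however, that segmentation-kit in particular is code I wrote for my own use, so it is not tidy or easy to read.\n",
"\n",
"- https://github.com/r9y9/segmentation-kit/tree/jsut2\n",
"- https://github.com/r9y9/gantts/tree/ja\n",
"- https://github.com/r9y9/nnmnkwii_gallery/tree/master/data : contains a question file for Japanese\n",
"- https://github.com/r9y9/nnmnkwii : contains the tool that converts full-context labels into numeric vectors (it borrows code from Merlin)"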
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
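
A note on how the time-aligned labels relate to the frame-level feature matrix: HTS label times are expressed in units of 100 ns, so the 5 ms frame shift assumed in the notebook corresponds to 50000 units. The sketch below counts whole frames per label segment by floor division and sums them. The boundary values are copied from the `BASIC5000_0001.lab` output shown in the notebook; the helper name `frames_per_label` is hypothetical, and the floor-division rule is an assumption made for illustration, not a statement of how nnmnkwii computes frame counts internally.

```python
# HTS label times are in units of 100 ns, so a 5 ms frame shift is 50000 units.
FRAME_SHIFT = 50000

# Segment boundaries copied from the BASIC5000_0001.lab output in the notebook.
# The labels are contiguous: each label's end time is the next label's start time.
BOUNDARIES = [
    0, 3125000, 3525000, 4325000, 5225000, 5525000, 6525000, 7524999,
    8225000, 8725000, 9125000, 9725000, 10925000, 11225000, 12325000,
    12925000, 13325000, 13825000, 14325000, 15325000, 15925000, 16825000,
    17225000, 17725000, 18425000, 18925000, 19225000, 19725000, 20025000,
    20724999, 21125000, 21825000, 22525000, 23025000, 23424999, 24225000,
    24725000, 25025000, 25724999, 26125000, 26525000, 27325000, 29625000,
    30025000, 31825000,
]


def frames_per_label(boundaries, frame_shift=FRAME_SHIFT):
    """Return the number of whole frames in each [start, end) segment.

    Hypothetical helper: uses floor division of each segment's duration,
    which is an assumption about the counting rule.
    """
    return [
        (end - start) // frame_shift
        for start, end in zip(boundaries, boundaries[1:])
    ]


counts = frames_per_label(BOUNDARIES)
print(len(counts), sum(counts))  # 44 632
```

Under this assumption the 44 labels add up to 632 frames, which agrees with the first dimension of the `(632, 535)` shape reported for the frame-level features. Dividing the final end time by the frame shift instead would give 636.5, so per-segment rounding does matter when lining features up with acoustic frames.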