Chinese TTS based on Ossian

First Steps with Ossian

This note first follows the official tutorial end to end to run a complete synthesis pipeline, and then attempts synthesis in Chinese.

Installation

Although an official one-step installation is provided (./scripts/setup_tools.sh $HTK_USERNAME $HTK_PASSWORD), it did not succeed in our attempt.

The debugging process follows.

Running it directly produces the following error:

make[2]: Entering directory '/root/workspace/Projects/Ossian/tools/downloads/SPTK-3.6/bin/delta'
clang -DPACKAGE_NAME=\"SPTK\" -DPACKAGE_TARNAME=\"sptk\" -DPACKAGE_VERSION=\"3.6\" -DPACKAGE_STRING=\"SPTK\ 3.6\" -DPACKAGE_BUGREPORT=\"http://sourceforge.net/projects/sp-tk/\" -DHAVE_LIBM=1 -DX_DISPLAY_MISSING=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_FCNTL_H=1 -DHAVE_LIMITS_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_STRINGS_H=1 -DHAVE_SYS_IOCTL_H=1 -DHAVE_STDLIB_H=1 -DHAVE_MALLOC=1 -DHAVE_BZERO=1 -DHAVE_MEMSET=1 -DHAVE_STRRCHR=1 -DHAVE_RINDEX=1 -DFORMAT=\"float\" -DLINUX=1 -I. -I../../include    -g -O2 -MT delta.o -MD -MP -MF .deps/delta.Tpo -c -o delta.o delta.c
/bin/bash: clang: command not found
Makefile:239: recipe for target 'delta.o' failed
make[2]: *** [delta.o] Error 127
make[2]: Leaving directory '/root/workspace/Projects/Ossian/tools/downloads/SPTK-3.6/bin/delta'
Makefile:317: recipe for target 'install-recursive' failed
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory '/root/workspace/Projects/Ossian/tools/downloads/SPTK-3.6/bin'
Makefile:268: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1

We guessed that clang was not installed; installing it with apt-get install clang fixed the problem.

If you do not have root privileges to install clang, you can do the following instead.

First install CMake:

wget https://cmake.org/files/v3.9/cmake-3.9.1.tar.gz
tar -xf cmake*.tar.gz
cd cmake*
./configure --prefix=$HOME
make
make install

Then install Clang, following https://clang.llvm.org/get_started.html:

# Check out Clang:
cd llvm/tools
# svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
cd ../..
# Check out extra Clang tools: (optional)
cd llvm/tools/clang/tools
svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk extra
cd ../../../..
# Check out Compiler-RT (optional):
cd llvm/projects
svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
cd ../..
# Check out libcxx: (only required to build and run Compiler-RT tests on OS X, optional otherwise)
cd llvm/projects
svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx
cd ../..
# Build LLVM and Clang:
mkdir build   # in-tree build is not supported
cd build
cmake -G "Unix Makefiles" ../llvm
make
# This builds both LLVM and Clang for debug mode.
# Note: For subsequent Clang development, you can just run make clang.
# CMake allows you to generate project files for several IDEs: Xcode, Eclipse CDT4, CodeBlocks, Qt-Creator (use the CodeBlocks
# generator), KDevelop3. For more details see Building LLVM with CMake page.

Finally, configure the SPTK build to use this clang:

Change line 104 of setup_tools.sh from:

sed 's/CC = gcc/CC = clang/' ./bin/delta/Makefile.BAK > ./bin/delta/Makefile     ## (see http://sourceforge.net/p/sp-tk/bugs/68/)

to:

sed 's#CC = gcc#CC = /home/dl80/heyunchao/Install_Programs/build/bin/clang#' ./bin/delta/Makefile.BAK > ./bin/delta/Makefile     ## (see http://sourceforge.net/p/sp-tk/bugs/68/)

If you run into the following problem:

Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

Fix:

conda install nomkl numpy scipy scikit-learn numexpr
conda remove mkl mkl-service

Hello World: Romanian synthesis

  1. Set the environment variables:
export OSSIAN=/root/workspace/Projects/Ossian
  2. Prepare the data:
cd $OSSIAN/data
# download the data
wget https://www.dropbox.com/s/uaz1ue2dked8fan/romanian_toy_demo_corpus_for_ossian.tar?dl=0
# mirror: https://cnbj1.fds.api.xiaomi.com/tts/Important_files/romanian_toy_demo_corpus_for_ossian.tar

# unpack
cd $OSSIAN/  ## voice will unpack relative to this location
tar xvf ./data/romanian_toy_demo_corpus_for_ossian.tar

The resulting corpus directory structure is:

corpus/
`-- rm
    |-- speakers
    |   `-- rss_toy_demo
    `-- text_corpora
        `-- wikipedia_10K_words

We need to put the newly downloaded data into the appropriate folders.

  3. Make sure the directories that will be used do not already exist:
rm -r $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/ $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/
  4. Start training:
source ~/anaconda3/bin/activate python2
cd $OSSIAN
python ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn

This step fails with the following errors:

Cannot load NN model from model_dir: /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor -- not trained yet
Cannot load NN model from model_dir: /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor -- not trained yet
...
Step 1 in script /root/workspace/Projects/Ossian//scripts/acoustic_model_training/subrecipes/script/standard_alignment.sh failed, aborted!
...
set_up_data.py: No matching data files found in /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/align_lab and /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp
Aligner training failed

The debugging process follows.

Locating the failing line in the program shows that the following command is what fails:

/root/workspace/Projects/Ossian//scripts/acoustic_model_training/subrecipes/script/standard_alignment.sh /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/align_lab /root/workspace/Projects/Ossian//tools/bin/ /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/train.cfg | tee /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/log.txt                      | grep 'Aligner training'

Running it on its own reproduces the error. Looking inside standard_alignment.sh, the failure comes from this command:

python $STEPS/set_up_data.py -labdir $LABDIR -cmpdir $CMPDIR -outdir $OUT/${STEPNUM} -bindir $BIN

That is:

python /root/workspace/Projects/Ossian/scripts/acoustic_model_training/steps//set_up_data.py \
-labdir /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/align_lab \
-cmpdir /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp \
-outdir /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/processors/aligner/training/1 \
-bindir /root/workspace/Projects/Ossian//tools/bin/

Debugging inside this script shows that the error is raised because intersect == [], which in turn is because there are no files ending in .cmp under opts.cmpdir:

>: ll /root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/*.cmp
>: ls: cannot access '/root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp/*.cmp': No such file or directory
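
For reference, the failing check inside set_up_data.py is roughly the following (a minimal sketch, not the real code; the .lab extension for the alignment labels is an assumption):

import glob, os

# the -labdir and -cmpdir arguments from the command above
labdir = '/root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/align_lab'
cmpdir = '/root/workspace/Projects/Ossian/train//rm/speakers/rss_toy_demo/naive_01_nn/cmp'

labs = set(os.path.splitext(os.path.basename(f))[0] for f in glob.glob(os.path.join(labdir, '*.lab')))
cmps = set(os.path.splitext(os.path.basename(f))[0] for f in glob.glob(os.path.join(cmpdir, '*.cmp')))
intersect = sorted(labs & cmps)
if intersect == []:
    print('No matching data files found in %s and %s' % (labdir, cmpdir))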

So we need to work out why these *.cmp files were never generated.

This problem was reported on the official GitHub issues page; see: cmp files not genereted #1.

It was eventually resolved with help from @oliverwatts. The problem was that HTK had not been installed correctly during environment setup: the HTK username is not your e-mail address, but one you chose yourself when registering.

Note that before this, the code needs a small change as described in the PR: remove the &.

  5. Train the duration model

Using the CPU only:

cd $OSSIAN
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg

Or, alternatively, using the GPU:

./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg
  6. Export the model. Convert the model trained in step 5 into a format that Ossian can load easily:
python ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/duration_predictor
  7. Train the acoustic model

Similar to training the duration model:

cd $OSSIAN
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg

The corresponding GPU invocation:

./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg
  8. Export the model, as in step 6:
python ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/acoustic_predictor
  9. Test: synthesise speech

Use the trained models to synthesise speech:

mkdir $OSSIAN/test/wav/
python ./scripts/speak.py -l rm -s rss_toy_demo -o ./test/wav/romanian_toy_HTS.wav naive_01_nn ./test/txt/romanian.txt

Chinese

Prepare the data in the required format, then run the same steps as above in order.
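
Before running them, it can help to sanity-check the prepared corpus. The sketch below (a hypothetical helper, not part of Ossian) walks the speaker directory and reports unpaired files, based only on the .txt/.wav pairing rule that the Corpus class uses; the exact sub-directory layout underneath is up to you:

import os

# assumed location, mirroring the Romanian toy corpus: corpus/<LANG>/speakers/<SPEAKER>
speaker_dir = os.path.join(os.environ['OSSIAN'], 'corpus', 'cn', 'speakers', 'cn_king')

txt, wav = set(), set()
for root, _, files in os.walk(speaker_dir):
    for name in files:
        base, ext = os.path.splitext(name)
        if ext == '.txt':
            txt.add(base)
        elif ext == '.wav':
            wav.add(base)

print('%d paired utterances' % len(txt & wav))
print('txt without wav: %s' % sorted(txt - wav))
print('wav without txt: %s' % sorted(wav - txt))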

The steps are as follows:

source ~/anaconda3/bin/activate python2

export OSSIAN=/home/dl80/heyunchao/Programs/Ossian
export OSSIAN_LANG=cn
export DATA_NAME=cn_king
export RECIPE=naive_01_nn

# remove the old output directories and start fresh
rm -r $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/  $OSSIAN/voices/$OSSIAN_LANG/$DATA_NAME/$RECIPE/

cd $OSSIAN
# Prepare config
python ./scripts/train.py -s $DATA_NAME -l $OSSIAN_LANG $RECIPE
# Train duration model
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/duration_predictor/config.cfg
# Export Merlin duration model
python ./scripts/util/store_merlin_model.py $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/duration_predictor/config.cfg $OSSIAN/voices/$OSSIAN_LANG/$DATA_NAME/$RECIPE/processors/duration_predictor
# Train acoustic model
export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg
# Export Merlin acoustic model
python ./scripts/util/store_merlin_model.py $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg $OSSIAN/voices/$OSSIAN_LANG/$DATA_NAME/$RECIPE/processors/acoustic_predictor
# Test
mkdir -p $OSSIAN/test/wav/
python ./scripts/speak.py -l $OSSIAN_LANG -s $DATA_NAME -o ./test/wav/${OSSIAN_LANG}_${DATA_NAME}_test.wav $RECIPE ./test/txt/test.txt

Note: once the training data has been prepared, you can check whether the output of the text front end has the correct format; the path is:

cd $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/utt/
# the utt files here are the front-end processing results
vim <filename.utt>

Editing the configuration files (optional)

The default configuration defines a fairly simple model. If you want to change the model structure, hyperparameters, and so on before actually training, do the following:

# edit
vim train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/duration_predictor/config.cfg
vim train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg
# overwrite
cp train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/duration_predictor/config.cfg voices/$OSSIAN_LANG/$DATA_NAME/$RECIPE/processors/duration_predictor/config.cfg
cp train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg voices/$OSSIAN_LANG/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg

Common problems

  1. If you see mismatched files, or cannot open some file, first try deleting the corresponding language folders inside the train and voices directories.
  2. The WARNING: no silence found! messages that appear while training the duration model can be ignored; they have no effect.
  3. When training on the GPU, first try import theano in a Python terminal and check whether a message like Using gpu device 0: Tesla K80 appears; if it does not, Theano is not configured properly. The following message during training also indicates that Theano's GPU support is not set up:
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.

Theano can be configured as follows:

Step 1: set PATH

export PATH="/usr/local/cuda/bin/:$PATH"
source ~/.bashrc

Step 2: edit the configuration file:

# create the configuration file
vim ~/.theanorc
# with the following content
[global]
floatX = float32
device = gpu0

[cuda] 
root = /usr/local/cuda      # adjust to your actual CUDA location

Step 3: check that it works:

nvcc --version      # should print the version information
import theano       # should report the GPU
  4. If the following problem appears at test (synthesis) time:
Traceback (most recent call last):
  File "./scripts/speak.py", line 181, in <module>
    main_work()
  File "./scripts/speak.py", line 85, in main_work
    voice = Voice(opts.speaker, opts.lang, opts.config, opts.stage, dirs)
  File "/home/dl80/heyunchao/Programs/Ossian/scripts/main/Voice.py", line 93, in __init__
    execfile(load_from_file, self.config)
  File "/home/dl80/heyunchao/Programs/Ossian/voices//rm/rss_toy_demo/naive_01_nn/voice.cfg", line 17, in <module>
    from Tokenisers import RegexTokeniser
ImportError: No module named Tokenisers

This is caused by a symbolic link. The program uses:

os.path.realpath(os.path.abspath(os.path.dirname(inspect.getfile(inspect.currentframe()))))

to get the file's real path; this resolves the symlink and returns the original location, whereas what is actually needed here is the path through the symlink.
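
A quick illustration of the difference, using the paths from the situation described below (this only demonstrates standard library behaviour, not Ossian code):

import os

# voices is assumed to be a symlink, as in the setup described below
link_dir = '/home/dl80/heyunchao/Programs/Ossian/voices'

print(os.path.abspath(link_dir))    # keeps the symlink: .../Programs/Ossian/voices
print(os.path.realpath(link_dir))   # resolves it: /home/dl54/heyunchao/workspace/Ossian_voice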

Perhaps the simplest fix is to create a symlink to the Ossian scripts inside the real voices location, i.e. the directory that the voices symlink points to, as follows.

Current situation:

  1. To save space we have a symlink voices -> /home/dl54/heyunchao/workspace/Ossian_voice/;

  2. The real path of the Ossian program files is /home/dl80/heyunchao/Programs/Ossian.

Then:

cd /home/dl54/heyunchao/workspace/Ossian_voice/$OSSIAN_LANG/$DATA_NAME
ln -s /home/dl80/heyunchao/Programs/Ossian/scripts/ .

Fix 2 (change the code):

In /home/dl80/heyunchao/Programs/Ossian/voices//$OSSIAN_LANG/$DATA_NAME/$RECIPE/voice.cfg, change:

current_dir = os.path.realpath(os.path.abspath(os.path.dirname(inspect.getfile(inspect.currentframe()))))

to:

current_dir = os.path.realpath(os.path.abspath(os.path.dirname(sys.argv[0])))

Other commands

Synthesise several sentences in one go and upload them to FDS:

i=0
ls ./test/txt/cn_* | while read line; do python ./scripts/speak.py -l $OSSIAN_LANG -s $DATA_NAME -o ./test/wav/${OSSIAN_LANG}_${DATA_NAME}_${i}.wav $RECIPE ${line}; i=$((i+1)); done
ls cn_cn_king_* | while read line; do fds -m put -b tts -o $line -d $line -e cnbj1.fds.api.xiaomi.com & done

Going Further with Ossian

In the previous part (<1_Ossian初探.md>) we described how to train a simple synthesis model with Ossian. Here we use a larger dataset to train a Chinese synthesis model, solve (as far as possible) the problems met along the way, and streamline and simplify the training process.

Segmentation fault

First, when running python ./scripts/train.py -s $DATA_NAME -l $OSSIAN_LANG $RECIPE, a segmentation fault is reported for some audio files once stage 5 (acoustic_feature_extractor) is reached (the program keeps running regardless). Some of the files that segfault are: 000926.wav 003105.wav 003625.wav 003770.wav 004215.wav 004260.wav 005435.wav 005902.wav. The command actually executed is:

/home/dl80/heyunchao/Programs/Ossian//tools/bin//analysis /home/dl80/heyunchao/Programs/Ossian/train//cn/speakers/cn_king/naive_01_nn/cmp/005902.wav /home/dl80/heyunchao/Programs/Ossian/train//cn/speakers/cn_king/naive_01_nn/cmp/005902.f0.double /home/dl80/heyunchao/Programs/Ossian/train//cn/speakers/cn_king/naive_01_nn/cmp/005902.sp.double /home/dl80/heyunchao/Programs/Ossian/train//cn/speakers/cn_king/naive_01_nn/cmp/005902.bap.double &> /home/dl80/heyunchao/Programs/Ossian/train//cn/speakers/cn_king/naive_01_nn/cmp/005902.log

The analysis program comes from World, with some modifications by Zhizheng Wu; the modified version can be found in the Merlin repo. Its usage message is shown below; note that its labels are confusing, and the command above shows the argument order actually used here: input wav, then the F0, spectrum, and band aperiodicity output files.

test.exe input.wav outout.wav f0 spec flag
input.wav  : argv[1] Input file
output.wav : argv[2] sp file
f0         : argv[3] ap file
spec       : argv[4] f0 file
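
A quick way (not part of Ossian) to list the utterances whose analysis failed is to look for wav files in the cmp working directory that have no .cmp output, which is what the later alignment step expects to find; a sketch, assuming the .cmp files end up next to the wavs:

import glob, os

cmpdir = '/home/dl80/heyunchao/Programs/Ossian/train/cn/speakers/cn_king/naive_01_nn/cmp'

wavs = set(os.path.splitext(os.path.basename(f))[0] for f in glob.glob(os.path.join(cmpdir, '*.wav')))
cmps = set(os.path.splitext(os.path.basename(f))[0] for f in glob.glob(os.path.join(cmpdir, '*.cmp')))
for base in sorted(wavs - cmps):
    print(base)   # expected to include 000926, 003105, ... from the list above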

Intermediate files

Step 4 takes a long time, and the later steps depend on its output, so we want to understand what step 4 actually produces, so that when retraining we can rerun only the later steps and reuse step 4's results. Step 4 produces two folders: train and voices.

The contents of the train folder (first 5 levels):

>: tree train -L 5 -F

train/
└── cn/
    └── speakers/
        └── cn_king/
            └── naive_01_nn/
                ├── align_lab/
                ├── align_log/
                ├── cmp/
                ├── dur/
                ├── lab_dnn/
                ├── lab_dur/
                ├── null
                ├── null.cont
                ├── null.key
                ├── null.values
                ├── processors/
                ├── questions_dnn.hed
                ├── questions_dnn.hed.cont
                ├── questions_dnn.hed.key
                ├── questions_dnn.hed.values
                ├── questions_dur.hed
                ├── questions_dur.hed.cont
                ├── questions_dur.hed.key
                ├── questions_dur.hed.values
                ├── SomeFileName
                ├── SomeFileName.cont
                ├── SomeFileName.key
                ├── SomeFileName.values
                ├── time_lab/
                └── utt/

Showing only the directories inside train:

>: tree train/ -d

train/
└── cn
    └── speakers
        └── cn_king
            └── naive_01_nn
                ├── align_lab
                ├── align_log
                ├── cmp
                ├── dur
                ├── lab_dnn
                ├── lab_dur
                ├── processors
                │   ├── acoustic_feature_extractor
                │   │   └── training
                │   ├── acoustic_predictor
                │   ├── aligner
                │   │   └── training
                │   │       ├── 1
                │   │       │   └── data
                │   │       ├── 10
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 11
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 2
                │   │       │   ├── config
                │   │       │   ├── data
                │   │       │   └── hcompv
                │   │       ├── 3
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 4
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 5
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 6
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 7
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 8
                │   │       │   ├── config
                │   │       │   └── data
                │   │       ├── 9
                │   │       │   ├── config
                │   │       │   └── data
                │   │       └── final_model
                │   │           ├── config
                │   │           └── data
                │   ├── duration_predictor
                │   ├── pause_predictor
                │   └── word_vector_tagger
                │       └── training
                ├── time_lab
                └── utt

The complete contents of the voices folder:

>: tree voices

voices
└── cn
    └── cn_king
        └── naive_01_nn
            ├── output
            ├── processors
            │   ├── acoustic_feature_extractor
            │   │   └── acoustic_feats.cfg
            │   ├── acoustic_predictor
            │   │   ├── config.cfg
            │   │   └── filelist.txt
            │   ├── aligner
            │   │   ├── cmp.mmf
            │   │   ├── extra_substitutions.txt
            │   │   ├── general.conf
            │   │   ├── lexicon.txt
            │   │   └── modellist.mono
            │   ├── duration_predictor
            │   │   ├── config.cfg
            │   │   └── filelist.txt
            │   ├── pause_predictor
            │   │   ├── model.pkl
            │   │   └── model.pkl.dot
            │   └── word_vector_tagger
            │       └── table_file.table
            └── voice.cfg
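
Since the aim stated above is to reuse the step-4 results when retraining, one simple approach (not part of Ossian) is to snapshot these two trees once and copy them back before rerunning the later steps. A minimal sketch, assuming the environment variables used throughout this note:

import os, shutil

OSSIAN = os.environ['OSSIAN']
LANG, SPEAKER, RECIPE = 'cn', 'cn_king', 'naive_01_nn'

for rel in ('train/%s/speakers/%s/%s' % (LANG, SPEAKER, RECIPE),
            'voices/%s/%s/%s' % (LANG, SPEAKER, RECIPE)):
    src = os.path.join(OSSIAN, rel)
    dst = src + '.step4_backup'          # hypothetical backup location
    if os.path.isdir(src) and not os.path.isdir(dst):
        shutil.copytree(src, dst)
        print('backed up %s -> %s' % (src, dst))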

Analysis of Ossian's code and commands

In what follows we assume the environment variables have already been set, for example:

source ~/anaconda3/bin/activate python2

export OSSIAN=/home/dl80/heyunchao/Programs/Ossian
export OSSIAN_LANG=cn
export DATA_NAME=cn_king
export RECIPE=naive_01_nn

Front end

The text-processing front end corresponds to the command:

python ./scripts/train.py -s $DATA_NAME -l $OSSIAN_LANG $RECIPE

It prepares the configuration files as well as the data needed to train the duration and acoustic models.

It is invoked as follows:

usage: train.py [-h] -s SPEAKER -l LANG [-t STAGE] [-c] [-profile]
                [-text TEXT_CORPUS_NAME] [-d COMMAND_LINE_CORPUS]
                [-p MAX_CORES] [-bin CUSTOM_BINDIR]
                config

positional arguments:
  config                configuration to use: naive, semi-naive, gold, as
                        defined in <ROOT>/recipes/<config> -directory

optional arguments:
  -h, --help            show this help message and exit
  -s SPEAKER            the name of the speaker:
                        <ROOT>/corpus/<LANG>/<SPEAKER>
  -l LANG               the language of the speaker: <ROOT>/corpus/<LANG>
  -t STAGE              defines the current usage stage (definitions of stages
                        should by found in <config>/recipe.cfg
  -c                    clear any previous training data first
  -profile
  -text TEXT_CORPUS_NAME
                        name of text corpus to be used for tool training, uses
                        only voice prompts if not specified
  -d COMMAND_LINE_CORPUS
                        directories in arbitrary location containing training
                        data
  -p MAX_CORES          maximum number of CPU cores to use in parallel
  -bin CUSTOM_BINDIR

The -text argument can supply additional text data; by default only the recording transcripts are used. Using more text helps when training the word vectors.

Code analysis

Starting from train.py, let's look at which classes it uses.

The Corpus class

All the audio files, transcripts, and any text without matching audio are gathered into a list in the variable voice_data (train.py:#L114), which is used to construct the Corpus object (code).

The structure of the Corpus class is shown below:

[image: Corpus class diagram]

The docstring of this class says:

All files in filelist must exist and end in .txt or .wav. No specific order
is required in the list -- .txt and .wav files are paired based on their
names (i.e. /some-path/utt1.txt is paired with /some-other-path/utt1.wav).
Unpaired txt files are used as unannotated text data.

Note that:

  • files must end in .txt or .wav;

  • the files in filelist can be in any order; they are paired by filename.

The member variable utterances is a dict; the data it stores looks like this:

{'004065': {'text': '/home/dl80/heyunchao/Programs/Ossian/corpus/cn/speakers/cn_king_hanzi/txt/004065.txt',
           'speech': '/home/dl80/heyunchao/Programs/Ossian/corpus/cn/speakers/cn_king_hanzi/txt/004065.wav'}, '004507': {'text': '/home/dl80/heyunchao/Programs/Ossian/corpus/cn/speakers/cn_king_hanzi/txt/004507.txt',
           'speech': '/home/dl80/heyunchao/Programs/Ossian/corpus/cn/speakers/cn_king_hanzi/txt/004065.wav'}, '007743': {'text': '/home/dl80/heyunchao/Programs/Ossian/corpus/cn/speakers/cn_king_hanzi/txt/007743.txt'},
...}
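
The pairing itself can be pictured with a small sketch (illustrative only, not the actual Corpus implementation):

import os

def pair_files(filelist):
    """Pair .txt and .wav files by basename; unpaired .txt files stay text-only."""
    utterances = {}
    for path in filelist:
        base, ext = os.path.splitext(os.path.basename(path))
        if ext == '.txt':
            utterances.setdefault(base, {})['text'] = path
        elif ext == '.wav':
            utterances.setdefault(base, {})['speech'] = path
    return utterances

print(pair_files(['/some-path/utt1.txt', '/some-other-path/utt1.wav', '/text-only/utt2.txt']))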

The Voice class

Line #117 passes configuration parameters, directories, and related information in to construct the Voice object. Its class structure is shown below:

[image: Voice class diagram]

Calling voice.train() then prepares all the data.

max_cores defaults to using all CPU cores.

Whether the voice counts as already trained is decided by checking whether voice_config_file exists and whether clear_old_data was requested:

if it exists, the configuration is loaded from voice_config_file; otherwise recipe_file is loaded.

NOTE: recipe_file is actually an executable Python script; the program runs it via execfile.

## in our setup these files correspond to the following paths
# self.voice_config_file
/home/dl80/heyunchao/Programs/Ossian/voices//cn/cn_king_hanzi/naive_01_nn/voice.cfg
# self.recipe_file
/home/dl80/heyunchao/Programs/Ossian/recipes/naive_01_nn.cfg

The two files have identical contents. Their purpose is to produce the two variables train_stages and runtime_stages, which define the different front-end sub-processes. Each sub-process is a class, and these classes share some member functions and variables, for example: language, trained, processor_name, verify(), reuse_component(), train(), apply_to_utt(), parallelisable, and so on.

Finally, calling the Voice object's train() method invokes the train() method of each front-end sub-module in turn, as the sketch below illustrates.
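
Schematically (this is not Ossian's actual class, only the shared interface implied by the member names listed above):

class UtteranceProcessor(object):
    """Sketch of the interface shared by the recipe sub-processes."""

    def __init__(self, processor_name, language):
        self.processor_name = processor_name
        self.language = language
        self.trained = False
        self.parallelisable = True

    def verify(self, resources):
        """Check that the resources this processor needs are available."""

    def reuse_component(self, other_voice):
        """Reuse an already-trained component instead of retraining."""

    def train(self, corpus):
        """Train on the corpus, then mark the processor as trained."""
        self.trained = True

    def apply_to_utt(self, utterance):
        """Annotate or transform a single utterance."""

# Voice.train() walks train_stages and calls train() on each processor in turn.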

The Voice class also makes use of some other classes, briefly introduced here:

Resources
self.res = Resources(speaker=speaker, language=language, configuration=configuration, DIRS=DIRS)
manage external resources of the voice

The resources types can be files,  flags, values, objects?
and are stored in dictionary format

-should provide abstraction regarding directories and filename extensions
in loading and saving resources

- processors should be able to add and query resources by name

Below we describe how each front-end sub-module runs in more detail.

RECIPE: the front-end sub-modules

The recipes under Ossian/recipes/ are really Python scripts. Voice.py runs them via execfile, and once they have finished executing, all the globals they defined are returned.

Below we use the naive_01_nn.cfg recipe as an example to explain how these files work.

Because the recipe needs the following imports:

from Tokenisers import RegexTokeniser
from Phonetisers import NaivePhonetiser
from VSMTagger import VSMTagger
from FeatureExtractor import WorldExtractor
from FeatureDumper import FeatureDumper
from Aligner import StateAligner
from SKLProcessors import SKLDecisionTreePausePredictor 
from PhraseMaker import PhraseMaker
from AcousticModel import AcousticModelWorld
from NN import NNDurationPredictor, NNAcousticPredictor

import default.const as c

the directories containing these modules must first be added to sys.path, roughly as in the sketch below.
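
Roughly like this (Python 2, matching the Ossian environment; the scripts/processors location is an assumption):

import os, sys

OSSIAN = os.environ['OSSIAN']
sys.path.append(os.path.join(OSSIAN, 'scripts'))
sys.path.append(os.path.join(OSSIAN, 'scripts', 'processors'))   # assumed location of Tokenisers, Phonetisers, ...

recipe_globals = {}
execfile(os.path.join(OSSIAN, 'recipes', 'naive_01_nn.cfg'), recipe_globals)   # Python 2 only
print(sorted(recipe_globals.keys()))   # should include train_stages and runtime_stages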

Tokeniser

Tokenisation / word splitting (word_splitter).

Its class diagram:

[image: Tokeniser class diagram]

Purpose: tokenise text using regular expressions.
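
In the spirit of RegexTokeniser, a minimal sketch of the idea (not the actual class):

import re

def regex_tokenise(text, split_pattern=r'\s+'):
    """Split text into tokens with a regular expression, dropping empty strings."""
    return [tok for tok in re.split(split_pattern, text) if tok]

print(regex_tokenise('this is  a test'))   # ['this', 'is', 'a', 'test']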

Acoustic model training

Acoustic model training is run with:

export THEANO_FLAGS=""; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/$OSSIAN_LANG/speakers/$DATA_NAME/$RECIPE/processors/acoustic_predictor/config.cfg

Assume:

source ~/anaconda3/bin/activate python2

export OSSIAN=/home/dl80/heyunchao/Programs/Ossian
export OSSIAN_LANG=cn
export DATA_NAME=cn_king
export RECIPE=naive_01_nn

and assume the front-end data preparation has already been run.

As the command shows, this part uses Merlin rather than Ossian's own code. The current Ossian code only supports Merlin at commit 8aed278; compatibility with later versions has not been tested.

cd $OSSIAN/tools/
git clone https://github.com/CSTR-Edinburgh/merlin.git
cd merlin
## reset to this specific version, which I have tested, must check later versions:--
git reset --hard 8aed278  

run_merlin.py is invoked as follows:

usage: run_merlin.sh [config file name]

The first few lines of config.cfg in this example are:

OSSIAN: /home/dl80/heyunchao/Programs/Ossian
LANGUAGE: cn
SPEAKER: cn_king_hanzi
RECIPE: naive_01_nn


## This line should point to the language/data/recipe combination you are working on:
TOP: %(OSSIAN)s/train/%(LANGUAGE)s/speakers/%(SPEAKER)s/%(RECIPE)s/

## spot for putting things in training -- not the final stored model:
WORKDIR: %(TOP)s/dnn_training_ACOUST/
DATADIR: %(TOP)s/cmp/

[Paths]

work: %(WORKDIR)s/
data: %(DATADIR)s/

This file is generated during the data preparation stage; during model training it specifies the training data, model structure parameters, and so on.
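
The %(VAR)s placeholders follow Python's ConfigParser interpolation. A self-contained illustration of how they expand (this is not how run_merlin.py itself parses the file; the top-level keys are put under [DEFAULT] here just so the snippet runs):

from ConfigParser import SafeConfigParser   # Python 2, as used throughout this note
from StringIO import StringIO

example = """
[DEFAULT]
OSSIAN: /home/dl80/heyunchao/Programs/Ossian
LANGUAGE: cn
SPEAKER: cn_king_hanzi
RECIPE: naive_01_nn
TOP: %(OSSIAN)s/train/%(LANGUAGE)s/speakers/%(SPEAKER)s/%(RECIPE)s/

[Paths]
work: %(TOP)s/dnn_training_ACOUST/
data: %(TOP)s/cmp/
"""

cfg = SafeConfigParser()
cfg.readfp(StringIO(example))
print(cfg.get('Paths', 'work'))   # .../cn_king_hanzi/naive_01_nn//dnn_training_ACOUST/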

A large amount of time is spent in label_normaliser.perform_normalisation(in_label_align_file_list, binary_label_file_list, label_type=cfg.label_type), at line 554. This step prepares the label data from the HTS-style labels.

Preparing the label data means reading each file, processing it, and then writing the result to another file. Taking 000001 as an example, the labels are read from /home/dl80/heyunchao/Programs/Ossian/train/cn/speakers/cn_king_hanzi/naive_01_nn//lab_dnn/000001.lab_dnn and, once extracted, saved to /home/dl80/heyunchao/Programs/Ossian/train/cn/speakers/cn_king_hanzi/naive_01_nn//cmp//binary_label_16708/000001.lab_dnn.

So, can we avoid redoing this operation every time?
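
One possible approach (hypothetical, not something Merlin provides) is to filter out files whose binary label output already exists and is newer than its input before calling perform_normalisation, assuming the question set and label configuration have not changed:

import os

def filter_already_done(in_label_files, out_binary_files):
    """Keep only the pairs whose output is missing or older than the input."""
    todo_in, todo_out = [], []
    for src, dst in zip(in_label_files, out_binary_files):
        if os.path.isfile(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
            continue   # already normalised and up to date: reuse it
        todo_in.append(src)
        todo_out.append(dst)
    return todo_in, todo_out

# in_label_align_file_list, binary_label_file_list = filter_already_done(
#     in_label_align_file_list, binary_label_file_list)
# label_normaliser.perform_normalisation(in_label_align_file_list,
#                                        binary_label_file_list,
#                                        label_type=cfg.label_type)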

@superhg2012 commented:

Hi, could you describe the cn_king corpus used for the Chinese synthesis?
