diff --recursive DeepSpeech STT
#❯ diff --recursive DeepSpeech STT
Only in STT: augments.txt
diff '--color=auto' --recursive DeepSpeech/build_lm.sh STT/build_lm.sh
19c19
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
26c26
< --kenlm_bins $HOME/kenlm/build/bin/ \
---
> --kenlm_bins ${HOMEDIR}/kenlm/build/bin/ \
36c36
< --alphabet /mnt/models/alphabet.txt \
---
> --checkpoint /mnt/models/ \
diff '--color=auto' --recursive DeepSpeech/checks.sh STT/checks.sh
33,34c33,34
< pushd $HOME/ds/
< ./bin/run-tc-ldc93s1_new.sh 2 16000
---
> pushd ${STT_DIR}
> ./bin/run-ci-ldc93s1_new.sh 2 16000
diff '--color=auto' --recursive DeepSpeech/CONTRIBUTING.md STT/CONTRIBUTING.md
9,11c9,11
< * Ensure you have a running setup of `NVIDIA Docker`
< * Prepare a host directory with enough space for training / producing intermediate data (100GB ?).
< * Ensure it's writable by `trainer` (uid 999) user (defined in the Dockerfile).
---
> * Ensure you have a running setup of [`Docker` working with GPU support](https://docs.docker.com/config/containers/resource_constraints/#gpu)
> * Prepare a host directory with enough space for training / producing intermediate data (>=400GB).
> * Ensure it's writable by `trainer` (uid 999 by default) user (defined in the Dockerfile).
13c13
< Place `cv-4-fr.tar.gz` inside your host directory, in a `sources/` subdirectory.
---
> Place `cv-corpus-*-fr` inside your host directory, in a `sources/` subdirectory.
18c18
< $ docker build -f Dockerfile.train .
---
> $ docker build [--build-arg ARG=val] -f Dockerfile.train -t commonvoice-fr .
22,24c22,24
< - `ds_repo` to fetch DeepSpeech from a different repo than upstream
< - `ds_branch` to checkout a specific branch / commit
< - `ds_sha1` commit to pull from when installing pre-built binaries
---
> - `stt_repo` to fetch STT from a different repo than upstream
> - `stt_branch` to checkout a specific branch / commit
> - `stt_sha1` commit to pull from when installing pre-built binaries
30,32c30,33
< - lm_evaluate_range, if non empty, this will perform a LM alpha/beta evaluation
< the parameter is expected to be of the form: lm_alpha_max,lm_beta_max,n_trials.
< See upstream lm_optimizer.py for details
---
> - `lm_evaluate_range`, if non empty, this will perform a LM alpha/beta evaluation
> the parameter is expected to be of the form: `lm_alpha_max`,`lm_beta_max`,`n_trials`.
> See upstream `lm_optimizer.py` for details
> - `lm_add_excluded_max_sec`, set to 1 to add sentences that were excluded for being too long to the language model.
35c36,38
< - `batch_size` to specify the batch size for training, dev and test dataset
---
> - `train_batch_size` to specify the batch size for training dataset
> - `dev_batch_size` to specify the batch size for dev dataset
> - `test_batch_size` to specify the batch size for test dataset
40a44
> - `skip_batch_test` to skip or not batch test completely
43a48,52
> - `enable_augments` to help the model to better generalise on noisy data by augmenting the data in various ways.
> - `augmentation_arguments` to set the `augments_file` path that provides the augmentation parameters.
> - `augments.txt`: the `augments_file` containing the arguments to use for data augmentation if `enable_augments` is set to 1.
> - `cv_personal_first_url` to download only your own voice instead of all Common Voice dataset (first url).
> - `cv_personal_second_url` to download only your own voice instead of all Common Voice dataset (second url).
53a63,65
> Miscellaneous parameters:
> - `use_tf_random_gen_state`: set to 0 if your GPU doesn't need TensorFlow's CuDNN random state generation to train.
>
64c76
< - Common Voice French, released on june 2020
---
> - Common Voice French, released on april 2022 (v9.0)
67a80
> - OpenSLR 94: Att-HACK
68a82
> - MLS French dataset
70c84
< ### Transfer learning from English
---
> ### Transfer learning from pre-trained checkpoints
77,78c91,93
< `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, data
< will be copied from that place.
---
> `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, the checkpoints will be automatically used as starting point.
>
> Checkpoints don't typically use automatic mixed precision nor fully-connected layer normalization and mostly use a standard number of hidden layers (2048 unless specified otherwise). So don't change those parameters to fine-tune from them.
83,86c98,108
< - Threadripper 3950X + 128GB RAM
< - 2x RTX 2080 Ti
< - Debian Sid, kernel 5.7, driver 440.100
< - With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled)
---
>
> > - Threadripper 3950X + 128GB RAM
> > - 2x RTX 2080 Ti
> > - Debian Sid, kernel 5.7, driver 440.100
>
> > - Threadripper 2920X + 96GB RAM
> > - 2x Titan RTX
> > - Manjaro (Arch) Linux, kernel 5.15.32-1-MANJARO, driver 510.60.02
>
>
> With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled)
94c116
< $ docker run --tty --runtime=nvidia --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt <docker-image-id>
---
> $ docker run -it --gpus=all --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt --env TRAIN_BATCH_SIZE=64 commonvoice-fr
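
Taken together, the updated CONTRIBUTING.md amounts to a workflow along these lines (a minimal sketch; the host path /data/cv-fr and the overridden values are placeholders, not part of the diff):

    # Build the image, overriding any of the documented build arguments
    docker build --build-arg train_batch_size=32 --build-arg epochs=40 \
        -f Dockerfile.train -t commonvoice-fr .

    # Train, with the prepared host directory mounted at /mnt
    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env TRAIN_BATCH_SIZE=64 \
        commonvoice-fr
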
diff '--color=auto' --recursive DeepSpeech/Dockerfile.train STT/Dockerfile.train
1c1
< FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
---
> FROM nvcr.io/nvidia/tensorflow:22.02-tf1-py3
3,5c3,5
< ARG ds_repo=mozilla/DeepSpeech
< ARG ds_branch=4270e22fe02f4fa7430a721ac917f6353c36f455
< ARG ds_sha1=4270e22fe02f4fa7430a721ac917f6353c36f455
---
> ARG stt_repo=coqui-ai/STT
> ARG stt_branch=fcec06bdd89f6ae68e2599495e8471da5e5ba45e
> ARG stt_sha1=fcec06bdd89f6ae68e2599495e8471da5e5ba45e
16,17c16,23
< ARG batch_size=64
< ENV BATCH_SIZE=$batch_size
---
> ARG train_batch_size=64
> ENV TRAIN_BATCH_SIZE=$train_batch_size
>
> ARG dev_batch_size=64
> ENV DEV_BATCH_SIZE=$dev_batch_size
>
> ARG test_batch_size=64
> ENV TEST_BATCH_SIZE=$test_batch_size
22c28
< ARG epochs=30
---
> ARG epochs=40
48a55,60
> # Skipping batch test to avoid hanging processes
> # Should be set to 0 by default once STT#2195 is fixed
> # See https://github.com/coqui-ai/STT/issues/2195 for more details
> ARG skip_batch_test=1
> ENV SKIP_BATCH_TEST=$skip_batch_test
>
56a69,75
> # Data augmentation
> ARG enable_augments=0
> ENV ENABLE_AUGMENTS=$enable_augments
>
> ARG augmentation_arguments="augments.txt"
> ENV AUGMENTATION_ARGUMENTS=$augmentation_arguments
>
60a80,92
> ARG lm_add_excluded_max_sec=0
> ENV LM_ADD_EXCLUDED_MAX_SEC=$lm_add_excluded_max_sec
>
> # To fine-tune using your own data
> ARG cv_personal_first_url=
> ENV CV_PERSONAL_FIRST_URL=$cv_personal_first_url
>
> ARG cv_personal_second_url=
> ENV CV_PERSONAL_SECOND_URL=$cv_personal_second_url
>
> ARG log_level=1
> ENV LOG_LEVEL=$log_level
>
66a99,104
> # Configure random state
> # Required for training on newer GPUs such as the 30/40 series.
> # You can safely disable it (set to 0) if your GPU doesn't need it.
> ARG use_tf_random_gen_state=1
> ENV TF_CUDNN_RESET_RND_GEN_STATE=$use_tf_random_gen_state
>
75c113
< ENV VIRTUAL_ENV_NAME ds-train
---
> ENV VIRTUAL_ENV_NAME stt-train
77c115
< ENV DS_DIR $HOMEDIR/ds
---
> ENV STT_DIR $HOMEDIR/stt
80,81c118,119
< ENV DS_BRANCH=$ds_branch
< ENV DS_SHA1=$ds_sha1
---
> ENV STT_BRANCH=$stt_branch
> ENV STT_SHA1=$stt_sha1
83c121
< ENV PATH="$VIRTUAL_ENV/bin:$PATH"
---
> ENV PATH="$VIRTUAL_ENV/bin:${HOMEDIR}/tf-venv/bin:$PATH"
93d130
< ffmpeg \
101a139,143
> libmagic-dev \
> libopus0 \
> libopusfile0 \
> libsndfile1 \
> libeigen3-dev \
104c146
< virtualenv \
---
> python3-venv \
109a152
> ffmpeg \
111c154,162
< xz-utils
---
> xz-utils \
> software-properties-common
>
> # For exporting using TFLite
> RUN add-apt-repository ppa:deadsnakes/ppa -y
>
> RUN apt-get -qq update && apt-get -qq install -y --no-install-recommends \
> python3.7 \
> python3.7-venv
124,126c175
< RUN wget -O - https://gitlab.com/libeigen/eigen/-/archive/3.2.8/eigen-3.2.8.tar.bz2 | tar xj
<
< RUN git clone https://github.com/$kenlm_repo.git && cd kenlm && git checkout $kenlm_branch \
---
> RUN git clone https://github.com/$kenlm_repo.git ${HOMEDIR}/kenlm && cd ${HOMEDIR}/kenlm && git checkout $kenlm_branch \
129c178
< && EIGEN3_ROOT=$HOMEDIR/eigen-eigen-07105f7124f9 cmake .. \
---
> && cmake .. \
134c183,186
< RUN virtualenv --python=/usr/bin/python3 $VIRTUAL_ENV_NAME
---
> RUN python3 -m venv --system-site-packages $VIRTUAL_ENV_NAME
>
> # Venv for upstream tensorflow with tflite api
> RUN python3.7 -m venv ${HOME}/tf-venv
138c190,194
< RUN git clone https://github.com/$ds_repo.git $DS_DIR
---
> RUN git clone https://github.com/$stt_repo.git $STT_DIR
>
> WORKDIR $STT_DIR
>
> RUN git checkout $stt_branch
140c196
< WORKDIR $DS_DIR
---
> WORKDIR $STT_DIR
142c198
< RUN git checkout $ds_branch
---
> RUN pip install --upgrade pip wheel setuptools
144c200,202
< WORKDIR $DS_DIR
---
> # Build the CTC decoder first, to avoid clashes from incompatible version upgrades
> RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
> RUN pip install --upgrade native_client/ctcdecode/dist/*.whl
146,148c204,208
< RUN pip install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
< RUN DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e .
< RUN pip install --upgrade tensorflow-gpu==1.15.4
---
> # Install STT
> # No need for the decoder since we did it earlier
> # TensorFlow GPU should already be installed on the base image,
> # and we don't want to break that
> RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e .
150,153c210,211
< RUN TASKCLUSTER_SCHEME="https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.%(branch_name)s.%(arch_string)s/artifacts/public/%(artifact_name)s" python util/taskcluster.py \
< --target="$(pwd)" \
< --artifact="convert_graphdef_memmapped_format" \
< --branch="r1.15" && chmod +x convert_graphdef_memmapped_format
---
> # Install coqui_stt_training (inside tf-venv) for exporting models using tflite
> RUN ${HOME}/tf-venv/bin/pip install -e .
155,157c213,215
< RUN python util/taskcluster.py \
< --target="$(pwd)" \
< --artifact="native_client.tar.xz" && ls -hal generate_scorer_package
---
> # Pre-built native client tools
> RUN LATEST_STABLE_RELEASE=$(curl "https://api.github.com/repos/coqui-ai/STT/releases/latest" | python -c 'import sys; import json; print(json.load(sys.stdin)["tag_name"])') \
> bash -c 'curl -L https://github.com/coqui-ai/STT/releases/download/${LATEST_STABLE_RELEASE}/native_client.tflite.Linux.tar.xz | tar -xJvf -' && ls -hal generate_scorer_package
175c233,234
< RUN pip install parso==0.8.1
---
> # modin has this weird strict but implicit dependency: swifter<1.1.0
> RUN pip install parso==0.8.3 'swifter<1.1.0'
182c241,247
< RUN pip install num2words
---
> RUN pip install num2words zipfile38
>
> # Fix numpy and pandas version
> RUN python -m pip install 'numpy<1.19.0,>=1.16.0' 'pandas<1.4.0dev0,>=1.0'
>
> # Use yaml in bash to get best lm alpha and beta from opt for export
> RUN python -m pip install shyaml
186c251
< ENV PATH="$HOMEDIR/kenlm/build/bin/:$PATH"
---
> ENV PATH="${HOMEDIR}/kenlm/build/bin/:$PATH"
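
Since every new build ARG above is mirrored into an ENV, the same knobs can also be changed at run time without rebuilding the image; a sketch with illustrative values only (the host path is a placeholder):

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env TF_CUDNN_RESET_RND_GEN_STATE=0 \
        --env SKIP_BATCH_TEST=0 \
        --env LM_ADD_EXCLUDED_MAX_SEC=1 \
        commonvoice-fr
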
diff '--color=auto' --recursive DeepSpeech/evaluate_lm.sh STT/evaluate_lm.sh
5,6c5,6
< pushd $HOME/ds/
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')"
---
> pushd ${STT_DIR}
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')"
13c13
< if [ ! -z "${LM_EVALUATE_RANGE}" ]; then
---
> if [ ! -z "${LM_EVALUATE_RANGE}" -a ! -f '/mnt/lm/opt_lm.yml' ]; then
18,20c18,20
< python -u lm_optimizer.py \
< --show_progressbar True \
< --train_cudnn True \
---
> python -u ${HOME}/lm_optimizer.py \
> --show_progressbar true \
> --train_cudnn true \
25c25
< --test_batch_size ${BATCH_SIZE} \
---
> --test_batch_size ${TEST_BATCH_SIZE} \
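
LM_EVALUATE_RANGE keeps the lm_alpha_max,lm_beta_max,n_trials form documented in CONTRIBUTING.md, and the tuned weights end up in /mnt/lm/opt_lm.yml (read back with shyaml in the run.sh diff below). A sketch with placeholder range values and host path:

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env LM_EVALUATE_RANGE=1,1,100 \
        commonvoice-fr

    # Inside the container, once opt_lm.yml exists:
    shyaml get-value lm_alpha < /mnt/lm/opt_lm.yml
    shyaml get-value lm_beta < /mnt/lm/opt_lm.yml
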
Only in STT: export.sh
Only in STT/fr: import_atthack.sh
diff '--color=auto' --recursive DeepSpeech/fr/import_ccpmf.sh STT/fr/import_ccpmf.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
6a7,11
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt"
> fi;
>
11a17
> ${SAVE_EXCLUDED_MAX_SEC} \
12a19,22
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt /mnt/extracted/_ccpmf_lm.txt
> fi;
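
The same pattern repeats in the importers below: with LM_ADD_EXCLUDED_MAX_SEC=1, sentences rejected for exceeding the maximum duration are saved to disk and then moved to a /mnt/extracted/_<name>_lm.txt file that prepare_lm.sh later folds into the scorer sources. Its recurring shape, with a placeholder corpus name:

    if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
        SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/corpus/corpus_excluded_lm.txt"
    fi;

    # ... importer invocation including ${SAVE_EXCLUDED_MAX_SEC} ...

    if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
        mv /mnt/extracted/data/corpus/corpus_excluded_lm.txt /mnt/extracted/_corpus_lm.txt
    fi;
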
diff '--color=auto' --recursive DeepSpeech/fr/importers.sh STT/fr/importers.sh
5c5,17
< ../import_cv.sh
---
> # If the environment contains urls to download a CV personal archive of the user
> # and there is a checkpoint mounted but no output_graph,
> # it's likely we want to download our personal archive as data
> # and start fine-tuning from our checkpoint.
> if [ \
> -f "/transfer-checkpoint/checkpoint" -a \
> ! -f "/mnt/models/output_graph.tflite" -a \
> ! -z "${CV_PERSONAL_FIRST_URL}" -a \
> ! -z "${CV_PERSONAL_SECOND_URL}" \
> ]; then
> ../import_cv_perso.sh
> else
> ../import_cv.sh
7c19
< ../import_lingualibre.sh
---
> ../import_lingualibre.sh
9c21
< import_trainingspeech.sh
---
> import_trainingspeech.sh
11c23
< import_slr57.sh
---
> import_slr57.sh
13c25
< ../import_m-ailabs.sh
---
> ../import_m-ailabs.sh
15c27,32
< import_ccpmf.sh
---
> ./import_atthack.sh
>
> ./import_mls.sh
>
> #./import_ccpmf.sh
> fi;
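
Fine-tuning on a personal Common Voice archive therefore needs both download URLs plus existing checkpoints mounted at /transfer-checkpoint; a sketch in which the paths and URLs are placeholders:

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --mount type=bind,src=/data/checkpoints,dst=/transfer-checkpoint \
        --env CV_PERSONAL_FIRST_URL='https://example.com/cv-perso-part1.tar.gz' \
        --env CV_PERSONAL_SECOND_URL='https://example.com/cv-perso-part2.tar.gz' \
        commonvoice-fr
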
Only in STT/fr: import_mls.sh
diff '--color=auto' --recursive DeepSpeech/fr/import_slr57.sh STT/fr/import_slr57.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
10a11,15
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt"
> fi;
>
13a19
> ${SAVE_EXCLUDED_MAX_SEC} \
14a21,24
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt /mnt/extracted/_slr57_lm.txt
> fi;
diff '--color=auto' --recursive DeepSpeech/fr/import_trainingspeech.sh STT/fr/import_trainingspeech.sh
5,6c5
< pushd $HOME/ds/
< pip install Unidecode==1.0.23
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/fr/metadata.sh STT/fr/metadata.sh
5,7c5,7
< export METADATA_AUTHOR="DeepSpeech-FR-Team"
< export METADATA_MODEL_NAME="deepspeech-fr"
< export METADATA_MODEL_VERSION="0.6"
---
> export METADATA_AUTHOR="CommonVoice-FR-Team"
> export METADATA_MODEL_NAME="cv-fr"
> export METADATA_MODEL_VERSION="1.2"
11,13c11,13
< export METADATA_MIN_DS_VERSION="0.7"
< export METADATA_MAX_DS_VERSION="0.9"
< export METADATA_DESCRIPTION="A free and re-usable French model for DeepSpeech"
---
> export METADATA_MIN_STT_VERSION="1.0.0"
> export METADATA_MAX_STT_VERSION="1.4.0"
> export METADATA_DESCRIPTION="A free and re-usable French model for Speech-to-Text"
diff '--color=auto' --recursive DeepSpeech/fr/params.sh STT/fr/params.sh
7,8c7,8
< export CV_RELEASE_FILENAME="cv-4-fr.tar.gz"
< export CV_RELEASE_SHA256="ffda45f2006fb6092fb435c786cde422e38183f7837e9faa65cb273439cf369e"
---
> export CV_RELEASE_FILENAME="cv-corpus-12.0-2022-12-07-fr.tar.gz"
> export CV_RELEASE_SHA256="00afc519d48d749a4724386dc203b8a0286060efe4ccb46963555794fef216eb"
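
Before a run, the archive placed in sources/ can be checked against CV_RELEASE_SHA256 (a sketch; the host path is a placeholder):

    cd /data/cv-fr/sources
    echo "00afc519d48d749a4724386dc203b8a0286060efe4ccb46963555794fef216eb  cv-corpus-12.0-2022-12-07-fr.tar.gz" | sha256sum -c -
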
diff '--color=auto' --recursive DeepSpeech/fr/prepare_lm.sh STT/fr/prepare_lm.sh
23a24,31
> # Use leftover transcriptions as indirect natural context for the LM to prepare for testing.
> # You can quickly add new sentences to the scorer by creating a file named `_*_lm.txt`, where * can be anything.
> # All text files whose name starts with an underscore and ends with `_lm.txt` will be normalized and added to the scorer.
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ] && [ ! -f "excluded_max_sec_lm.txt" ]; then
> cat _*_lm.txt | tr '[:upper:]' '[:lower:]' > excluded_max_sec_lm.txt
> EXCLUDED_LM_SOURCE="excluded_max_sec_lm.txt"
> fi;
>
28c36
< cat wiki_fr_lower.txt debats-assemblee-nationale.txt | sed -e 's/<s>/ /g' > sources_lm.txt
---
> cat wiki_fr_lower.txt debats-assemblee-nationale.txt ${EXCLUDED_LM_SOURCE} | sed -e 's/<s>/ /g' > sources_lm.txt
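
Besides the excluded-sentence files produced by the importers, any hand-written file matching the same underscore-prefixed `_*_lm.txt` pattern is picked up by this block too (it only runs when LM_ADD_EXCLUDED_MAX_SEC=1). A sketch assuming prepare_lm.sh runs from /mnt/extracted, as the importers' mv targets suggest, with placeholder sentences:

    printf 'bonjour tout le monde\nune phrase de plus pour le scorer\n' > /mnt/extracted/_perso_lm.txt
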
diff '--color=auto' --recursive DeepSpeech/fr/validate_label.py STT/fr/validate_label.py
61a62,91
> '西',
> '甌',
> '牡',
> '文',
> '丹',
> 'も',
> 'む',
> 'ⱅ', #<-- comment me when people stop thinking i'm a m
> 'ⱎ', #<-- comment me when people stop thinking i'm a w
> 'ጀ',
> 'ከ',
> 'ӌ',
> 'є',
> 'э',
> 'ч',
> 'ц',
> 'р̌',
> 'р',
> '◌̌',
> 'п', #<-- comment me when people stop thinking i'm Pi
> 'л',
> 'д',
> 'χ', #<-- comment me if someone can pronounce me correctly /xi/
> 'λ', #<-- comment me when everyone knows how to pronounce |λ|
> 'η', #<-- comment me when people know my name is êta
> 'ɨ' ,#<-- comment me when everyone stop thinking i'm a t
> 'ꝑ', #<-- comment me when people can lookup my name
> 'ɛ',
> 'ə',
> 'ɔ',
70a101,109
> label = label.replace("宇津保", "utsuho")
> label = label.replace("厳", "")
> label = label.replace("三", "")
> label = label.replace("⊨", "inclus")
>
> label = label.replace("ⱅ", "m") #<-- comment me when people stop thinking i'm a m
> label = label.replace("ⱎ", "w") #<-- comment me when people stop thinking i'm a w
> label = label.replace("р", "p") #<-- comment me when people stop thinking i'm a p
>
76a116,117
> label = label.replace("ʽ", " ")
> label = label.replace('’', "'")
102a144,148
> label = label.replace("∼", "~")
> label = label.replace("̐", "")
> label = label.replace("─", "")
> label = label.replace("̲", "")
>
200a247,249
> label = label.replace("ķ", "k")
> label = label.replace("ǀ", "")
>
205a255,256
> label = label.replace("→", "")
> label = label.replace("↔", "")
225a277
> #label = label.replace("₽", "rouble russe") #<-- if you need this currency
diff '--color=auto' --recursive DeepSpeech/generate_alphabet.sh STT/generate_alphabet.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
14c14
< python training/deepspeech_training/util/check_characters.py \
---
> python -m coqui_stt_training.util.check_characters \
Only in STT: import_cv_perso.sh
diff '--color=auto' --recursive DeepSpeech/import_cv.sh STT/import_cv.sh
10c10
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/import_lingualibre.sh STT/import_lingualibre.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/import_m-ailabs.sh STT/import_m-ailabs.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
10a11,15
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt"
> fi;
>
18a24
> ${SAVE_EXCLUDED_MAX_SEC} \
19a26,29
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt /mnt/extracted/_m-ailabs_lm.txt
> fi;
Only in STT: lm_optimizer.py
diff '--color=auto' --recursive DeepSpeech/package.sh STT/package.sh
7,12d6
< if [ ! -f "model_tensorflow_fr.tar.xz" ]; then
< tar -cf - \
< -C /mnt/models/ output_graph.pbmm alphabet.txt \
< -C /mnt/lm/ kenlm.scorer | xz -T0 > model_tensorflow_fr.tar.xz
< fi;
<
Only in STT: parse_augment_args.sh
diff '--color=auto' --recursive DeepSpeech/README.md STT/README.md
1c1
< # Groupe de travail pour DeepSpeech en français
---
> # Groupe de travail pour la reconnaissance vocale du français (CommonVoice-fr)
7,8c7,8
< - [Participer à DeepSpeech](#Participer-à-DeepSpeech)
< - [Processus pour DeepSpeech fr](#Processus-pour-deepSpeech-fr)
---
> - [Participer à CommonVoice-fr](#Participer-à-STT)
> - [Processus pour CommonVoice-fr](#Processus-pour-CommonVoice-fr)
16c16
< - [Utiliser DeepSpeech pour vos projets webs](#Utiliser-DeepSpeech-pour-vos-projets-web)
---
> - [Utiliser STT pour vos projets webs](#Utiliser-STT-pour-vos-projets-web)
24c24,28
< Le projet DeepSpeech est un autre projet de la fondation Mozilla, pour transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par [Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice).
---
> > STT: Speech-To-Text
>
> > Ou l'art de transcrire la voix en texte.
>
> Le projet CommonVoice FR utilise 🐸-STT ([Coqui-STT](https://github.com/coqui-ai/STT)), l'implémentation suivante du projet [DeepSpeech](https://github.com/mozilla/DeepSpeech) de la fondation Mozilla, pour continuer à transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par la communauté.
28c32
< - **DeepSpeech** utilise le canal **Common Voice fr** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org)
---
> - **CommonVoice-fr** utilise le canal **Common Voice FR** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org)
32c36
< # Participer à DeepSpeech _pour tous_
---
> # Participer à CommonVoice _pour tous_
34c38
< Le projet **DeepSpeech** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice).
---
> Le projet **CommonVoice-fr** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice).
36c40
< # Processus pour DeepSpeech fr
---
> # Processus pour CommonVoice-fr
46c50,52
< - Les détails d'installation et de configuration sont disponible à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/CONTRIBUTING.md)
---
> - Les détails d'installation et de configuration sont disponibles à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/STT/CONTRIBUTING.md) (en anglais).
>
> - Pour l'ajustement des modèles francophones sur vos données personnelles de CommonVoice, lisez [cet article sur les forums de Mozilla](https://discourse.mozilla.org/t/entrainer-des-modeles-sur-mesure-avec-commonvoice-fr/97503?u=skeilnet)
54c60
< - [Modèles DeepSpeech](https://github.com/mozilla/deepspeech)
---
> - [Modèles STT](https://coqui.ai/models)
67c73
< ### Utiliser DeepSpeech pour vos projets web
---
> ### Utiliser STT pour vos projets web
69,72c75,78
< - [C#](https://github.com/mozilla/DeepSpeech/tree/master/examples/net_framework)
< - [NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/nodejs_wav)
< - [Streaming NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/ffmpeg_vad_streaming)
< - [transcription (streaming) Python](https://github.com/mozilla/DeepSpeech/tree/master/examples/vad_transcriber)
---
> - [C#](https://github.com/coqui-ai/STT/tree/master/examples/net_framework)
> - [NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/nodejs_wav)
> - [Streaming NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/ffmpeg_vad_streaming)
> - [transcription (streaming) Python](https://github.com/coqui-ai/STT/tree/master/examples/vad_transcriber)
76c82
< - [mycroft](https://mycroft.ai/blog/deepspeech-update/) – assistant vocal open source
---
> - [mycroft](https://mycroft.ai/blog/STT-update/) – assistant vocal open source
78c84
< - [Baidu](https://github.com/mozilla/deepspeech) – implémentation d'une architecture DeepSpeech
---
> - [Coqui-STT](https://github.com/coqui-ai/STT) – implémentation d'une architecture STT
diff '--color=auto' --recursive DeepSpeech/run.sh STT/run.sh
8,9d7
< export TF_CUDNN_RESET_RND_GEN_STATE=1
<
30c28
<
---
>
35a34,48
>
> if [ -f "/mnt/lm/opt_lm.yml" -a "${LM_ALPHA}" = "0.0" -a "${LM_BETA}" = "0.0" ]; then
> export LM_ALPHA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_alpha)
> export LM_BETA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_beta)
>
> if [ -f "/mnt/lm/kenlm.scorer" ]; then
> rm /mnt/lm/kenlm.scorer
> fi;
>
> build_lm.sh
> fi;
>
> test.sh
>
> export.sh
Only in STT: test.sh
diff '--color=auto' --recursive DeepSpeech/train.sh STT/train.sh
5,8c5,8
< pushd $HOME/ds/
< all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p,' | sed -e 's/,$//g')"
< all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p,' | sed -e 's/,$//g')"
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')"
---
> pushd ${STT_DIR}
> all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p ' | sed -e 's/ $//g')"
> all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p ' | sed -e 's/ $//g')"
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')"
12c12
< # Do not overwrite checkpoint file if model already exist: we will likely
---
> # Do not overwrite checkpoint file if model already exist: we will likely
14c14
< if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.pb" ]; then
---
> if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.tflite" ]; then
16c16,19
< cp -a /transfer-checkpoint/* /mnt/checkpoints/
---
> # use --load_checkpoint_dir for transfer learning
> LOAD_CHECKPOINT_FROM="--load_checkpoint_dir /transfer-checkpoint --save_checkpoint_dir /mnt/checkpoints"
> else
> LOAD_CHECKPOINT_FROM="--checkpoint_dir /mnt/checkpoints/"
19c22
< EARLY_STOP_FLAG="--early_stop"
---
> EARLY_STOP_FLAG="--early_stop true"
21c24
< EARLY_STOP_FLAG="--noearly_stop"
---
> EARLY_STOP_FLAG="--early_stop false"
26c29
< AMP_FLAG="--automatic_mixed_precision True"
---
> AMP_FLAG="--automatic_mixed_precision true"
29,32c32,34
< # Check metadata existence
< if [ -z "$METADATA_AUTHOR" ]; then
< echo "Please fill-in metadata informations"
< exit 1
---
> SKIP_BATCH_TEST_FLAG=""
> if [ "${SKIP_BATCH_TEST}" = "1" ]; then
> SKIP_BATCH_TEST_FLAG="--skip_batch_test true"
35,43c37,43
< # Ok, assume we have all the metadata now
< ALL_METADATA_FLAGS="--export_author_id $METADATA_AUTHOR"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_model_version $METADATA_MODEL_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_contact_info $METADATA_CONTACT_INFO"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_license $METADATA_LICENSE"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_language $METADATA_LANGUAGE"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_min_ds_version $METADATA_MIN_DS_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_max_ds_version $METADATA_MAX_DS_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_description $METADATA_DESCRIPTION"
---
> # Basic augmentation for data
> # TODO: Add use of overlays with noise datasets
> # ^ This would require downloading and preparing noise data
> ALL_AUGMENT_FLAGS=""
> if [ "${ENABLE_AUGMENTS}" = "1" ]; then
> ${HOMEDIR}/parse_augments_args.sh
> fi;
47,49c47,49
< python -u DeepSpeech.py \
< --show_progressbar True \
< --train_cudnn True \
---
> python -m coqui_stt_training.train \
> --show_progressbar true \
> --train_cudnn true \
57,59c57,59
< --train_batch_size ${BATCH_SIZE} \
< --dev_batch_size ${BATCH_SIZE} \
< --test_batch_size ${BATCH_SIZE} \
---
> --train_batch_size ${TRAIN_BATCH_SIZE} \
> --dev_batch_size ${DEV_BATCH_SIZE} \
> --test_batch_size ${TEST_BATCH_SIZE} \
65a66
> --log_level=${LOG_LEVEL} \
67,141c68,70
< --checkpoint_dir /mnt/checkpoints/
< fi;
<
< if [ ! -f "/mnt/models/test_output.json" ]; then
< python -u DeepSpeech.py \
< --show_progressbar True \
< --train_cudnn True \
< ${AMP_FLAG} \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --test_files ${all_test_csv} \
< --test_batch_size ${BATCH_SIZE} \
< --n_hidden ${N_HIDDEN} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --checkpoint_dir /mnt/checkpoints/ \
< --test_output_file /mnt/models/test_output.json
< fi;
<
< if [ ! -f "/mnt/models/output_graph.pb" ]; then
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tensorflow"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/ \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/output_graph.tflite" ]; then
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/ \
< --export_tflite \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/${MODEL_EXPORT_ZIP_LANG}.zip" ]; then
< mkdir /mnt/models/${MODEL_EXPORT_ZIP_LANG} || rm /mnt/models/${MODEL_EXPORT_ZIP_LANG}/*
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/${MODEL_EXPORT_ZIP_LANG} \
< --export_zip \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/output_graph.pbmm" ]; then
< ./convert_graphdef_memmapped_format --in_graph=/mnt/models/output_graph.pb --out_graph=/mnt/models/output_graph.pbmm
---
> ${LOAD_CHECKPOINT_FROM} \
> ${SKIP_BATCH_TEST_FLAG} \
> ${ALL_AUGMENT_FLAGS}
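
With these hooks in place, data augmentation is switched on through the corresponding flag, optionally pointing AUGMENTATION_ARGUMENTS at a file other than the bundled augments.txt; a sketch assuming parse_augment_args.sh turns that file into the ALL_AUGMENT_FLAGS passed to training (host path is a placeholder):

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env ENABLE_AUGMENTS=1 \
        commonvoice-fr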