diff --recursive DeepSpeech STT
#❯ diff --recursive DeepSpeech STT
Only in STT: augments.txt
diff '--color=auto' --recursive DeepSpeech/build_lm.sh STT/build_lm.sh
19c19
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
26c26
< --kenlm_bins $HOME/kenlm/build/bin/ \
---
> --kenlm_bins ${HOMEDIR}/kenlm/build/bin/ \
36c36
< --alphabet /mnt/models/alphabet.txt \
---
> --checkpoint /mnt/models/ \
diff '--color=auto' --recursive DeepSpeech/checks.sh STT/checks.sh
33,34c33,34
< pushd $HOME/ds/
< ./bin/run-tc-ldc93s1_new.sh 2 16000
---
> pushd ${STT_DIR}
> ./bin/run-ci-ldc93s1_new.sh 2 16000
diff '--color=auto' --recursive DeepSpeech/CONTRIBUTING.md STT/CONTRIBUTING.md
9,11c9,11
< * Ensure you have a running setup of `NVIDIA Docker`
< * Prepare a host directory with enough space for training / producing intermediate data (100GB ?).
< * Ensure it's writable by `trainer` (uid 999) user (defined in the Dockerfile).
---
> * Ensure you have a running setup of [`Docker` working with GPU support](https://docs.docker.com/config/containers/resource_constraints/#gpu)
> * Prepare a host directory with enough space for training / producing intermediate data (>=400GB).
> * Ensure it's writable by `trainer` (uid 999 by default) user (defined in the Dockerfile).
13c13
< Place `cv-4-fr.tar.gz` inside your host directory, in a `sources/` subdirectory.
---
> Place `cv-corpus-*-fr` inside your host directory, in a `sources/` subdirectory.
18c18
< $ docker build -f Dockerfile.train .
---
> $ docker build [--build-arg ARG=val] -f Dockerfile.train -t commonvoice-fr .
22,24c22,24
< - `ds_repo` to fetch DeepSpeech from a different repo than upstream
< - `ds_branch` to checkout a specific branch / commit
< - `ds_sha1` commit to pull from when installing pre-built binaries
---
> - `stt_repo` to fetch STT from a different repo than upstream
> - `stt_branch` to checkout a specific branch / commit
> - `stt_sha1` commit to pull from when installing pre-built binaries
30,32c30,33
< - lm_evaluate_range, if non empty, this will perform a LM alpha/beta evaluation
< the parameter is expected to be of the form: lm_alpha_max,lm_beta_max,n_trials.
< See upstream lm_optimizer.py for details
---
> - `lm_evaluate_range`, if non empty, this will perform a LM alpha/beta evaluation
> the parameter is expected to be of the form: `lm_alpha_max`,`lm_beta_max`,`n_trials`.
> See upstream `lm_optimizer.py` for details
> - `lm_add_excluded_max_sec`, set to 1 to add sentences that were excluded for being too long to the language model.
35c36,38
< - `batch_size` to specify the batch size for training, dev and test dataset
---
> - `train_batch_size` to specify the batch size for training dataset
> - `dev_batch_size` to specify the batch size for dev dataset
> - `test_batch_size` to specify the batch size for test dataset
40a44
> - `skip_batch_test` to skip or not batch test completely
43a48,52
> - `enable_augments` to help the model to better generalise on noisy data by augmenting the data in various ways.
> - `augmentation_arguments` to set the `augments_file` path that provides the augmentation parameters.
> - `augments.txt`: the `augments_file` containing the arguments to use for data augmentation if `enable_augments` is set to 1.
> - `cv_personal_first_url` to download only your own voice instead of all Common Voice dataset (first url).
> - `cv_personal_second_url` to download only your own voice instead of all Common Voice dataset (second url).
53a63,65
> Miscellaneous parameters:
> - `use_tf_random_gen_state`: set to 0 if your GPU doesn't need TensorFlow's CuDNN random state generation to train.
>
64c76
< - Common Voice French, released on june 2020
---
> - Common Voice French, released on april 2022 (v9.0)
67a80
> - OpenSLR 94: Att-HACK
68a82
> - MLS French dataset
70c84
< ### Transfer learning from English
---
> ### Transfer learning from pre-trained checkpoints
77,78c91,93
< `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, data
< will be copied from that place.
---
> `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, the checkpoints will be automatically used as starting point.
>
> Checkpoints don't typically use automatic mixed precision nor fully-connected layer normalization and mostly use a standard number of hidden layers (2048 unless specified otherwise). So don't change those parameters to fine-tune from them.
83,86c98,108
< - Threadripper 3950X + 128GB RAM
< - 2x RTX 2080 Ti
< - Debian Sid, kernel 5.7, driver 440.100
< - With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled)
---
>
> > - Threadripper 3950X + 128GB RAM
> > - 2x RTX 2080 Ti
> > - Debian Sid, kernel 5.7, driver 440.100
>
> > - Threadripper 2920X + 96GB RAM
> > - 2x Titan RTX
> > - Manjaro (Arch) Linux, kernel 5.15.32-1-MANJARO, driver 510.60.02
>
>
> With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled)
94c116
< $ docker run --tty --runtime=nvidia --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt <docker-image-id>
---
> $ docker run -it --gpus=all --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt --env TRAIN_BATCH_SIZE=64 commonvoice-fr
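
Taken together, the updated CONTRIBUTING.md amounts to a workflow along these lines (a minimal sketch; the host path /data/cv-fr and the overridden values are placeholders, not part of the diff):

    # Build the image, overriding any of the documented build arguments
    docker build --build-arg train_batch_size=32 --build-arg epochs=40 \
        -f Dockerfile.train -t commonvoice-fr .

    # Train, with the prepared host directory mounted at /mnt
    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env TRAIN_BATCH_SIZE=64 \
        commonvoice-fr
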
diff '--color=auto' --recursive DeepSpeech/Dockerfile.train STT/Dockerfile.train
1c1
< FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
---
> FROM nvcr.io/nvidia/tensorflow:22.02-tf1-py3
3,5c3,5
< ARG ds_repo=mozilla/DeepSpeech
< ARG ds_branch=4270e22fe02f4fa7430a721ac917f6353c36f455
< ARG ds_sha1=4270e22fe02f4fa7430a721ac917f6353c36f455
---
> ARG stt_repo=coqui-ai/STT
> ARG stt_branch=fcec06bdd89f6ae68e2599495e8471da5e5ba45e
> ARG stt_sha1=fcec06bdd89f6ae68e2599495e8471da5e5ba45e
16,17c16,23
< ARG batch_size=64
< ENV BATCH_SIZE=$batch_size
---
> ARG train_batch_size=64
> ENV TRAIN_BATCH_SIZE=$train_batch_size
>
> ARG dev_batch_size=64
> ENV DEV_BATCH_SIZE=$dev_batch_size
>
> ARG test_batch_size=64
> ENV TEST_BATCH_SIZE=$test_batch_size
22c28
< ARG epochs=30
---
> ARG epochs=40
48a55,60
> # Skipping batch test to avoid hanging processes
> # Should be set to 0 by default once STT#2195 is fixed
> # See https://github.com/coqui-ai/STT/issues/2195 for more details
> ARG skip_batch_test=1
> ENV SKIP_BATCH_TEST=$skip_batch_test
>
56a69,75
> # Data augmentation
> ARG enable_augments=0
> ENV ENABLE_AUGMENTS=$enable_augments
>
> ARG augmentation_arguments="augments.txt"
> ENV AUGMENTATION_ARGUMENTS=$augmentation_arguments
>
60a80,92
> ARG lm_add_excluded_max_sec=0
> ENV LM_ADD_EXCLUDED_MAX_SEC=$lm_add_excluded_max_sec
>
> # To fine-tune using your own data
> ARG cv_personal_first_url=
> ENV CV_PERSONAL_FIRST_URL=$cv_personal_first_url
>
> ARG cv_personal_second_url=
> ENV CV_PERSONAL_SECOND_URL=$cv_personal_second_url
>
> ARG log_level=1
> ENV LOG_LEVEL=$log_level
>
66a99,104
> # Configure random state
> # Required for training on newer GPUs such as the 30/40 series.
> # You can safely disable it (set to 0) if your GPU doesn't need it.
> ARG use_tf_random_gen_state=1
> ENV TF_CUDNN_RESET_RND_GEN_STATE=$use_tf_random_gen_state
>
75c113
< ENV VIRTUAL_ENV_NAME ds-train
---
> ENV VIRTUAL_ENV_NAME stt-train
77c115
< ENV DS_DIR $HOMEDIR/ds
---
> ENV STT_DIR $HOMEDIR/stt
80,81c118,119
< ENV DS_BRANCH=$ds_branch
< ENV DS_SHA1=$ds_sha1
---
> ENV STT_BRANCH=$stt_branch
> ENV STT_SHA1=$stt_sha1
83c121
< ENV PATH="$VIRTUAL_ENV/bin:$PATH"
---
> ENV PATH="$VIRTUAL_ENV/bin:${HOMEDIR}/tf-venv/bin:$PATH"
93d130
< ffmpeg \
101a139,143
> libmagic-dev \
> libopus0 \
> libopusfile0 \
> libsndfile1 \
> libeigen3-dev \
104c146
< virtualenv \
---
> python3-venv \
109a152
> ffmpeg \
111c154,162
< xz-utils
---
> xz-utils \
> software-properties-common
>
> # For exporting using TFLite
> RUN add-apt-repository ppa:deadsnakes/ppa -y
>
> RUN apt-get -qq update && apt-get -qq install -y --no-install-recommends \
> python3.7 \
> python3.7-venv
124,126c175
< RUN wget -O - https://gitlab.com/libeigen/eigen/-/archive/3.2.8/eigen-3.2.8.tar.bz2 | tar xj
<
< RUN git clone https://github.com/$kenlm_repo.git && cd kenlm && git checkout $kenlm_branch \
---
> RUN git clone https://github.com/$kenlm_repo.git ${HOMEDIR}/kenlm && cd ${HOMEDIR}/kenlm && git checkout $kenlm_branch \
129c178
< && EIGEN3_ROOT=$HOMEDIR/eigen-eigen-07105f7124f9 cmake .. \
---
> && cmake .. \
134c183,186
< RUN virtualenv --python=/usr/bin/python3 $VIRTUAL_ENV_NAME
---
> RUN python3 -m venv --system-site-packages $VIRTUAL_ENV_NAME
>
> # Venv for upstream tensorflow with tflite api
> RUN python3.7 -m venv ${HOME}/tf-venv
138c190,194
< RUN git clone https://github.com/$ds_repo.git $DS_DIR
---
> RUN git clone https://github.com/$stt_repo.git $STT_DIR
>
> WORKDIR $STT_DIR
>
> RUN git checkout $stt_branch
140c196
< WORKDIR $DS_DIR
---
> WORKDIR $STT_DIR
142c198
< RUN git checkout $ds_branch
---
> RUN pip install --upgrade pip wheel setuptools
144c200,202
< WORKDIR $DS_DIR
---
> # Build the CTC decoder first, to avoid clashes from incompatible version upgrades
> RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
> RUN pip install --upgrade native_client/ctcdecode/dist/*.whl
146,148c204,208
< RUN pip install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
< RUN DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e .
< RUN pip install --upgrade tensorflow-gpu==1.15.4
---
> # Install STT
> # No need for the decoder since we did it earlier
> # TensorFlow GPU should already be installed on the base image,
> # and we don't want to break that
> RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e .
150,153c210,211
< RUN TASKCLUSTER_SCHEME="https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.%(branch_name)s.%(arch_string)s/artifacts/public/%(artifact_name)s" python util/taskcluster.py \
< --target="$(pwd)" \
< --artifact="convert_graphdef_memmapped_format" \
< --branch="r1.15" && chmod +x convert_graphdef_memmapped_format
---
> # Install coqui_stt_training (inside tf-venv) for exporting models using tflite
> RUN ${HOME}/tf-venv/bin/pip install -e .
155,157c213,215
< RUN python util/taskcluster.py \
< --target="$(pwd)" \
< --artifact="native_client.tar.xz" && ls -hal generate_scorer_package
---
> # Pre-built native client tools
> RUN LATEST_STABLE_RELEASE=$(curl "https://api.github.com/repos/coqui-ai/STT/releases/latest" | python -c 'import sys; import json; print(json.load(sys.stdin)["tag_name"])') \
> bash -c 'curl -L https://github.com/coqui-ai/STT/releases/download/${LATEST_STABLE_RELEASE}/native_client.tflite.Linux.tar.xz | tar -xJvf -' && ls -hal generate_scorer_package
175c233,234
< RUN pip install parso==0.8.1
---
> # modin has this weird strict but implicit dependency: swifter<1.1.0
> RUN pip install parso==0.8.3 'swifter<1.1.0'
182c241,247
< RUN pip install num2words
---
> RUN pip install num2words zipfile38
>
> # Fix numpy and pandas version
> RUN python -m pip install 'numpy<1.19.0,>=1.16.0' 'pandas<1.4.0dev0,>=1.0'
>
> # Use yaml in bash to get best lm alpha and beta from opt for export
> RUN python -m pip install shyaml
186c251
< ENV PATH="$HOMEDIR/kenlm/build/bin/:$PATH"
---
> ENV PATH="${HOMEDIR}/kenlm/build/bin/:$PATH"
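
Since every new build ARG above is mirrored into an ENV, the same knobs can also be changed at run time without rebuilding the image; a sketch with illustrative values only (the host path is a placeholder):

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env TF_CUDNN_RESET_RND_GEN_STATE=0 \
        --env SKIP_BATCH_TEST=0 \
        --env LM_ADD_EXCLUDED_MAX_SEC=1 \
        commonvoice-fr
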
diff '--color=auto' --recursive DeepSpeech/evaluate_lm.sh STT/evaluate_lm.sh
5,6c5,6
< pushd $HOME/ds/
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')"
---
> pushd ${STT_DIR}
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')"
13c13
< if [ ! -z "${LM_EVALUATE_RANGE}" ]; then
---
> if [ ! -z "${LM_EVALUATE_RANGE}" -a ! -f '/mnt/lm/opt_lm.yml' ]; then
18,20c18,20
< python -u lm_optimizer.py \
< --show_progressbar True \
< --train_cudnn True \
---
> python -u ${HOME}/lm_optimizer.py \
> --show_progressbar true \
> --train_cudnn true \
25c25
< --test_batch_size ${BATCH_SIZE} \
---
> --test_batch_size ${TEST_BATCH_SIZE} \
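
LM_EVALUATE_RANGE keeps the lm_alpha_max,lm_beta_max,n_trials form documented in CONTRIBUTING.md, and the tuned weights end up in /mnt/lm/opt_lm.yml (read back with shyaml in the run.sh diff below). A sketch with placeholder range values and host path:

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env LM_EVALUATE_RANGE=1,1,100 \
        commonvoice-fr

    # Inside the container, once opt_lm.yml exists:
    shyaml get-value lm_alpha < /mnt/lm/opt_lm.yml
    shyaml get-value lm_beta < /mnt/lm/opt_lm.yml
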
Only in STT: export.sh
Only in STT/fr: import_atthack.sh
diff '--color=auto' --recursive DeepSpeech/fr/import_ccpmf.sh STT/fr/import_ccpmf.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
6a7,11
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt"
> fi;
>
11a17
> ${SAVE_EXCLUDED_MAX_SEC} \
12a19,22
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt /mnt/extracted/_ccpmf_lm.txt
> fi;
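
The same pattern repeats in the importers below: with LM_ADD_EXCLUDED_MAX_SEC=1, sentences rejected for exceeding the maximum duration are saved to disk and then moved to a /mnt/extracted/_<name>_lm.txt file that prepare_lm.sh later folds into the scorer sources. Its recurring shape, with a placeholder corpus name:

    if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
        SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/corpus/corpus_excluded_lm.txt"
    fi;

    # ... importer invocation including ${SAVE_EXCLUDED_MAX_SEC} ...

    if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
        mv /mnt/extracted/data/corpus/corpus_excluded_lm.txt /mnt/extracted/_corpus_lm.txt
    fi;
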
diff '--color=auto' --recursive DeepSpeech/fr/importers.sh STT/fr/importers.sh
5c5,17
< ../import_cv.sh
---
> # If the environment contains urls to download a CV personal archive of the user
> # and there is a checkpoint mounted but no output_graph,
> # it's likely we want to download our personal archive as data
> # and start fine-tuning from our checkpoint.
> if [ \
> -f "/transfer-checkpoint/checkpoint" -a \
> ! -f "/mnt/models/output_graph.tflite" -a \
> ! -z "${CV_PERSONAL_FIRST_URL}" -a \
> ! -z "${CV_PERSONAL_SECOND_URL}" \
> ]; then
> ../import_cv_perso.sh
> else
> ../import_cv.sh
7c19
< ../import_lingualibre.sh
---
> ../import_lingualibre.sh
9c21
< import_trainingspeech.sh
---
> import_trainingspeech.sh
11c23
< import_slr57.sh
---
> import_slr57.sh
13c25
< ../import_m-ailabs.sh
---
> ../import_m-ailabs.sh
15c27,32
< import_ccpmf.sh
---
> ./import_atthack.sh
>
> ./import_mls.sh
>
> #./import_ccpmf.sh
> fi;
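
Fine-tuning on a personal Common Voice archive therefore needs both download URLs plus existing checkpoints mounted at /transfer-checkpoint; a sketch in which the paths and URLs are placeholders:

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --mount type=bind,src=/data/checkpoints,dst=/transfer-checkpoint \
        --env CV_PERSONAL_FIRST_URL='https://example.com/cv-perso-part1.tar.gz' \
        --env CV_PERSONAL_SECOND_URL='https://example.com/cv-perso-part2.tar.gz' \
        commonvoice-fr
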
Only in STT/fr: import_mls.sh
diff '--color=auto' --recursive DeepSpeech/fr/import_slr57.sh STT/fr/import_slr57.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
10a11,15
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt"
> fi;
>
13a19
> ${SAVE_EXCLUDED_MAX_SEC} \
14a21,24
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt /mnt/extracted/_slr57_lm.txt
> fi;
diff '--color=auto' --recursive DeepSpeech/fr/import_trainingspeech.sh STT/fr/import_trainingspeech.sh
5,6c5
< pushd $HOME/ds/
< pip install Unidecode==1.0.23
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/fr/metadata.sh STT/fr/metadata.sh
5,7c5,7
< export METADATA_AUTHOR="DeepSpeech-FR-Team"
< export METADATA_MODEL_NAME="deepspeech-fr"
< export METADATA_MODEL_VERSION="0.6"
---
> export METADATA_AUTHOR="CommonVoice-FR-Team"
> export METADATA_MODEL_NAME="cv-fr"
> export METADATA_MODEL_VERSION="1.2"
11,13c11,13
< export METADATA_MIN_DS_VERSION="0.7"
< export METADATA_MAX_DS_VERSION="0.9"
< export METADATA_DESCRIPTION="A free and re-usable French model for DeepSpeech"
---
> export METADATA_MIN_STT_VERSION="1.0.0"
> export METADATA_MAX_STT_VERSION="1.4.0"
> export METADATA_DESCRIPTION="A free and re-usable French model for Speech-to-Text"
diff '--color=auto' --recursive DeepSpeech/fr/params.sh STT/fr/params.sh
7,8c7,8
< export CV_RELEASE_FILENAME="cv-4-fr.tar.gz"
< export CV_RELEASE_SHA256="ffda45f2006fb6092fb435c786cde422e38183f7837e9faa65cb273439cf369e"
---
> export CV_RELEASE_FILENAME="cv-corpus-12.0-2022-12-07-fr.tar.gz"
> export CV_RELEASE_SHA256="00afc519d48d749a4724386dc203b8a0286060efe4ccb46963555794fef216eb"
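
Before a run, the archive placed in sources/ can be checked against CV_RELEASE_SHA256 (a sketch; the host path is a placeholder):

    cd /data/cv-fr/sources
    echo "00afc519d48d749a4724386dc203b8a0286060efe4ccb46963555794fef216eb  cv-corpus-12.0-2022-12-07-fr.tar.gz" | sha256sum -c -
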
diff '--color=auto' --recursive DeepSpeech/fr/prepare_lm.sh STT/fr/prepare_lm.sh
23a24,31
> # Use leftover transcriptions as indirect natural context for the LM to prepare for testing.
> # You can quickly add new sentences to the scorer by creating a file named `_*_lm.txt`, where * can be anything.
> # All text files whose name starts with an underscore and ends with `_lm.txt` will be normalized and added to the scorer.
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ] && [ ! -f "excluded_max_sec_lm.txt" ]; then
> cat _*_lm.txt | tr '[:upper:]' '[:lower:]' > excluded_max_sec_lm.txt
> EXCLUDED_LM_SOURCE="excluded_max_sec_lm.txt"
> fi;
>
28c36
< cat wiki_fr_lower.txt debats-assemblee-nationale.txt | sed -e 's/<s>/ /g' > sources_lm.txt
---
> cat wiki_fr_lower.txt debats-assemblee-nationale.txt ${EXCLUDED_LM_SOURCE} | sed -e 's/<s>/ /g' > sources_lm.txt
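
Besides the excluded-sentence files produced by the importers, any hand-written file matching the same underscore-prefixed `_*_lm.txt` pattern is picked up by this block too (it only runs when LM_ADD_EXCLUDED_MAX_SEC=1). A sketch assuming prepare_lm.sh runs from /mnt/extracted, as the importers' mv targets suggest, with placeholder sentences:

    printf 'bonjour tout le monde\nune phrase de plus pour le scorer\n' > /mnt/extracted/_perso_lm.txt
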
diff '--color=auto' --recursive DeepSpeech/fr/validate_label.py STT/fr/validate_label.py
61a62,91
> '西',
> '甌',
> '牡',
> '文',
> '丹',
> 'も',
> 'む',
> 'ⱅ', #<-- comment me when people stop thinking i'm a m
> 'ⱎ', #<-- comment me when people stop thinking i'm a w
> 'ጀ',
> 'ከ',
> 'ӌ',
> 'є',
> 'э',
> 'ч',
> 'ц',
> 'р̌',
> 'р',
> '◌̌',
> 'п', #<-- comment me when people stop thinking i'm Pi
> 'л',
> 'д',
> 'χ', #<-- comment me if someone can pronounce me correctly /xi/
> 'λ', #<-- comment me when everyone knows how to pronounce |λ|
> 'η', #<-- comment me when people know my name is êta
> 'ɨ' ,#<-- comment me when everyone stop thinking i'm a t
> 'ꝑ', #<-- comment me when people can lookup my name
> 'ɛ',
> 'ə',
> 'ɔ',
70a101,109
> label = label.replace("宇津保", "utsuho")
> label = label.replace("厳", "")
> label = label.replace("三", "")
> label = label.replace("⊨", "inclus")
>
> label = label.replace("ⱅ", "m") #<-- comment me when people stop thinking i'm a m
> label = label.replace("ⱎ", "w") #<-- comment me when people stop thinking i'm a w
> label = label.replace("р", "p") #<-- comment me when people stop thinking i'm a p
>
76a116,117
> label = label.replace("ʽ", " ")
> label = label.replace('’', "'")
102a144,148
> label = label.replace("∼", "~")
> label = label.replace("̐", "")
> label = label.replace("─", "")
> label = label.replace("̲", "")
>
200a247,249
> label = label.replace("ķ", "k")
> label = label.replace("ǀ", "")
>
205a255,256
> label = label.replace("→", "")
> label = label.replace("↔", "")
225a277
> #label = label.replace("₽", "rouble russe") #<-- if you need this currency
diff '--color=auto' --recursive DeepSpeech/generate_alphabet.sh STT/generate_alphabet.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
14c14
< python training/deepspeech_training/util/check_characters.py \
---
> python -m coqui_stt_training.util.check_characters \
Only in STT: import_cv_perso.sh
diff '--color=auto' --recursive DeepSpeech/import_cv.sh STT/import_cv.sh
10c10
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/import_lingualibre.sh STT/import_lingualibre.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
diff '--color=auto' --recursive DeepSpeech/import_m-ailabs.sh STT/import_m-ailabs.sh
5c5
< pushd $HOME/ds/
---
> pushd ${STT_DIR}
10a11,15
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt"
> fi;
>
18a24
> ${SAVE_EXCLUDED_MAX_SEC} \
19a26,29
>
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then
> mv /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt /mnt/extracted/_m-ailabs_lm.txt
> fi;
Only in STT: lm_optimizer.py
diff '--color=auto' --recursive DeepSpeech/package.sh STT/package.sh
7,12d6
< if [ ! -f "model_tensorflow_fr.tar.xz" ]; then
< tar -cf - \
< -C /mnt/models/ output_graph.pbmm alphabet.txt \
< -C /mnt/lm/ kenlm.scorer | xz -T0 > model_tensorflow_fr.tar.xz
< fi;
<
Only in STT: parse_augment_args.sh
diff '--color=auto' --recursive DeepSpeech/README.md STT/README.md
1c1
< # Groupe de travail pour DeepSpeech en français
---
> # Groupe de travail pour la reconnaissance vocale du français (CommonVoice-fr)
7,8c7,8
< - [Participer à DeepSpeech](#Participer-à-DeepSpeech)
< - [Processus pour DeepSpeech fr](#Processus-pour-deepSpeech-fr)
---
> - [Participer à CommonVoice-fr](#Participer-à-STT)
> - [Processus pour CommonVoice-fr](#Processus-pour-CommonVoice-fr)
16c16
< - [Utiliser DeepSpeech pour vos projets webs](#Utiliser-DeepSpeech-pour-vos-projets-web)
---
> - [Utiliser STT pour vos projets webs](#Utiliser-STT-pour-vos-projets-web)
24c24,28
< Le projet DeepSpeech est un autre projet de la fondation Mozilla, pour transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par [Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice).
---
> > STT: Speech-To-Text
>
> > Ou l'art de transcrire la voix en texte.
>
> Le projet CommonVoice FR utilise 🐸-STT ([Coqui-STT](https://github.com/coqui-ai/STT)), l'implémentation suivante du projet [DeepSpeech](https://github.com/mozilla/DeepSpeech) de la fondation Mozilla, pour continuer à transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par la communauté.
28c32
< - **DeepSpeech** utilise le canal **Common Voice fr** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org)
---
> - **CommonVoice-fr** utilise le canal **Common Voice FR** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org)
32c36
< # Participer à DeepSpeech _pour tous_
---
> # Participer à CommonVoice _pour tous_
34c38
< Le projet **DeepSpeech** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice).
---
> Le projet **CommonVoice-fr** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice).
36c40
< # Processus pour DeepSpeech fr
---
> # Processus pour CommonVoice-fr
46c50,52
< - Les détails d'installation et de configuration sont disponible à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/CONTRIBUTING.md)
---
> - Les détails d'installation et de configuration sont disponibles à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/STT/CONTRIBUTING.md) (en anglais).
>
> - Pour l'ajustement des modèles francophones sur vos données personnelles de CommonVoice, lisez [cet article sur les forums de Mozilla](https://discourse.mozilla.org/t/entrainer-des-modeles-sur-mesure-avec-commonvoice-fr/97503?u=skeilnet)
54c60
< - [Modèles DeepSpeech](https://github.com/mozilla/deepspeech)
---
> - [Modèles STT](https://coqui.ai/models)
67c73
< ### Utiliser DeepSpeech pour vos projets web
---
> ### Utiliser STT pour vos projets web
69,72c75,78
< - [C#](https://github.com/mozilla/DeepSpeech/tree/master/examples/net_framework)
< - [NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/nodejs_wav)
< - [Streaming NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/ffmpeg_vad_streaming)
< - [transcription (streaming) Python](https://github.com/mozilla/DeepSpeech/tree/master/examples/vad_transcriber)
---
> - [C#](https://github.com/coqui-ai/STT/tree/master/examples/net_framework)
> - [NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/nodejs_wav)
> - [Streaming NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/ffmpeg_vad_streaming)
> - [transcription (streaming) Python](https://github.com/coqui-ai/STT/tree/master/examples/vad_transcriber)
76c82
< - [mycroft](https://mycroft.ai/blog/deepspeech-update/) – assistant vocal open source
---
> - [mycroft](https://mycroft.ai/blog/STT-update/) – assistant vocal open source
78c84
< - [Baidu](https://github.com/mozilla/deepspeech) – implémentation d'une architecture DeepSpeech
---
> - [Coqui-STT](https://github.com/coqui-ai/STT) – implémentation d'une architecture STT
diff '--color=auto' --recursive DeepSpeech/run.sh STT/run.sh
8,9d7
< export TF_CUDNN_RESET_RND_GEN_STATE=1
<
30c28
<
---
>
35a34,48
>
> if [ -f "/mnt/lm/opt_lm.yml" -a "${LM_ALPHA}" = "0.0" -a "${LM_BETA}" = "0.0" ]; then
> export LM_ALPHA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_alpha)
> export LM_BETA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_beta)
>
> if [ -f "/mnt/lm/kenlm.scorer" ]; then
> rm /mnt/lm/kenlm.scorer
> fi;
>
> build_lm.sh
> fi;
>
> test.sh
>
> export.sh
Only in STT: test.sh
diff '--color=auto' --recursive DeepSpeech/train.sh STT/train.sh
5,8c5,8
< pushd $HOME/ds/
< all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p,' | sed -e 's/,$//g')"
< all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p,' | sed -e 's/,$//g')"
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')"
---
> pushd ${STT_DIR}
> all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p ' | sed -e 's/ $//g')"
> all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p ' | sed -e 's/ $//g')"
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')"
12c12
< # Do not overwrite checkpoint file if model already exist: we will likely
---
> # Do not overwrite checkpoint file if model already exist: we will likely
14c14
< if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.pb" ]; then
---
> if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.tflite" ]; then
16c16,19
< cp -a /transfer-checkpoint/* /mnt/checkpoints/
---
> # use --load_checkpoint_dir for transfer learning
> LOAD_CHECKPOINT_FROM="--load_checkpoint_dir /transfer-checkpoint --save_checkpoint_dir /mnt/checkpoints"
> else
> LOAD_CHECKPOINT_FROM="--checkpoint_dir /mnt/checkpoints/"
19c22
< EARLY_STOP_FLAG="--early_stop"
---
> EARLY_STOP_FLAG="--early_stop true"
21c24
< EARLY_STOP_FLAG="--noearly_stop"
---
> EARLY_STOP_FLAG="--early_stop false"
26c29
< AMP_FLAG="--automatic_mixed_precision True"
---
> AMP_FLAG="--automatic_mixed_precision true"
29,32c32,34
< # Check metadata existence
< if [ -z "$METADATA_AUTHOR" ]; then
< echo "Please fill-in metadata informations"
< exit 1
---
> SKIP_BATCH_TEST_FLAG=""
> if [ "${SKIP_BATCH_TEST}" = "1" ]; then
> SKIP_BATCH_TEST_FLAG="--skip_batch_test true"
35,43c37,43
< # Ok, assume we have all the metadata now
< ALL_METADATA_FLAGS="--export_author_id $METADATA_AUTHOR"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_model_version $METADATA_MODEL_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_contact_info $METADATA_CONTACT_INFO"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_license $METADATA_LICENSE"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_language $METADATA_LANGUAGE"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_min_ds_version $METADATA_MIN_DS_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_max_ds_version $METADATA_MAX_DS_VERSION"
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_description $METADATA_DESCRIPTION"
---
> # Basic augmentation for data
> # TODO: Add use of overlays with noise datasets
> # ^ This would require downloading and preparing noise data
> ALL_AUGMENT_FLAGS=""
> if [ "${ENABLE_AUGMENTS}" = "1" ]; then
> ${HOMEDIR}/parse_augments_args.sh
> fi;
47,49c47,49
< python -u DeepSpeech.py \
< --show_progressbar True \
< --train_cudnn True \
---
> python -m coqui_stt_training.train \
> --show_progressbar true \
> --train_cudnn true \
57,59c57,59
< --train_batch_size ${BATCH_SIZE} \
< --dev_batch_size ${BATCH_SIZE} \
< --test_batch_size ${BATCH_SIZE} \
---
> --train_batch_size ${TRAIN_BATCH_SIZE} \
> --dev_batch_size ${DEV_BATCH_SIZE} \
> --test_batch_size ${TEST_BATCH_SIZE} \
65a66
> --log_level=${LOG_LEVEL} \
67,141c68,70
< --checkpoint_dir /mnt/checkpoints/
< fi;
<
< if [ ! -f "/mnt/models/test_output.json" ]; then
< python -u DeepSpeech.py \
< --show_progressbar True \
< --train_cudnn True \
< ${AMP_FLAG} \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --test_files ${all_test_csv} \
< --test_batch_size ${BATCH_SIZE} \
< --n_hidden ${N_HIDDEN} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --checkpoint_dir /mnt/checkpoints/ \
< --test_output_file /mnt/models/test_output.json
< fi;
<
< if [ ! -f "/mnt/models/output_graph.pb" ]; then
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tensorflow"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/ \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/output_graph.tflite" ]; then
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/ \
< --export_tflite \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/${MODEL_EXPORT_ZIP_LANG}.zip" ]; then
< mkdir /mnt/models/${MODEL_EXPORT_ZIP_LANG} || rm /mnt/models/${MODEL_EXPORT_ZIP_LANG}/*
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite"
< python -u DeepSpeech.py \
< --alphabet_config_path /mnt/models/alphabet.txt \
< --scorer_path /mnt/lm/kenlm.scorer \
< --feature_cache /mnt/sources/feature_cache \
< --n_hidden ${N_HIDDEN} \
< --beam_width ${BEAM_WIDTH} \
< --lm_alpha ${LM_ALPHA} \
< --lm_beta ${LM_BETA} \
< --load_evaluate "best" \
< --checkpoint_dir /mnt/checkpoints/ \
< --export_dir /mnt/models/${MODEL_EXPORT_ZIP_LANG} \
< --export_zip \
< ${ALL_METADATA_FLAGS} \
< ${METADATA_MODEL_NAME_FLAG}
< fi;
<
< if [ ! -f "/mnt/models/output_graph.pbmm" ]; then
< ./convert_graphdef_memmapped_format --in_graph=/mnt/models/output_graph.pb --out_graph=/mnt/models/output_graph.pbmm
---
> ${LOAD_CHECKPOINT_FROM} \
> ${SKIP_BATCH_TEST_FLAG} \
> ${ALL_AUGMENT_FLAGS}
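
With these hooks in place, data augmentation is switched on through the corresponding flag, optionally pointing AUGMENTATION_ARGUMENTS at a file other than the bundled augments.txt; a sketch assuming parse_augment_args.sh turns that file into the ALL_AUGMENT_FLAGS passed to training (host path is a placeholder):

    docker run -it --gpus=all \
        --mount type=bind,src=/data/cv-fr,dst=/mnt \
        --env ENABLE_AUGMENTS=1 \
        commonvoice-fr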