diff --recursive DeepSpeech STT
#❯ diff --recursive DeepSpeech STT | |
Only in STT: augments.txt | |
diff '--color=auto' --recursive DeepSpeech/build_lm.sh STT/build_lm.sh | |
19c19 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
26c26 | |
< --kenlm_bins $HOME/kenlm/build/bin/ \ | |
--- | |
> --kenlm_bins ${HOMEDIR}/kenlm/build/bin/ \ | |
36c36 | |
< --alphabet /mnt/models/alphabet.txt \ | |
--- | |
> --checkpoint /mnt/models/ \ | |
diff '--color=auto' --recursive DeepSpeech/checks.sh STT/checks.sh | |
33,34c33,34 | |
< pushd $HOME/ds/ | |
< ./bin/run-tc-ldc93s1_new.sh 2 16000 | |
--- | |
> pushd ${STT_DIR} | |
> ./bin/run-ci-ldc93s1_new.sh 2 16000 | |
diff '--color=auto' --recursive DeepSpeech/CONTRIBUTING.md STT/CONTRIBUTING.md | |
9,11c9,11 | |
< * Ensure you have a running setup of `NVIDIA Docker` | |
< * Prepare a host directory with enough space for training / producing intermediate data (100GB ?). | |
< * Ensure it's writable by `trainer` (uid 999) user (defined in the Dockerfile). | |
--- | |
> * Ensure you have a running setup of [`Docker` working with GPU support](https://docs.docker.com/config/containers/resource_constraints/#gpu) | |
> * Prepare a host directory with enough space for training / producing intermediate data (>=400GB). | |
> * Ensure it's writable by `trainer` (uid 999 by default) user (defined in the Dockerfile). | |
13c13 | |
< Place `cv-4-fr.tar.gz` inside your host directory, in a `sources/` subdirectory. | |
--- | |
> Place `cv-corpus-*-fr` inside your host directory, in a `sources/` subdirectory. | |
18c18 | |
< $ docker build -f Dockerfile.train . | |
--- | |
> $ docker build [--build-arg ARG=val] -f Dockerfile.train -t commonvoice-fr . | |
22,24c22,24 | |
< - `ds_repo` to fetch DeepSpeech from a different repo than upstream | |
< - `ds_branch` to checkout a specific branch / commit | |
< - `ds_sha1` commit to pull from when installing pre-built binaries | |
--- | |
> - `stt_repo` to fetch STT from a different repo than upstream | |
> - `stt_branch` to checkout a specific branch / commit | |
> - `stt_sha1` commit to pull from when installing pre-built binaries | |
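For example, building against a fork rather than upstream could look like this (the repo and branch values are placeholders; the tag follows the build command shown above):

$ docker build \
    --build-arg stt_repo=your-fork/STT \
    --build-arg stt_branch=main \
    -f Dockerfile.train -t commonvoice-fr .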
30,32c30,33 | |
< - lm_evaluate_range, if non empty, this will perform a LM alpha/beta evaluation | |
< the parameter is expected to be of the form: lm_alpha_max,lm_beta_max,n_trials. | |
< See upstream lm_optimizer.py for details | |
--- | |
> - `lm_evaluate_range`, if non-empty, this will perform an LM alpha/beta evaluation; | |
> the parameter is expected to be of the form: `lm_alpha_max`,`lm_beta_max`,`n_trials`. | |
> See upstream `lm_optimizer.py` for details | |
> - `lm_add_excluded_max_sec` set to 1 adds sentences that were excluded for being too long to the language model. | |
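As a sketch, the evaluation range is passed as three comma-separated values, lm_alpha_max, lm_beta_max and n_trials; the numbers below are purely illustrative:

$ docker build --build-arg lm_evaluate_range=5,5,2400 -f Dockerfile.train -t commonvoice-fr .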
35c36,38 | |
< - `batch_size` to specify the batch size for training, dev and test dataset | |
--- | |
> - `train_batch_size` to specify the batch size for the training dataset | |
> - `dev_batch_size` to specify the batch size for the dev dataset | |
> - `test_batch_size` to specify the batch size for the test dataset | |
40a44 | |
> - `skip_batch_test` to skip the batch test completely (or not) | |
43a48,52 | |
> - `enable_augments` to help the model generalise better on noisy data by augmenting the data in various ways. | |
> - `augmentation_arguments` to set the `augments_file` path providing the augmentation parameters. | |
> - `augments.txt`: `augments_file` containing arguments to use for data augmentation if `enable_augments` is set to 1. | |
> - `cv_personal_first_url` to download only your own voice instead of the full Common Voice dataset (first URL). | |
> - `cv_personal_second_url` to download only your own voice instead of the full Common Voice dataset (second URL). | |
53a63,65 | |
> Miscellaneous parameters: | |
> - `use_tf_random_gen_state`: set to 0 if your GPU doesn't need to use TensorFlow's CuDNN random state generation to train. | |
> | |
64c76 | |
< - Common Voice French, released on june 2020 | |
--- | |
> - Common Voice French, released in April 2022 (v9.0) | |
67a80 | |
> - OpenSLR 94: Att-HACK | |
68a82 | |
> - MLS French dataset | |
70c84 | |
< ### Transfer learning from English | |
--- | |
> ### Transfer learning from pre-trained checkpoints | |
77,78c91,93 | |
< `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, data | |
< will be copied from that place. | |
--- | |
> `type=bind,src=PATH/TO/CHECKPOINTS,dst=/transfer-checkpoint`. Upon running, the checkpoints will be automatically used as starting point. | |
> | |
> Checkpoints typically use neither automatic mixed precision nor fully-connected layer normalization, and mostly use a standard layer width (2048 hidden units unless specified otherwise), so don't change those parameters when fine-tuning from them. | |
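A hypothetical fine-tuning run combining the data mount and the checkpoint mount could therefore look like this (host paths are placeholders):

$ docker run -it --gpus=all \
    --mount type=bind,src=/srv/stt/host-dir,dst=/mnt \
    --mount type=bind,src=/srv/stt/checkpoints,dst=/transfer-checkpoint \
    commonvoice-fr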
83,86c98,108 | |
< - Threadripper 3950X + 128GB RAM | |
< - 2x RTX 2080 Ti | |
< - Debian Sid, kernel 5.7, driver 440.100 | |
< - With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled) | |
--- | |
> | |
> > - Threadripper 3950X + 128GB RAM | |
> > - 2x RTX 2080 Ti | |
> > - Debian Sid, kernel 5.7, driver 440.100 | |
> | |
> > - Threadripper 2920X + 96GB RAM | |
> > - 2x Titan RTX | |
> > - Manjaro (Arch) Linux, kernel 5.15.32-1-MANJARO, driver 510.60.02 | |
> | |
> | |
> With ~1000h of audio, one training epoch takes ~23min (Automatic Mixed Precision enabled) | |
94c116 | |
< $ docker run --tty --runtime=nvidia --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt <docker-image-id> | |
--- | |
> $ docker run -it --gpus=all --mount type=bind,src=PATH/TO/HOST/DIRECTORY,dst=/mnt --env TRAIN_BATCH_SIZE=64 commonvoice-fr | |
diff '--color=auto' --recursive DeepSpeech/Dockerfile.train STT/Dockerfile.train | |
1c1 | |
< FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04 | |
--- | |
> FROM nvcr.io/nvidia/tensorflow:22.02-tf1-py3 | |
3,5c3,5 | |
< ARG ds_repo=mozilla/DeepSpeech | |
< ARG ds_branch=4270e22fe02f4fa7430a721ac917f6353c36f455 | |
< ARG ds_sha1=4270e22fe02f4fa7430a721ac917f6353c36f455 | |
--- | |
> ARG stt_repo=coqui-ai/STT | |
> ARG stt_branch=fcec06bdd89f6ae68e2599495e8471da5e5ba45e | |
> ARG stt_sha1=fcec06bdd89f6ae68e2599495e8471da5e5ba45e | |
16,17c16,23 | |
< ARG batch_size=64 | |
< ENV BATCH_SIZE=$batch_size | |
--- | |
> ARG train_batch_size=64 | |
> ENV TRAIN_BATCH_SIZE=$train_batch_size | |
> | |
> ARG dev_batch_size=64 | |
> ENV DEV_BATCH_SIZE=$dev_batch_size | |
> | |
> ARG test_batch_size=64 | |
> ENV TEST_BATCH_SIZE=$test_batch_size | |
22c28 | |
< ARG epochs=30 | |
--- | |
> ARG epochs=40 | |
48a55,60 | |
> # Skipping batch test to avoid hanging processes | |
> # Should be set to 0 by default once STT#2195 is fixed | |
> # See https://github.com/coqui-ai/STT/issues/2195 for more details | |
> ARG skip_batch_test=1 | |
> ENV SKIP_BATCH_TEST=$skip_batch_test | |
> | |
56a69,75 | |
> # Data augmentation | |
> ARG enable_augments=0 | |
> ENV ENABLE_AUGMENTS=$enable_augments | |
> | |
> ARG augmentation_arguments="augments.txt" | |
> ENV AUGMENTATION_ARGUMENTS=$augmentation_arguments | |
> | |
60a80,92 | |
> ARG lm_add_excluded_max_sec=0 | |
> ENV LM_ADD_EXCLUDED_MAX_SEC=$lm_add_excluded_max_sec | |
> | |
> # To fine-tune using your own data | |
> ARG cv_personal_first_url= | |
> ENV CV_PERSONAL_FIRST_URL=$cv_personal_first_url | |
> | |
> ARG cv_personal_second_url= | |
> ENV CV_PERSONAL_SECOND_URL=$cv_personal_second_url | |
> | |
> ARG log_level=1 | |
> ENV LOG_LEVEL=$log_level | |
> | |
66a99,104 | |
> # Configure random state | |
> # Required for training on newer GPUs such as series 30/40. | |
> # You can safely disable it (set to 0) if your GPU doesn't need it. | |
> ARG use_tf_random_gen_state=1 | |
> ENV TF_CUDNN_RESET_RND_GEN_STATE=$use_tf_random_gen_state | |
> | |
75c113 | |
< ENV VIRTUAL_ENV_NAME ds-train | |
--- | |
> ENV VIRTUAL_ENV_NAME stt-train | |
77c115 | |
< ENV DS_DIR $HOMEDIR/ds | |
--- | |
> ENV STT_DIR $HOMEDIR/stt | |
80,81c118,119 | |
< ENV DS_BRANCH=$ds_branch | |
< ENV DS_SHA1=$ds_sha1 | |
--- | |
> ENV STT_BRANCH=$stt_branch | |
> ENV STT_SHA1=$stt_sha1 | |
83c121 | |
< ENV PATH="$VIRTUAL_ENV/bin:$PATH" | |
--- | |
> ENV PATH="$VIRTUAL_ENV/bin:${HOMEDIR}/tf-venv/bin:$PATH" | |
93d130 | |
< ffmpeg \ | |
101a139,143 | |
> libmagic-dev \ | |
> libopus0 \ | |
> libopusfile0 \ | |
> libsndfile1 \ | |
> libeigen3-dev \ | |
104c146 | |
< virtualenv \ | |
--- | |
> python3-venv \ | |
109a152 | |
> ffmpeg \ | |
111c154,162 | |
< xz-utils | |
--- | |
> xz-utils \ | |
> software-properties-common | |
> | |
> # For exporting using TFLite | |
> RUN add-apt-repository ppa:deadsnakes/ppa -y | |
> | |
> RUN apt-get -qq update && apt-get -qq install -y --no-install-recommends \ | |
> python3.7 \ | |
> python3.7-venv | |
124,126c175 | |
< RUN wget -O - https://gitlab.com/libeigen/eigen/-/archive/3.2.8/eigen-3.2.8.tar.bz2 | tar xj | |
< | |
< RUN git clone https://github.com/$kenlm_repo.git && cd kenlm && git checkout $kenlm_branch \ | |
--- | |
> RUN git clone https://github.com/$kenlm_repo.git ${HOMEDIR}/kenlm && cd ${HOMEDIR}/kenlm && git checkout $kenlm_branch \ | |
129c178 | |
< && EIGEN3_ROOT=$HOMEDIR/eigen-eigen-07105f7124f9 cmake .. \ | |
--- | |
> && cmake .. \ | |
134c183,186 | |
< RUN virtualenv --python=/usr/bin/python3 $VIRTUAL_ENV_NAME | |
--- | |
> RUN python3 -m venv --system-site-packages $VIRTUAL_ENV_NAME | |
> | |
> # Venv for upstream tensorflow with tflite api | |
> RUN python3.7 -m venv ${HOME}/tf-venv | |
138c190,194 | |
< RUN git clone https://github.com/$ds_repo.git $DS_DIR | |
--- | |
> RUN git clone https://github.com/$stt_repo.git $STT_DIR | |
> | |
> WORKDIR $STT_DIR | |
> | |
> RUN git checkout $stt_branch | |
140c196 | |
< WORKDIR $DS_DIR | |
--- | |
> WORKDIR $STT_DIR | |
142c198 | |
< RUN git checkout $ds_branch | |
--- | |
> RUN pip install --upgrade pip wheel setuptools | |
144c200,202 | |
< WORKDIR $DS_DIR | |
--- | |
> # Build the CTC decoder first, to avoid clashes on incompatible version upgrades | |
> RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings | |
> RUN pip install --upgrade native_client/ctcdecode/dist/*.whl | |
146,148c204,208 | |
< RUN pip install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3 | |
< RUN DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e . | |
< RUN pip install --upgrade tensorflow-gpu==1.15.4 | |
--- | |
> # Install STT | |
> # No need for the decoder since we built it earlier | |
> # TensorFlow GPU should already be installed on the base image, | |
> # and we don't want to break that | |
> RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip install --upgrade --force-reinstall -e . | |
150,153c210,211 | |
< RUN TASKCLUSTER_SCHEME="https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.%(branch_name)s.%(arch_string)s/artifacts/public/%(artifact_name)s" python util/taskcluster.py \ | |
< --target="$(pwd)" \ | |
< --artifact="convert_graphdef_memmapped_format" \ | |
< --branch="r1.15" && chmod +x convert_graphdef_memmapped_format | |
--- | |
> # Install coqui_stt_training (inside tf-venv) for exporting models using tflite | |
> RUN ${HOME}/tf-venv/bin/pip install -e . | |
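export.sh only appears in the file listing of this diff, so its contents are not shown; assuming it relies on this tf-venv install, a TFLite export would presumably go through the upstream training package, roughly along these lines (flags taken from upstream STT, not from export.sh):

$ ${HOME}/tf-venv/bin/python -m coqui_stt_training.export \
    --checkpoint_dir /mnt/checkpoints \
    --export_dir /mnt/models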
155,157c213,215 | |
< RUN python util/taskcluster.py \ | |
< --target="$(pwd)" \ | |
< --artifact="native_client.tar.xz" && ls -hal generate_scorer_package | |
--- | |
> # Pre-built native client tools | |
> RUN LATEST_STABLE_RELEASE=$(curl "https://api.github.com/repos/coqui-ai/STT/releases/latest" | python -c 'import sys; import json; print(json.load(sys.stdin)["tag_name"])') \ | |
> bash -c 'curl -L https://github.com/coqui-ai/STT/releases/download/${LATEST_STABLE_RELEASE}/native_client.tflite.Linux.tar.xz | tar -xJvf -' && ls -hal generate_scorer_package | |
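To check which release tag that API call resolves to before building, the same query can be run on the host:

$ curl -s https://api.github.com/repos/coqui-ai/STT/releases/latest \
    | python -c 'import sys, json; print(json.load(sys.stdin)["tag_name"])'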
175c233,234 | |
< RUN pip install parso==0.8.1 | |
--- | |
> # modin has this weird strict but implicit dependency: swifter<1.1.0 | |
> RUN pip install parso==0.8.3 'swifter<1.1.0' | |
182c241,247 | |
< RUN pip install num2words | |
--- | |
> RUN pip install num2words zipfile38 | |
> | |
> # Fix numpy and pandas version | |
> RUN python -m pip install 'numpy<1.19.0,>=1.16.0' 'pandas<1.4.0dev0,>=1.0' | |
> | |
> # Use shyaml in bash to read the best LM alpha and beta from the optimization output at export time | |
> RUN python -m pip install shyaml | |
186c251 | |
< ENV PATH="$HOMEDIR/kenlm/build/bin/:$PATH" | |
--- | |
> ENV PATH="${HOMEDIR}/kenlm/build/bin/:$PATH" | |
diff '--color=auto' --recursive DeepSpeech/evaluate_lm.sh STT/evaluate_lm.sh | |
5,6c5,6 | |
< pushd $HOME/ds/ | |
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')" | |
--- | |
> pushd ${STT_DIR} | |
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')" | |
13c13 | |
< if [ ! -z "${LM_EVALUATE_RANGE}" ]; then | |
--- | |
> if [ ! -z "${LM_EVALUATE_RANGE}" -a ! -f '/mnt/lm/opt_lm.yml' ]; then | |
18,20c18,20 | |
< python -u lm_optimizer.py \ | |
< --show_progressbar True \ | |
< --train_cudnn True \ | |
--- | |
> python -u ${HOME}/lm_optimizer.py \ | |
> --show_progressbar true \ | |
> --train_cudnn true \ | |
25c25 | |
< --test_batch_size ${BATCH_SIZE} \ | |
--- | |
> --test_batch_size ${TEST_BATCH_SIZE} \ | |
Only in STT: export.sh | |
Only in STT/fr: import_atthack.sh | |
diff '--color=auto' --recursive DeepSpeech/fr/import_ccpmf.sh STT/fr/import_ccpmf.sh | |
5c5 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
6a7,11 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt" | |
> fi; | |
> | |
11a17 | |
> ${SAVE_EXCLUDED_MAX_SEC} \ | |
12a19,22 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> mv /mnt/extracted/data/ccpmf/ccpmf_excluded_lm.txt /mnt/extracted/_ccpmf_lm.txt | |
> fi; | |
diff '--color=auto' --recursive DeepSpeech/fr/importers.sh STT/fr/importers.sh | |
5c5,17 | |
< ../import_cv.sh | |
--- | |
> # If the environment contains URLs to download the user's personal CV archive | |
> # and there is a checkpoint mounted but no output_graph, | |
> # it's likely we want to download our personal archive as data | |
> # and start fine-tuning from our checkpoint. | |
> if [ \ | |
> -f "/transfer-checkpoint/checkpoint" -a \ | |
> ! -f "/mnt/models/output_graph.tflite" -a \ | |
> ! -z "${CV_PERSONAL_FIRST_URL}" -a \ | |
> ! -z "${CV_PERSONAL_SECOND_URL}" \ | |
> ]; then | |
> ../import_cv_perso.sh | |
> else | |
> ../import_cv.sh | |
7c19 | |
< ../import_lingualibre.sh | |
--- | |
> ../import_lingualibre.sh | |
9c21 | |
< import_trainingspeech.sh | |
--- | |
> import_trainingspeech.sh | |
11c23 | |
< import_slr57.sh | |
--- | |
> import_slr57.sh | |
13c25 | |
< ../import_m-ailabs.sh | |
--- | |
> ../import_m-ailabs.sh | |
15c27,32 | |
< import_ccpmf.sh | |
--- | |
> ./import_atthack.sh | |
> | |
> ./import_mls.sh | |
> | |
> #./import_ccpmf.sh | |
> fi; | |
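This branch is taken when a checkpoint is mounted and both personal-archive URLs are set, extending the fine-tuning run sketched earlier; the example.com URLs below stand in for the download links Common Voice provides when you request your own recordings:

$ docker run -it --gpus=all \
    --mount type=bind,src=/srv/stt/host-dir,dst=/mnt \
    --mount type=bind,src=/srv/stt/checkpoints,dst=/transfer-checkpoint \
    --env CV_PERSONAL_FIRST_URL='https://example.com/your-archive-part-1.tar.gz' \
    --env CV_PERSONAL_SECOND_URL='https://example.com/your-archive-part-2.tar.gz' \
    commonvoice-fr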
Only in STT/fr: import_mls.sh | |
diff '--color=auto' --recursive DeepSpeech/fr/import_slr57.sh STT/fr/import_slr57.sh | |
5c5 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
10a11,15 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt" | |
> fi; | |
> | |
13a19 | |
> ${SAVE_EXCLUDED_MAX_SEC} \ | |
14a21,24 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> mv /mnt/extracted/data/African_Accented_French/African_Accented_French_excluded_lm.txt /mnt/extracted/_slr57_lm.txt | |
> fi; | |
diff '--color=auto' --recursive DeepSpeech/fr/import_trainingspeech.sh STT/fr/import_trainingspeech.sh | |
5,6c5 | |
< pushd $HOME/ds/ | |
< pip install Unidecode==1.0.23 | |
--- | |
> pushd ${STT_DIR} | |
diff '--color=auto' --recursive DeepSpeech/fr/metadata.sh STT/fr/metadata.sh | |
5,7c5,7 | |
< export METADATA_AUTHOR="DeepSpeech-FR-Team" | |
< export METADATA_MODEL_NAME="deepspeech-fr" | |
< export METADATA_MODEL_VERSION="0.6" | |
--- | |
> export METADATA_AUTHOR="CommonVoice-FR-Team" | |
> export METADATA_MODEL_NAME="cv-fr" | |
> export METADATA_MODEL_VERSION="1.2" | |
11,13c11,13 | |
< export METADATA_MIN_DS_VERSION="0.7" | |
< export METADATA_MAX_DS_VERSION="0.9" | |
< export METADATA_DESCRIPTION="A free and re-usable French model for DeepSpeech" | |
--- | |
> export METADATA_MIN_STT_VERSION="1.0.0" | |
> export METADATA_MAX_STT_VERSION="1.4.0" | |
> export METADATA_DESCRIPTION="A free and re-usable French model for Speech-to-Text" | |
diff '--color=auto' --recursive DeepSpeech/fr/params.sh STT/fr/params.sh | |
7,8c7,8 | |
< export CV_RELEASE_FILENAME="cv-4-fr.tar.gz" | |
< export CV_RELEASE_SHA256="ffda45f2006fb6092fb435c786cde422e38183f7837e9faa65cb273439cf369e" | |
--- | |
> export CV_RELEASE_FILENAME="cv-corpus-12.0-2022-12-07-fr.tar.gz" | |
> export CV_RELEASE_SHA256="00afc519d48d749a4724386dc203b8a0286060efe4ccb46963555794fef216eb" | |
diff '--color=auto' --recursive DeepSpeech/fr/prepare_lm.sh STT/fr/prepare_lm.sh | |
23a24,31 | |
> # Use leftover transcriptions as indirect natural context for the LM, to prepare it for testing. | |
> # You can quickly add new sentences to the scorer by creating a file named `_*_lm.txt`, where * can be anything. | |
> # All text files whose names start with an underscore and end with `_lm.txt` will be normalized and added to the scorer. | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ] && [ ! -f "excluded_max_sec_lm.txt" ]; then | |
> cat _*_lm.txt | tr '[:upper:]' '[:lower:]' > excluded_max_sec_lm.txt | |
> EXCLUDED_LM_SOURCE="excluded_max_sec_lm.txt" | |
> fi; | |
> | |
28c36 | |
< cat wiki_fr_lower.txt debats-assemblee-nationale.txt | sed -e 's/<s>/ /g' > sources_lm.txt | |
--- | |
> cat wiki_fr_lower.txt debats-assemblee-nationale.txt ${EXCLUDED_LM_SOURCE} | sed -e 's/<s>/ /g' > sources_lm.txt | |
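With LM_ADD_EXCLUDED_MAX_SEC=1, the importers shown earlier each drop a file matching the `_*_lm.txt` convention into /mnt/extracted/, which the cat above then collects; depending on which importers ran, that directory could for instance contain:

$ ls /mnt/extracted/_*_lm.txt
/mnt/extracted/_ccpmf_lm.txt  /mnt/extracted/_m-ailabs_lm.txt  /mnt/extracted/_slr57_lm.txt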
diff '--color=auto' --recursive DeepSpeech/fr/validate_label.py STT/fr/validate_label.py | |
61a62,91 | |
> '西', | |
> '甌', | |
> '牡', | |
> '文', | |
> '丹', | |
> 'も', | |
> 'む', | |
> 'ⱅ', #<-- comment me when people stop thinking i'm a m | |
> 'ⱎ', #<-- comment me when people stop thinking i'm a w | |
> 'ጀ', | |
> 'ከ', | |
> 'ӌ', | |
> 'є', | |
> 'э', | |
> 'ч', | |
> 'ц', | |
> 'р̌', | |
> 'р', | |
> '◌̌', | |
> 'п', #<-- comment me when people stop thinking i'm Pi | |
> 'л', | |
> 'д', | |
> 'χ', #<-- comment me if someone can pronounce me correctly /xi/ | |
> 'λ', #<-- comment me when everyone knows how to pronounce |λ| | |
> 'η', #<-- comment me when people know my name is êta | |
> 'ɨ' ,#<-- comment me when everyone stop thinking i'm a t | |
> 'ꝑ', #<-- comment me when people can lookup my name | |
> 'ɛ', | |
> 'ə', | |
> 'ɔ', | |
70a101,109 | |
> label = label.replace("宇津保", "utsuho") | |
> label = label.replace("厳", "") | |
> label = label.replace("三", "") | |
> label = label.replace("⊨", "inclus") | |
> | |
> label = label.replace("ⱅ", "m") #<-- comment me when people stop thinking i'm a m | |
> label = label.replace("ⱎ", "w") #<-- comment me when people stop thinking i'm a w | |
> label = label.replace("р", "p") #<-- comment me when people stop thinking i'm a p | |
> | |
76a116,117 | |
> label = label.replace("ʽ", " ") | |
> label = label.replace('’', "'") | |
102a144,148 | |
> label = label.replace("∼", "~") | |
> label = label.replace("̐", "") | |
> label = label.replace("─", "") | |
> label = label.replace("̲", "") | |
> | |
200a247,249 | |
> label = label.replace("ķ", "k") | |
> label = label.replace("ǀ", "") | |
> | |
205a255,256 | |
> label = label.replace("→", "") | |
> label = label.replace("↔", "") | |
225a277 | |
> #label = label.replace("₽", "rouble russe") #<-- if you need this currency | |
diff '--color=auto' --recursive DeepSpeech/generate_alphabet.sh STT/generate_alphabet.sh | |
5c5 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
14c14 | |
< python training/deepspeech_training/util/check_characters.py \ | |
--- | |
> python -m coqui_stt_training.util.check_characters \ | |
Only in STT: import_cv_perso.sh | |
diff '--color=auto' --recursive DeepSpeech/import_cv.sh STT/import_cv.sh | |
10c10 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
diff '--color=auto' --recursive DeepSpeech/import_lingualibre.sh STT/import_lingualibre.sh | |
5c5 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
diff '--color=auto' --recursive DeepSpeech/import_m-ailabs.sh STT/import_m-ailabs.sh | |
5c5 | |
< pushd $HOME/ds/ | |
--- | |
> pushd ${STT_DIR} | |
10a11,15 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> SAVE_EXCLUDED_MAX_SEC="--save_excluded_max_sec_to_disk /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt" | |
> fi; | |
> | |
18a24 | |
> ${SAVE_EXCLUDED_MAX_SEC} \ | |
19a26,29 | |
> | |
> if [ "${LM_ADD_EXCLUDED_MAX_SEC}" = "1" ]; then | |
> mv /mnt/extracted/data/M-AILABS/M-AILABS_excluded_lm.txt /mnt/extracted/_m-ailabs_lm.txt | |
> fi; | |
Only in STT: lm_optimizer.py | |
diff '--color=auto' --recursive DeepSpeech/package.sh STT/package.sh | |
7,12d6 | |
< if [ ! -f "model_tensorflow_fr.tar.xz" ]; then | |
< tar -cf - \ | |
< -C /mnt/models/ output_graph.pbmm alphabet.txt \ | |
< -C /mnt/lm/ kenlm.scorer | xz -T0 > model_tensorflow_fr.tar.xz | |
< fi; | |
< | |
Only in STT: parse_augment_args.sh | |
diff '--color=auto' --recursive DeepSpeech/README.md STT/README.md | |
1c1 | |
< # Groupe de travail pour DeepSpeech en français | |
--- | |
> # Groupe de travail pour la reconnaissance vocale du français (CommonVoice-fr) | |
7,8c7,8 | |
< - [Participer à DeepSpeech](#Participer-à-DeepSpeech) | |
< - [Processus pour DeepSpeech fr](#Processus-pour-deepSpeech-fr) | |
--- | |
> - [Participer à CommonVoice-fr](#Participer-à-STT) | |
> - [Processus pour CommonVoice-fr](#Processus-pour-CommonVoice-fr) | |
16c16 | |
< - [Utiliser DeepSpeech pour vos projets webs](#Utiliser-DeepSpeech-pour-vos-projets-web) | |
--- | |
> - [Utiliser STT pour vos projets webs](#Utiliser-STT-pour-vos-projets-web) | |
24c24,28 | |
< Le projet DeepSpeech est un autre projet de la fondation Mozilla, pour transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par [Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice). | |
--- | |
> > STT: Speech-To-Text | |
> | |
> > Ou l'art de transcrire la voix en texte. | |
> | |
> Le projet CommonVoice FR utilise 🐸-STT ([Coqui-STT](https://github.com/coqui-ai/STT)), l'implémentation suivante du projet [DeepSpeech](https://github.com/mozilla/DeepSpeech) de la fondation Mozilla, pour continuer à transformer les ondes sonores en texte à partir de l'algorithme d'apprentissage proposé par la communauté. | |
28c32 | |
< - **DeepSpeech** utilise le canal **Common Voice fr** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org) | |
--- | |
> - **CommonVoice-fr** utilise le canal **Common Voice FR** sur [Matrix](https://github.com/mozfr/besogne/wiki/Matrix) pour la discussion et la coordination : [s’inscrire au groupe](https://chat.mozilla.org/#/room/#common-voice-fr:mozilla.org) | |
32c36 | |
< # Participer à DeepSpeech _pour tous_ | |
--- | |
> # Participer à CommonVoice _pour tous_ | |
34c38 | |
< Le projet **DeepSpeech** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice). | |
--- | |
> Le projet **CommonVoice-fr** utilise des jeux de données du projet **Common Voice fr**, vous pouvez aider à faire grandir cette base de données : [Participer à Common Voice](https://github.com/Common-Voice/commonvoice-fr/tree/master/CommonVoice#Participer-à-Common-Voice). | |
36c40 | |
< # Processus pour DeepSpeech fr | |
--- | |
> # Processus pour CommonVoice-fr | |
46c50,52 | |
< - Les détails d'installation et de configuration sont disponible à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/CONTRIBUTING.md) | |
--- | |
> - Les détails d'installation et de configuration sont disponible à la page de [Contribution](https://github.com/Common-Voice/commonvoice-fr/blob/master/STT/CONTRIBUTING.md) (en anglais). | |
> | |
> - Pour l'ajustement des modèles francophones sur vos données personnelles de CommonVoice, lisez [cet article sur les forums de Mozilla](https://discourse.mozilla.org/t/entrainer-des-modeles-sur-mesure-avec-commonvoice-fr/97503?u=skeilnet) | |
54c60 | |
< - [Modèles DeepSpeech](https://github.com/mozilla/deepspeech) | |
--- | |
> - [Modèles STT](https://coqui.ai/models) | |
67c73 | |
< ### Utiliser DeepSpeech pour vos projets web | |
--- | |
> ### Utiliser STT pour vos projets web | |
69,72c75,78 | |
< - [C#](https://github.com/mozilla/DeepSpeech/tree/master/examples/net_framework) | |
< - [NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/nodejs_wav) | |
< - [Streaming NodeJS](https://github.com/mozilla/DeepSpeech/tree/master/examples/ffmpeg_vad_streaming) | |
< - [transcription (streaming) Python](https://github.com/mozilla/DeepSpeech/tree/master/examples/vad_transcriber) | |
--- | |
> - [C#](https://github.com/coqui-ai/STT/tree/master/examples/net_framework) | |
> - [NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/nodejs_wav) | |
> - [Streaming NodeJS](https://github.com/coqui-ai/STT/tree/master/examples/ffmpeg_vad_streaming) | |
> - [transcription (streaming) Python](https://github.com/coqui-ai/STT/tree/master/examples/vad_transcriber) | |
76c82 | |
< - [mycroft](https://mycroft.ai/blog/deepspeech-update/) – assistant vocal open source | |
--- | |
> - [mycroft](https://mycroft.ai/blog/STT-update/) – assistant vocal open source | |
78c84 | |
< - [Baidu](https://github.com/mozilla/deepspeech) – implémentation d'une architecture DeepSpeech | |
--- | |
> - [Coqui-STT](https://github.com/coqui-ai/STT) – implémentation d'une architecture STT | |
diff '--color=auto' --recursive DeepSpeech/run.sh STT/run.sh | |
8,9d7 | |
< export TF_CUDNN_RESET_RND_GEN_STATE=1 | |
< | |
30c28 | |
< | |
--- | |
> | |
35a34,48 | |
> | |
> if [ -f "/mnt/lm/opt_lm.yml" -a "${LM_ALPHA}" = "0.0" -a "${LM_BETA}" = "0.0" ]; then | |
> export LM_ALPHA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_alpha) | |
> export LM_BETA=$(cat /mnt/lm/opt_lm.yml | shyaml get-value lm_beta) | |
> | |
> if [ -f "/mnt/lm/kenlm.scorer" ]; then | |
> rm /mnt/lm/kenlm.scorer | |
> fi; | |
> | |
> build_lm.sh | |
> fi; | |
> | |
> test.sh | |
> | |
> export.sh | |
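Since run.sh reads the optimization result with shyaml, /mnt/lm/opt_lm.yml is expected to be a flat YAML file exposing lm_alpha and lm_beta; a minimal sketch of that lookup, with made-up values:

$ cat /mnt/lm/opt_lm.yml
lm_alpha: 0.93
lm_beta: 1.18
$ cat /mnt/lm/opt_lm.yml | shyaml get-value lm_alpha
0.93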
Only in STT: test.sh | |
diff '--color=auto' --recursive DeepSpeech/train.sh STT/train.sh | |
5,8c5,8 | |
< pushd $HOME/ds/ | |
< all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p,' | sed -e 's/,$//g')" | |
< all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p,' | sed -e 's/,$//g')" | |
< all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p,' | sed -e 's/,$//g')" | |
--- | |
> pushd ${STT_DIR} | |
> all_train_csv="$(find /mnt/extracted/data/ -type f -name '*train.csv' -printf '%p ' | sed -e 's/ $//g')" | |
> all_dev_csv="$(find /mnt/extracted/data/ -type f -name '*dev.csv' -printf '%p ' | sed -e 's/ $//g')" | |
> all_test_csv="$(find /mnt/extracted/data/ -type f -name '*test.csv' -printf '%p ' | sed -e 's/ $//g')" | |
12c12 | |
< # Do not overwrite checkpoint file if model already exist: we will likely | |
--- | |
> # Do not overwrite checkpoint file if model already exist: we will likely | |
14c14 | |
< if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.pb" ]; then | |
--- | |
> if [ -f "/transfer-checkpoint/checkpoint" -a ! -f "/mnt/models/output_graph.tflite" ]; then | |
16c16,19 | |
< cp -a /transfer-checkpoint/* /mnt/checkpoints/ | |
--- | |
> # use --load_checkpoint_dir for transfer learning | |
> LOAD_CHECKPOINT_FROM="--load_checkpoint_dir /transfer-checkpoint --save_checkpoint_dir /mnt/checkpoints" | |
> else | |
> LOAD_CHECKPOINT_FROM="--checkpoint_dir /mnt/checkpoints/" | |
19c22 | |
< EARLY_STOP_FLAG="--early_stop" | |
--- | |
> EARLY_STOP_FLAG="--early_stop true" | |
21c24 | |
< EARLY_STOP_FLAG="--noearly_stop" | |
--- | |
> EARLY_STOP_FLAG="--early_stop false" | |
26c29 | |
< AMP_FLAG="--automatic_mixed_precision True" | |
--- | |
> AMP_FLAG="--automatic_mixed_precision true" | |
29,32c32,34 | |
< # Check metadata existence | |
< if [ -z "$METADATA_AUTHOR" ]; then | |
< echo "Please fill-in metadata informations" | |
< exit 1 | |
--- | |
> SKIP_BATCH_TEST_FLAG="" | |
> if [ "${SKIP_BATCH_TEST}" = "1" ]; then | |
> SKIP_BATCH_TEST_FLAG="--skip_batch_test true" | |
35,43c37,43 | |
< # Ok, assume we have all the metadata now | |
< ALL_METADATA_FLAGS="--export_author_id $METADATA_AUTHOR" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_model_version $METADATA_MODEL_VERSION" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_contact_info $METADATA_CONTACT_INFO" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_license $METADATA_LICENSE" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_language $METADATA_LANGUAGE" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_min_ds_version $METADATA_MIN_DS_VERSION" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_max_ds_version $METADATA_MAX_DS_VERSION" | |
< ALL_METADATA_FLAGS="$ALL_METADATA_FLAGS --export_description $METADATA_DESCRIPTION" | |
--- | |
> # Basic augmentation for data | |
> # TODO: Add use of overlays with noise datasets | |
> # ^ This would require to download and prepare noise data | |
> ALL_AUGMENT_FLAGS="" | |
> if [ "${ENABLE_AUGMENTS}" = "1" ]; then | |
> ${HOMEDIR}/parse_augment_args.sh | |
> fi; | |
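parse_augment_args.sh is only listed in this diff, so the exact flags it assembles are not shown; as a rough illustration, ALL_AUGMENT_FLAGS would end up holding upstream-style --augment options such as the following (augmenter names from the upstream STT documentation, values made up and normally driven by augments.txt):

ALL_AUGMENT_FLAGS="--augment pitch[p=0.1,pitch=1~0.2] --augment tempo[p=0.1,factor=1~0.5]"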
47,49c47,49 | |
< python -u DeepSpeech.py \ | |
< --show_progressbar True \ | |
< --train_cudnn True \ | |
--- | |
> python -m coqui_stt_training.train \ | |
> --show_progressbar true \ | |
> --train_cudnn true \ | |
57,59c57,59 | |
< --train_batch_size ${BATCH_SIZE} \ | |
< --dev_batch_size ${BATCH_SIZE} \ | |
< --test_batch_size ${BATCH_SIZE} \ | |
--- | |
> --train_batch_size ${TRAIN_BATCH_SIZE} \ | |
> --dev_batch_size ${DEV_BATCH_SIZE} \ | |
> --test_batch_size ${TEST_BATCH_SIZE} \ | |
65a66 | |
> --log_level=${LOG_LEVEL} \ | |
67,141c68,70 | |
< --checkpoint_dir /mnt/checkpoints/ | |
< fi; | |
< | |
< if [ ! -f "/mnt/models/test_output.json" ]; then | |
< python -u DeepSpeech.py \ | |
< --show_progressbar True \ | |
< --train_cudnn True \ | |
< ${AMP_FLAG} \ | |
< --alphabet_config_path /mnt/models/alphabet.txt \ | |
< --scorer_path /mnt/lm/kenlm.scorer \ | |
< --test_files ${all_test_csv} \ | |
< --test_batch_size ${BATCH_SIZE} \ | |
< --n_hidden ${N_HIDDEN} \ | |
< --lm_alpha ${LM_ALPHA} \ | |
< --lm_beta ${LM_BETA} \ | |
< --checkpoint_dir /mnt/checkpoints/ \ | |
< --test_output_file /mnt/models/test_output.json | |
< fi; | |
< | |
< if [ ! -f "/mnt/models/output_graph.pb" ]; then | |
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tensorflow" | |
< python -u DeepSpeech.py \ | |
< --alphabet_config_path /mnt/models/alphabet.txt \ | |
< --scorer_path /mnt/lm/kenlm.scorer \ | |
< --feature_cache /mnt/sources/feature_cache \ | |
< --n_hidden ${N_HIDDEN} \ | |
< --beam_width ${BEAM_WIDTH} \ | |
< --lm_alpha ${LM_ALPHA} \ | |
< --lm_beta ${LM_BETA} \ | |
< --load_evaluate "best" \ | |
< --checkpoint_dir /mnt/checkpoints/ \ | |
< --export_dir /mnt/models/ \ | |
< ${ALL_METADATA_FLAGS} \ | |
< ${METADATA_MODEL_NAME_FLAG} | |
< fi; | |
< | |
< if [ ! -f "/mnt/models/output_graph.tflite" ]; then | |
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite" | |
< python -u DeepSpeech.py \ | |
< --alphabet_config_path /mnt/models/alphabet.txt \ | |
< --scorer_path /mnt/lm/kenlm.scorer \ | |
< --feature_cache /mnt/sources/feature_cache \ | |
< --n_hidden ${N_HIDDEN} \ | |
< --beam_width ${BEAM_WIDTH} \ | |
< --lm_alpha ${LM_ALPHA} \ | |
< --lm_beta ${LM_BETA} \ | |
< --load_evaluate "best" \ | |
< --checkpoint_dir /mnt/checkpoints/ \ | |
< --export_dir /mnt/models/ \ | |
< --export_tflite \ | |
< ${ALL_METADATA_FLAGS} \ | |
< ${METADATA_MODEL_NAME_FLAG} | |
< fi; | |
< | |
< if [ ! -f "/mnt/models/${MODEL_EXPORT_ZIP_LANG}.zip" ]; then | |
< mkdir /mnt/models/${MODEL_EXPORT_ZIP_LANG} || rm /mnt/models/${MODEL_EXPORT_ZIP_LANG}/* | |
< METADATA_MODEL_NAME_FLAG="--export_model_name $METADATA_MODEL_NAME-tflite" | |
< python -u DeepSpeech.py \ | |
< --alphabet_config_path /mnt/models/alphabet.txt \ | |
< --scorer_path /mnt/lm/kenlm.scorer \ | |
< --feature_cache /mnt/sources/feature_cache \ | |
< --n_hidden ${N_HIDDEN} \ | |
< --beam_width ${BEAM_WIDTH} \ | |
< --lm_alpha ${LM_ALPHA} \ | |
< --lm_beta ${LM_BETA} \ | |
< --load_evaluate "best" \ | |
< --checkpoint_dir /mnt/checkpoints/ \ | |
< --export_dir /mnt/models/${MODEL_EXPORT_ZIP_LANG} \ | |
< --export_zip \ | |
< ${ALL_METADATA_FLAGS} \ | |
< ${METADATA_MODEL_NAME_FLAG} | |
< fi; | |
< | |
< if [ ! -f "/mnt/models/output_graph.pbmm" ]; then | |
< ./convert_graphdef_memmapped_format --in_graph=/mnt/models/output_graph.pb --out_graph=/mnt/models/output_graph.pbmm | |
--- | |
> ${LOAD_CHECKPOINT_FROM} \ | |
> ${SKIP_BATCH_TEST_FLAG} \ | |
> ${ALL_AUGMENT_FLAGS} |
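Depending on the environment, those three variable flag groups expand roughly as follows (SKIP_BATCH_TEST_FLAG and ALL_AUGMENT_FLAGS stay empty when their toggles are off):

# with a mounted /transfer-checkpoint and SKIP_BATCH_TEST=1
--load_checkpoint_dir /transfer-checkpoint --save_checkpoint_dir /mnt/checkpoints --skip_batch_test true
# without transfer checkpoints
--checkpoint_dir /mnt/checkpoints/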