@wasertech
Last active April 20, 2022 21:42
Still hanging without testing interface from train module.
trainer@1a9ce46df059:~$ TF_CUDNN_RESET_RND_GEN_STATE=1 python -m coqui_stt_training.train --show_progressbar true --train_cudnn true --alphabet_config_path /mnt/test_models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /tmp/feature_cache --train_files /mnt/extracted/data/cv-fr/clips/train.csv --dev_files /mnt/extracted/data/cv-fr/clips/dev.csv --train_batch_size 32 --dev_batch_size 32 --n_hidden 2048 --epochs 3 --learning_rate 0.0001 --dropout_rate 0.3 --lm_alpha 0.0 --lm_beta 0.0 --log_level=0 --early_stop true --checkpoint_dir /mnt/test_checkpoints/
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
D Session opened.
I Loading best validating checkpoint from /mnt/test_checkpoints/best_dev-10
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:11 | Steps: 1 | Loss: 374.963043
Epoch 0 | Validation | Elapsed Time: 0:00:02 | Steps: 1 | Loss: 298.163666 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:14.334355
D Session closed.
I Dummy run finished without problems, now starting real training process.
D Session opened.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 374.676666
Epoch 0 | Validation | Elapsed Time: 0:00:39 | Steps: 248 | Loss: 212.753558 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
I Saved new best validating model with loss 212.753558 to: /mnt/test_checkpoints/best_dev-11
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 340.513489
Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 197.680778 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
I Saved new best validating model with loss 197.680778 to: /mnt/test_checkpoints/best_dev-12
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 304.787231
Epoch 2 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 203.614475 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:01:56.509693
D Session closed.
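
In the real training run above, every "Training" progress line reports Steps: 1 with an elapsed time of 0:00:00, while every "Validation" line reports Steps: 248. A minimal sketch for pulling those numbers out of a saved copy of the log is given below, assuming the progress-line layout printed above. The names `PROGRESS_RE`, `parse_progress` and `SAMPLE` are hypothetical helpers for illustration and are not part of `coqui_stt_training`.

```python
import re

# Matches the progress lines as they appear in the log above, e.g.
# "Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 197.680778 | Dataset: ..."
# Assumption: the line layout is exactly the one shown in this log.
PROGRESS_RE = re.compile(
    r"^Epoch (?P<epoch>\d+) \| (?P<phase>Training|Validation) \| "
    r"Elapsed Time: (?P<elapsed>[\d:]+) \| Steps: (?P<steps>\d+) \| "
    r"Loss: (?P<loss>[\d.]+)"
)


def parse_progress(log_text):
    """Return one dict per progress line found in log_text (hypothetical helper)."""
    rows = []
    for line in log_text.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match:
            rows.append({
                "epoch": int(match["epoch"]),
                "phase": match["phase"],
                "elapsed": match["elapsed"],
                "steps": int(match["steps"]),
                "loss": float(match["loss"]),
            })
    return rows


# Progress lines copied from the real training run in the log above.
SAMPLE = """\
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 374.676666
Epoch 0 | Validation | Elapsed Time: 0:00:39 | Steps: 248 | Loss: 212.753558 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 1 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 340.513489
Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 197.680778 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 2 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 304.787231
Epoch 2 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 203.614475 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
"""

if __name__ == "__main__":
    for row in parse_progress(SAMPLE):
        # Print epoch, phase, step count and loss side by side for comparison.
        print(row["epoch"], row["phase"], row["steps"], row["loss"])
```

The sketch only extracts what the log already states; it does not explain why the training phase reports a single step per epoch.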