@wasertech
Created April 21, 2022 13:42
Hangs after only one epoch
trainer@e4916e93eaab:~/stt$ TF_CUDNN_RESET_RND_GEN_STATE=1 python train.py --show_progressbar true --train_cudnn true --alphabet_config_path /mnt/test_models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /tmp/feature_cache --train_files /mnt/extracted/data/cv-fr/clips/train.csv --dev_files /mnt/extracted/data/cv-fr/clips/dev.csv --train_batch_size 32 --dev_batch_size 32 --n_hidden 2048 --epochs 1 --learning_rate 0.0001 --dropout_rate 0.3 --lm_alpha 0.0 --lm_beta 0.0 --log_level=0 --early_stop true --checkpoint_dir /mnt/test2_checkpoints/
Using the top level train.py script is deprecated and will be removed in a future release. Instead use: python -m coqui_stt_training.train
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
D Session opened.
I Loading best validating checkpoint from /mnt/test2_checkpoints/best_dev-2
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 414.328674
Epoch 0 | Validation | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 506.753296 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:05.150581
D Session closed.
I Dummy run finished without problems, now starting real training process.
D Session opened.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 414.328674
Epoch 0 | Validation | Elapsed Time: 0:00:40 | Steps: 248 | Loss: 390.145640 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
I Saved new best validating model with loss 390.145640 to: /mnt/test2_checkpoints/best_dev-3
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:46.090655
D Session closed.