Gist: wasertech/e6a460532c0c8ee9f9ea4ed06073194f
Hangs after only one epoch
trainer@e4916e93eaab:~/stt$ TF_CUDNN_RESET_RND_GEN_STATE=1 python train.py --show_progressbar true --train_cudnn true --alphabet_config_path /mnt/test_models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /tmp/feature_cache --train_files /mnt/extracted/data/cv-fr/clips/train.csv --dev_files /mnt/extracted/data/cv-fr/clips/dev.csv --train_batch_size 32 --dev_batch_size 32 --n_hidden 2048 --epochs 1 --learning_rate 0.0001 --dropout_rate 0.3 --lm_alpha 0.0 --lm_beta 0.0 --log_level=0 --early_stop true --checkpoint_dir /mnt/test2_checkpoints/
Using the top level train.py script is deprecated and will be removed in a future release. Instead use: python -m coqui_stt_training.train
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
D Session opened.
I Loading best validating checkpoint from /mnt/test2_checkpoints/best_dev-2
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 414.328674
Epoch 0 | Validation | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 506.753296 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:05.150581
D Session closed.
I Dummy run finished without problems, now starting real training process.
D Session opened.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 414.328674
Epoch 0 | Validation | Elapsed Time: 0:00:40 | Steps: 248 | Loss: 390.145640 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
I Saved new best validating model with loss 390.145640 to: /mnt/test2_checkpoints/best_dev-3
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:46.090655
D Session closed.
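To compare the dummy run with the real run in a log like this one, a minimal Python sketch can pull the epoch, phase, elapsed time, step count and loss out of each progress line. It assumes the log has been saved as plain text and that progress lines keep the `Epoch N | Phase | Elapsed Time: ... | Steps: ... | Loss: ...` layout shown above. `parse_progress` and `Progress` are hypothetical helper names for illustration, not part of Coqui STT.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumption: progress lines look exactly like the ones in this log, e.g.
# "Epoch 0 | Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 414.328674"
# The trailing "| Dataset: ..." field (validation lines only) is optional.
# The pattern is anchored at the start only, so trailing text is ignored.
PROGRESS_RE = re.compile(
    r"^Epoch (?P<epoch>\d+) \| (?P<phase>\w+) \| "
    r"Elapsed Time: (?P<elapsed>[\d:]+) \| "
    r"Steps: (?P<steps>\d+) \| Loss: (?P<loss>[\d.]+)"
    r"(?: \| Dataset: (?P<dataset>\S+))?"
)


class Progress(NamedTuple):
    """One parsed progress line (hypothetical helper type)."""
    epoch: int
    phase: str
    elapsed: str
    steps: int
    loss: float
    dataset: Optional[str]


def parse_progress(lines: Iterable[str]) -> List[Progress]:
    """Return one Progress record per matching line; other lines are skipped."""
    records = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m is None:
            continue  # e.g. "I Loading variable from checkpoint: ..."
        records.append(
            Progress(
                epoch=int(m["epoch"]),
                phase=m["phase"],
                elapsed=m["elapsed"],
                steps=int(m["steps"]),
                loss=float(m["loss"]),
                dataset=m["dataset"],  # None on training lines
            )
        )
    return records


if __name__ == "__main__":
    sample = [
        "I STARTING Optimization",
        "Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 414.328674",
        "Epoch 0 | Validation | Elapsed Time: 0:00:40 | Steps: 248 | "
        "Loss: 390.145640 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv",
    ]
    for rec in parse_progress(sample):
        print(rec.epoch, rec.phase, rec.steps, rec.loss)
```

Passing an open file, as in `parse_progress(open(path))`, yields one record per progress line. Keeping the regex anchored only at the start lets it tolerate trailing text after the loss value.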