Log Mel training
--------------------- Slurm Task Prolog ------------------------
Job ID: 2550071
Job name: ReturnnTrainingJob.UP1hAn0FnLY7.run
Host: cn-264
Date: Wed 11 Oct 21:38:53 CEST 2023
User: maximilian.kannen
Slurm account: hlt
Slurm partition: gpu_11gb
Work dir:
------------------
Node usage:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2540953_1 gpu_11gb crnn.cus hwu R 11:13:21 1 cn-264
2550071_1 gpu_11gb ReturnnT maximili R 0:00 1 cn-264
------------------
Show launch script with:
sacct -B -j
------------------
--------------------- Slurm Task Prolog ------------------------
[2023-10-11 21:38:54,511] INFO: Generating grammar tables from /usr/local/lib/python3.8/dist-packages/blib2to3/Grammar.txt
[2023-10-11 21:38:54,524] INFO: Writing grammar tables to /u/maximilian.kannen/.cache/black/22.3.0/Grammar3.8.10.final.0.pickle
[2023-10-11 21:38:54,524] INFO: Writing failed: [Errno 2] No such file or directory: '/u/maximilian.kannen/.cache/black/22.3.0/tmpc54qbzqq'
[2023-10-11 21:38:54,525] INFO: Generating grammar tables from /usr/local/lib/python3.8/dist-packages/blib2to3/PatternGrammar.txt
[2023-10-11 21:38:54,526] INFO: Writing grammar tables to /u/maximilian.kannen/.cache/black/22.3.0/PatternGrammar3.8.10.final.0.pickle
[2023-10-11 21:38:54,526] INFO: Writing failed: [Errno 2] No such file or directory: '/u/maximilian.kannen/.cache/black/22.3.0/tmpiiu1gfw1'
[2023-10-11 21:38:55,054] INFO: Start Job: Job<alias/experiments/switchboard/ctc/feat/train_nn/conformer_bs5k_lgm80_baseline work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7> Task: run
[2023-10-11 21:38:55,054] INFO: Inputs:
[2023-10-11 21:38:55,054] INFO: /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
[2023-10-11 21:38:55,054] INFO: /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
[2023-10-11 21:38:55,054] INFO: /usr/bin/python3
[2023-10-11 21:38:55,054] INFO: /work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard
[2023-10-11 21:38:55,054] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.Fzh6DWEkIA5y/output/segments.1
[2023-10-11 21:38:55,055] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.SVlbt6fqP4Jn/output/segments.1
[2023-10-11 21:38:55,055] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
[2023-10-11 21:38:55,057] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
[2023-10-11 21:38:55,059] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/oggzip/BlissToOggZipJob.lAFM8R9mzLpI/output/out.ogg.zip
[2023-10-11 21:38:55,060] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/text/processing/TailJob.RiSM6fe2XipO/output/out.gz
[2023-10-11 21:38:55,061] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository
[2023-10-11 21:38:55,062] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
Uname: uname_result(system='Linux', node='cn-264', release='5.15.0-39-generic', version='#42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022', machine='x86_64', processor='x86_64')
Load: (2.06, 3.44, 6.7)
[2023-10-11 21:38:55,063] INFO: ------------------------------------------------------------
[2023-10-11 21:38:55,063] INFO: Starting subtask for arg id: 0 args: []
[2023-10-11 21:38:55,063] INFO: ------------------------------------------------------------
[2023-10-11 21:38:55,068] INFO: Run time: 0:00:00 CPU: 158.80% RSS: 79MB VMS: 294MB
RETURNN starting up, version 1.20231002.092600+git.97a829f9, date/time 2023-10-11-21-38-55 (UTC+0200), pid 487565, cwd /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/work, Python /usr/bin/python3
RETURNN command line options: ['/u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/returnn.config']
Hostname: cn-264
[2023-10-11 21:39:00,084] INFO: Run time: 0:00:05 CPU: 0.40% RSS: 386MB VMS: 1.56GB
TensorFlow: 2.8.0 (unknown) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
2023-10-11 21:39:00.976696: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2023-10-11 21:39:03.572081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9795765555376902247
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10742726656
locality {
bus_id: 1
links {
}
}
incarnation: 1539930335961771747
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
xla_global_id: 416903419
Using gpu device 0: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-264', GPU 0, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.0GB
LOG: connected to ('10.6.100.1', 10321)
LOG: destination: /var/tmp/maximilian.kannen/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/returnn/oggzip/BlissToOggZipJob.lAFM8R9mzLpI/output/out.ogg.zip
LOG: using existing file
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 0, frames: unknown
Dev data:
OggZipDataset, sequences: 0, frames: unknown
[2023-10-11 21:39:05,102] INFO: Run time: 0:00:10 CPU: 0.20% RSS: 1.05GB VMS: 12.70GB
Learning-rate-control: loading file learning_rates
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-10-11 21:39:08.844346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'wave_norm': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'stft': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] complex64
layer /features/'abs': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] float32
layer /features/'power': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] float32
layer /features/'mel_filterbank': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'log': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'log10': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
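The features subnetwork above (wave_norm → stft → abs → power → mel_filterbank → log10) is a standard log-mel front end computed on the raw waveform. A minimal numpy sketch of that pipeline, assuming an 8 kHz sample rate (Switchboard), a 200-sample window with 80-sample hop (read off the ⌈(T−199)/80⌉ time formula in the dump), and n_fft=256 (which yields the 129 STFT bins shown); the helper names are our own, not RETURNN layer classes:

```python
# Sketch of the log-mel pipeline suggested by the layer dump above.
# Assumptions: sr=8000, window 200 / hop 80 samples, n_fft=256, 80 mel bins.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=256, sr=8000):
    # Triangular filters with centers spaced uniformly on the mel scale.
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - left) / (center - left + 1e-9)
        falling = (right - freqs) / (right - center + 1e-9)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(wave, win=200, hop=80, n_fft=256, n_mels=80, sr=8000):
    # Frame the (already normalized) waveform, take |STFT|^2, apply mel, log10.
    n_frames = 1 + (len(wave) - win) // hop
    frames = np.stack([wave[i * hop : i * hop + win] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(win), n=n_fft)) ** 2
    return np.log10(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

feats = log_mel(np.random.randn(16000).astype(np.float32))  # 2 s of audio
print(feats.shape)  # (n_frames, 80)
```

For a 16000-sample input this yields 198 frames, which agrees with the ⌈(T−199)/80⌉ time dimension above.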
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/util/basic.py:1725: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'features': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /'specaug': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /'conv_source': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F'mel_filterbank:feature-dense'(80),F|F'conv_source_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /'conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F'mel_filterbank:feature-dense'(80),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/160⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'((mel_filterbank:feature-dense)//2)*conv_3:channel'(2560)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /'input_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
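The convolutional front end above subsamples time by another factor of 4 (the frame hop goes from 80 to 320 samples, see the /160 and /320 time formulas), pools frequency once (80 → 40), and merges the remaining frequency × channel axes (40 × 64 = 2560) before the 512-dim linear projection. A shapes-only walk in numpy; the convolutions themselves are replaced by stand-in ops, and the strides/pool sizes are inferred from the layer dims in the dump, not taken from the actual config:

```python
# Shape walk of the conv front end: [B,T,80,1] -> ... -> [B,ceil(T/4),2560].
import numpy as np

B, T, F = 2, 100, 80
x = np.random.randn(B, T, F, 1)
# conv_1: 32 output channels, time/freq resolution unchanged (stand-in for the conv).
x = np.repeat(x, 32, axis=-1)                      # [B, T, 80, 32]
# conv_1_pool: pool over frequency with stride 2 -> [B, T, 40, 32]
x = x.reshape(B, T, F // 2, 2, 32).max(axis=3)
# conv_2: stride 2 in time, 64 channels -> [B, T//2, 40, 64]
x = np.repeat(x[:, ::2], 2, axis=-1)
# conv_3: stride 2 in time again -> [B, T//4, 40, 64]
x = x[:, ::2]
# conv_merged: flatten freq x channel -> [B, T//4, 40*64] = [B, T//4, 2560]
x = x.reshape(B, x.shape[1], -1)
print(x.shape)  # (2, 25, 2560)
```

The merged 2560-dim vector is what input_linear projects down to the 512-dim encoder width.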
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
[2023-10-11 21:39:10,125] INFO: Run time: 0:00:15 CPU: 0.40% RSS: 1.46GB VMS: 13.12GB
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
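The conformer_1_* layers above follow the usual macaron block ordering: half-residual feed-forward module, convolution module, relative-position MHSA module, and a second half-residual feed-forward module, each preceded by its own layer norm (the *_ln layers) and closed by a residual add (the *_res_add layers). A minimal sketch with placeholder callables; the names are illustrative, not RETURNN layer classes:

```python
# Macaron conformer block order as read off the conformer_1_* layer dump.
# ffmod1, conv_mod, mhsa_mod, ffmod2, layer_norm are opaque callables here.
def conformer_block(x, ffmod1, conv_mod, mhsa_mod, ffmod2, layer_norm):
    x = x + 0.5 * ffmod1(layer_norm(x))   # ..._ffmod_1_half_res_add
    x = x + conv_mod(layer_norm(x))       # ..._conv_mod_res_add
    x = x + mhsa_mod(layer_norm(x))       # ..._mhsa_mod_res_add
    x = x + 0.5 * ffmod2(layer_norm(x))   # ..._ffmod_2_half_res_add
    return x                              # ..._output
```

Each feed-forward module expands 512 → 2048 with a swish nonlinearity and projects back to 512; the conv module is pointwise-conv → GLU → depthwise conv → batch norm → swish → pointwise conv, matching the per-layer dims in the dump.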
layer /'conformer_2_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_2_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32 | |
layer /'conformer_2_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_2_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32 | |
layer /'conformer_2_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_2_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32 | |
layer /'conformer_2_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_2_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_2_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_2_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_2_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_2_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_2_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_2_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_3_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32 | |
layer /'conformer_3_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_3_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32 | |
layer /'conformer_3_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_3_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32 | |
layer /'conformer_3_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_3_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_3_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_3_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_3_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_3_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_3_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_3_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_4_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32 | |
layer /'conformer_4_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_4_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32 | |
layer /'conformer_4_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32 | |
layer /'conformer_4_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32 | |
layer /'conformer_4_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_4_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_4_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_4_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_4_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32 | |
layer /'conformer_4_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_4_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_4_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_linear_swish:feature-dense'(2048)] float32 | |
layer /'conformer_5_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32 | |
layer /'conformer_5_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32 | |
layer /'conformer_5_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_5_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_5_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_5_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_5_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_6_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_6_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_6_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_6_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_6_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_6_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_7_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_7_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_7_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_7_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_7_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_7_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_8_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_8_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_8_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_8_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_8_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_8_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_9_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_9_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_9_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_9_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_9_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_9_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
[2023-10-11 21:39:15,148] INFO: Run time: 0:00:20 CPU: 0.40% RSS: 1.63GB VMS: 13.30GB
layer /'conformer_10_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_10_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_10_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_10_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_10_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_10_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_10_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_11_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_11_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_11_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_11_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_11_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_11_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_12_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_12_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_12_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_12_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_12_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_12_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
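The per-block layer names in the dump above follow the Macaron-style Conformer ordering of this setup: half-step feed-forward, convolution module, self-attention, second half-step feed-forward. A minimal arithmetic sketch of the two "half_res_add" connections, using toy callables in place of the real modules (the final layer norm as the block output is an assumption, not shown in the dump):

```python
def conformer_block(x, ffmod_1, conv_mod, mhsa_mod, ffmod_2, layer_norm):
    # Macaron-style: feed-forward outputs are scaled by 0.5 before the residual add.
    x = x + 0.5 * ffmod_1(x)   # conformer_*_ffmod_1_half_res_add
    x = x + conv_mod(x)        # conformer_*_conv_mod_res_add
    x = x + mhsa_mod(x)        # conformer_*_mhsa_mod_res_add
    x = x + 0.5 * ffmod_2(x)   # conformer_*_ffmod_2_half_res_add
    return layer_norm(x)       # conformer_*_output (assumed final layer norm)
```

Note that in this log the convolution module precedes self-attention, which is a design choice of this configuration rather than the ordering of the original Conformer paper.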
2023-10-11 21:39:17.064803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1 | |
layer /'output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'output:feature-dense'(88)] float32 | |
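The recurring time dim tag `⌈(-99+time:var:extern_data:data+-100)/320⌉` above encodes the front-end's total time reduction: 320 input samples per encoder frame. A minimal sketch of that length computation, assuming (not confirmed by the log) a 200-sample STFT window, 80-sample hop, and 4x conv subsampling (80 * 4 = 320):

```python
import math

def encoder_len(num_samples: int) -> int:
    """Output length implied by the dim tag ceil((-99 + T - 100) / 320).

    The constants are consistent with an assumed STFT window of 200
    samples, hop of 80, plus 4x convolutional subsampling in time.
    """
    return math.ceil((num_samples - 199) / 320)

# E.g. one second of 8 kHz Switchboard audio -> 25 encoder frames
print(encoder_len(8000))
```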
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. | |
Instructions for updating: | |
tf.py_func is deprecated in TF V2. Instead, there are two | |
options available in V2. | |
- tf.py_function takes a python function which manipulates tf eager | |
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to | |
an ndarray (just call tensor.numpy()) but having access to eager tensors | |
means `tf.py_function`s can use accelerators such as GPUs as well as | |
being differentiable using a gradient tape. | |
- tf.numpy_function maintains the semantics of the deprecated tf.py_func | |
(it is not differentiable, and manipulates numpy arrays). It drops the | |
stateful argument making all functions stateful. | |
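The migration the warning suggests for this `sprint.py` call site can be sketched as follows; `scale` is a made-up stand-in for whatever numpy callback the code wraps:

```python
import numpy as np
import tensorflow as tf

def scale(x):
    # Plain numpy in, numpy out -- the contract tf.py_func expected.
    return (x * 2.0).astype(np.float32)

x = tf.constant([1.0, 2.0], dtype=tf.float32)

# Old (deprecated): tf.compat.v1.py_func(scale, [x], tf.float32)
# Drop-in replacement keeping py_func semantics (numpy arrays,
# not differentiable, always treated as stateful):
y = tf.numpy_function(scale, [x], tf.float32)
print(y.numpy())  # [2. 4.]
```

`tf.py_function` would be the alternative when the callback should receive eager tensors and stay differentiable under a gradient tape.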
Network layer topology: | |
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'} | |
used data keys: ['data', 'seq_tag'] | |
layers: | |
layer batch_norm 'conformer_10_conv_mod_bn' #: 512 | |
layer conv 'conformer_10_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_10_conv_mod_dropout' #: 512 | |
layer gating 'conformer_10_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_10_conv_mod_ln' #: 512 | |
layer linear 'conformer_10_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_10_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_10_conv_mod_res_add' #: 512 | |
layer activation 'conformer_10_conv_mod_swish' #: 512 | |
layer copy 'conformer_10_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_10_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_10_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_10_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_10_ffmod_1_ln' #: 512 | |
layer copy 'conformer_10_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_10_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_10_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_10_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_10_ffmod_2_ln' #: 512 | |
layer linear 'conformer_10_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_10_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_10_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_10_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_10_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_10_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_10_output' #: 512 | |
layer batch_norm 'conformer_11_conv_mod_bn' #: 512 | |
layer conv 'conformer_11_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_11_conv_mod_dropout' #: 512 | |
layer gating 'conformer_11_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_11_conv_mod_ln' #: 512 | |
layer linear 'conformer_11_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_11_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_11_conv_mod_res_add' #: 512 | |
layer activation 'conformer_11_conv_mod_swish' #: 512 | |
layer copy 'conformer_11_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_11_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_11_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_11_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_11_ffmod_1_ln' #: 512 | |
layer copy 'conformer_11_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_11_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_11_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_11_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_11_ffmod_2_ln' #: 512 | |
layer linear 'conformer_11_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_11_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_11_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_11_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_11_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_11_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_11_output' #: 512 | |
layer batch_norm 'conformer_12_conv_mod_bn' #: 512 | |
layer conv 'conformer_12_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_12_conv_mod_dropout' #: 512 | |
layer gating 'conformer_12_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_12_conv_mod_ln' #: 512 | |
layer linear 'conformer_12_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_12_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_12_conv_mod_res_add' #: 512 | |
layer activation 'conformer_12_conv_mod_swish' #: 512 | |
layer copy 'conformer_12_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_12_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_12_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_12_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_12_ffmod_1_ln' #: 512 | |
layer copy 'conformer_12_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_12_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_12_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_12_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_12_ffmod_2_ln' #: 512 | |
layer linear 'conformer_12_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_12_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_12_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_12_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_12_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_12_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_12_output' #: 512 | |
layer batch_norm 'conformer_1_conv_mod_bn' #: 512 | |
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_1_conv_mod_dropout' #: 512 | |
layer gating 'conformer_1_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_1_conv_mod_ln' #: 512 | |
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_1_conv_mod_res_add' #: 512 | |
layer activation 'conformer_1_conv_mod_swish' #: 512 | |
layer copy 'conformer_1_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512 | |
layer copy 'conformer_1_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512 | |
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_1_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_1_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_1_output' #: 512 | |
layer batch_norm 'conformer_2_conv_mod_bn' #: 512 | |
layer conv 'conformer_2_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_2_conv_mod_dropout' #: 512 | |
layer gating 'conformer_2_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_2_conv_mod_ln' #: 512 | |
layer linear 'conformer_2_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_2_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_2_conv_mod_res_add' #: 512 | |
layer activation 'conformer_2_conv_mod_swish' #: 512 | |
layer copy 'conformer_2_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_2_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_2_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_2_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_2_ffmod_1_ln' #: 512 | |
layer copy 'conformer_2_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_2_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_2_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_2_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_2_ffmod_2_ln' #: 512 | |
layer linear 'conformer_2_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_2_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_2_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_2_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_2_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_2_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_2_output' #: 512 | |
layer batch_norm 'conformer_3_conv_mod_bn' #: 512 | |
layer conv 'conformer_3_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_3_conv_mod_dropout' #: 512 | |
layer gating 'conformer_3_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_3_conv_mod_ln' #: 512 | |
layer linear 'conformer_3_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_3_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_3_conv_mod_res_add' #: 512 | |
layer activation 'conformer_3_conv_mod_swish' #: 512 | |
layer copy 'conformer_3_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_3_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_3_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_3_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_3_ffmod_1_ln' #: 512 | |
layer copy 'conformer_3_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_3_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_3_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_3_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_3_ffmod_2_ln' #: 512 | |
layer linear 'conformer_3_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_3_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_3_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_3_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_3_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_3_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_3_output' #: 512 | |
layer batch_norm 'conformer_4_conv_mod_bn' #: 512 | |
layer conv 'conformer_4_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_4_conv_mod_dropout' #: 512 | |
layer gating 'conformer_4_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_4_conv_mod_ln' #: 512 | |
layer linear 'conformer_4_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_4_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_4_conv_mod_res_add' #: 512 | |
layer activation 'conformer_4_conv_mod_swish' #: 512 | |
layer copy 'conformer_4_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_4_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_4_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_4_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_4_ffmod_1_ln' #: 512 | |
layer copy 'conformer_4_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_4_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_4_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_4_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_4_ffmod_2_ln' #: 512 | |
layer linear 'conformer_4_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_4_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_4_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_4_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_4_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_4_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_4_output' #: 512 | |
layer batch_norm 'conformer_5_conv_mod_bn' #: 512 | |
layer conv 'conformer_5_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_5_conv_mod_dropout' #: 512 | |
layer gating 'conformer_5_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_5_conv_mod_ln' #: 512 | |
layer linear 'conformer_5_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_5_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_5_conv_mod_res_add' #: 512 | |
layer activation 'conformer_5_conv_mod_swish' #: 512 | |
layer copy 'conformer_5_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_5_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_5_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_5_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_5_ffmod_1_ln' #: 512 | |
layer copy 'conformer_5_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_5_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_5_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_5_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_5_ffmod_2_ln' #: 512 | |
layer linear 'conformer_5_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_5_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_5_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_5_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_5_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_5_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_5_output' #: 512 | |
layer batch_norm 'conformer_6_conv_mod_bn' #: 512 | |
layer conv 'conformer_6_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_6_conv_mod_dropout' #: 512 | |
layer gating 'conformer_6_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_6_conv_mod_ln' #: 512 | |
layer linear 'conformer_6_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_6_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_6_conv_mod_res_add' #: 512 | |
layer activation 'conformer_6_conv_mod_swish' #: 512 | |
layer copy 'conformer_6_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_6_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_6_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_6_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_6_ffmod_1_ln' #: 512 | |
layer copy 'conformer_6_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_6_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_6_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_6_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_6_ffmod_2_ln' #: 512 | |
layer linear 'conformer_6_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_6_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_6_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_6_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_6_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_6_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_6_output' #: 512 | |
layer batch_norm 'conformer_7_conv_mod_bn' #: 512 | |
layer conv 'conformer_7_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_7_conv_mod_dropout' #: 512 | |
layer gating 'conformer_7_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_7_conv_mod_ln' #: 512 | |
layer linear 'conformer_7_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_7_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_7_conv_mod_res_add' #: 512 | |
layer activation 'conformer_7_conv_mod_swish' #: 512 | |
layer copy 'conformer_7_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_7_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_7_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_7_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_7_ffmod_1_ln' #: 512 | |
layer copy 'conformer_7_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_7_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_7_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_7_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_7_ffmod_2_ln' #: 512 | |
layer linear 'conformer_7_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_7_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_7_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_7_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_7_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_7_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_7_output' #: 512 | |
layer batch_norm 'conformer_8_conv_mod_bn' #: 512 | |
layer conv 'conformer_8_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_8_conv_mod_dropout' #: 512 | |
layer gating 'conformer_8_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_8_conv_mod_ln' #: 512 | |
layer linear 'conformer_8_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_8_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_8_conv_mod_res_add' #: 512 | |
layer activation 'conformer_8_conv_mod_swish' #: 512 | |
layer copy 'conformer_8_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_8_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_8_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_8_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_8_ffmod_1_ln' #: 512 | |
layer copy 'conformer_8_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_8_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_8_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_8_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_8_ffmod_2_ln' #: 512 | |
layer linear 'conformer_8_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_8_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_8_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_8_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_8_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_8_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_8_output' #: 512 | |
layer batch_norm 'conformer_9_conv_mod_bn' #: 512 | |
layer conv 'conformer_9_conv_mod_depthwise_conv' #: 512 | |
layer copy 'conformer_9_conv_mod_dropout' #: 512 | |
layer gating 'conformer_9_conv_mod_glu' #: 512 | |
layer layer_norm 'conformer_9_conv_mod_ln' #: 512 | |
layer linear 'conformer_9_conv_mod_pointwise_conv_1' #: 1024 | |
layer linear 'conformer_9_conv_mod_pointwise_conv_2' #: 512 | |
layer combine 'conformer_9_conv_mod_res_add' #: 512 | |
layer activation 'conformer_9_conv_mod_swish' #: 512 | |
layer copy 'conformer_9_ffmod_1_dropout' #: 512 | |
layer linear 'conformer_9_ffmod_1_dropout_linear' #: 512 | |
layer eval 'conformer_9_ffmod_1_half_res_add' #: 512 | |
layer linear 'conformer_9_ffmod_1_linear_swish' #: 2048 | |
layer layer_norm 'conformer_9_ffmod_1_ln' #: 512 | |
layer copy 'conformer_9_ffmod_2_dropout' #: 512 | |
layer linear 'conformer_9_ffmod_2_dropout_linear' #: 512 | |
layer eval 'conformer_9_ffmod_2_half_res_add' #: 512 | |
layer linear 'conformer_9_ffmod_2_linear_swish' #: 2048 | |
layer layer_norm 'conformer_9_ffmod_2_ln' #: 512 | |
layer linear 'conformer_9_mhsa_mod_att_linear' #: 512 | |
layer copy 'conformer_9_mhsa_mod_dropout' #: 512 | |
layer layer_norm 'conformer_9_mhsa_mod_ln' #: 512 | |
layer relative_positional_encoding 'conformer_9_mhsa_mod_relpos_encoding' #: 64 | |
layer combine 'conformer_9_mhsa_mod_res_add' #: 512 | |
layer self_attention 'conformer_9_mhsa_mod_self_attention' #: 512 | |
layer layer_norm 'conformer_9_output' #: 512 | |
layer conv 'conv_1' #: 32 | |
layer pool 'conv_1_pool' #: 32 | |
layer conv 'conv_2' #: 64 | |
layer conv 'conv_3' #: 64 | |
layer merge_dims 'conv_merged' #: 2560 | |
layer split_dims 'conv_source' #: 1 | |
layer source 'data' #: 1 | |
layer copy 'encoder' #: 512 | |
layer subnetwork 'features' #: 80 | |
layer activation 'features/abs' #: 129 | |
layer activation 'features/log' #: 80 | |
layer eval 'features/log10' #: 80 | |
layer mel_filterbank 'features/mel_filterbank' #: 80 | |
layer batch_norm 'features/output' #: 80 | |
layer eval 'features/power' #: 129 | |
layer stft 'features/stft' #: 129 | |
layer norm 'features/wave_norm' #: 1 | |
layer copy 'input_dropout' #: 512 | |
layer linear 'input_linear' #: 512 | |
layer softmax 'output' #: 88 | |
layer eval 'specaug' #: 80 | |
net params #: 74182234 | |
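The `features` subnetwork listed above (wave_norm -> stft with 129 bins, i.e. n_fft = 256 -> power -> 80-band mel_filterbank -> log10) is a standard log-mel front-end. A self-contained numpy sketch of that pipeline, with the window size, hop size, and filterbank construction as assumptions rather than values read from the config:

```python
import numpy as np

def mel_filterbank(num_filters=80, n_fft=256, sr=8000):
    # Triangular mel filters over n_fft // 2 + 1 = 129 frequency bins,
    # matching the 'features/stft' feature dim in the log above.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel(wave, frame_size=200, frame_shift=80, n_fft=256):
    # wave_norm -> stft -> power -> mel_filterbank -> log10, mirroring
    # the 'features' subnetwork; frame_size/frame_shift are assumptions.
    wave = (wave - wave.mean()) / (wave.std() + 1e-8)
    n_frames = 1 + (len(wave) - frame_size) // frame_shift
    frames = np.stack([wave[i * frame_shift:i * frame_shift + frame_size]
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames * np.hanning(frame_size), n=n_fft)
    power = np.abs(spec) ** 2
    mel = power @ mel_filterbank().T
    return np.log10(np.maximum(mel, 1e-10))

feats = log_mel(np.random.randn(8000))
print(feats.shape)  # (98, 80)
```

The trained `features/output` batch norm would then be applied on top of these 80-dimensional features before `specaug` and the conv front-end.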
net trainable params: [<tf.Variable 'conformer_10_conv_mod_bn/batch_norm/conformer_10_conv_mod_bn_conformer_10_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_bn/batch_norm/conformer_10_conv_mod_bn_conformer_10_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, 
<tf.Variable 'conformer_10_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_10_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_bn/batch_norm/conformer_11_conv_mod_bn_conformer_11_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_bn/batch_norm/conformer_11_conv_mod_bn_conformer_11_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_11_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_11_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_bn/batch_norm/conformer_12_conv_mod_bn_conformer_12_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_bn/batch_norm/conformer_12_conv_mod_bn_conformer_12_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 
'conformer_12_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_12_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 
512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) 
dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_bn/batch_norm/conformer_2_conv_mod_bn_conformer_2_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_bn/batch_norm/conformer_2_conv_mod_bn_conformer_2_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 
'conformer_2_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_2_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_bn/batch_norm/conformer_3_conv_mod_bn_conformer_3_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_bn/batch_norm/conformer_3_conv_mod_bn_conformer_3_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_linear_swish/W:0' shape=(512, 
2048) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_3_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_bn/batch_norm/conformer_4_conv_mod_bn_conformer_4_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_bn/batch_norm/conformer_4_conv_mod_bn_conformer_4_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_1/W:0' shape=(512, 
1024) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_4_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_bn/batch_norm/conformer_5_conv_mod_bn_conformer_5_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 
'conformer_5_conv_mod_bn/batch_norm/conformer_5_conv_mod_bn_conformer_5_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_ln/scale:0' 
shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_5_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_bn/batch_norm/conformer_6_conv_mod_bn_conformer_6_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_bn/batch_norm/conformer_6_conv_mod_bn_conformer_6_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, 
<tf.Variable 'conformer_6_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_6_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_bn/batch_norm/conformer_7_conv_mod_bn_conformer_7_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_bn/batch_norm/conformer_7_conv_mod_bn_conformer_7_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 
'conformer_7_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_7_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_bn/batch_norm/conformer_8_conv_mod_bn_conformer_8_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_bn/batch_norm/conformer_8_conv_mod_bn_conformer_8_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, 
<tf.Variable 'conformer_8_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_8_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_9_conv_mod_bn/batch_norm/conformer_9_conv_mod_bn_conformer_9_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_bn/batch_norm/conformer_9_conv_mod_bn_conformer_9_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_att_linear/W:0' 
shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_9_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/output/batch_norm/output_output_output_beta:0' shape=(1, 1, 80) dtype=float32>, <tf.Variable 'features/output/batch_norm/output_output_output_gamma:0' shape=(1, 1, 80) dtype=float32>, <tf.Variable 'features/wave_norm/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'features/wave_norm/scale:0' shape=(1,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(2560, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>] | |
loading weights from /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.390 | |
[2023-10-11 21:39:40,225] INFO: [32mRun time: 0:00:45 CPU: 0.40% RSS: 2.05GB VMS: 24.15GB[0m | |
[2023-10-11 21:39:45,251] INFO: [32mRun time: 0:00:50 CPU: 0.20% RSS: 2.35GB VMS: 24.41GB[0m | |
start training at epoch 391 | |
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128 | |
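The batch-size line above combines a per-key token budget (`{'classes': 5000, 'data': 400000}`) with a hard cap of 128 sequences per batch. As a minimal sketch of how such a batching rule can work (the function name and exact packing logic are assumptions for illustration, not RETURNN's actual implementation):

```python
# Hypothetical sketch: close a batch once adding the next sequence would exceed
# any per-key length budget, or once max_seqs sequences are collected.
def build_batches(seq_lens, batch_size, max_seqs):
    """seq_lens: per-sequence dicts like {'classes': 23, 'data': 180000}."""
    batches, current, totals = [], [], {k: 0 for k in batch_size}
    for lens in seq_lens:
        over_budget = any(totals[k] + lens[k] > batch_size[k] for k in batch_size)
        if current and (over_budget or len(current) >= max_seqs):
            batches.append(current)  # close the current batch
            current, totals = [], {k: 0 for k in batch_size}
        current.append(lens)
        for k in batch_size:
            totals[k] += lens[k]
    if current:
        batches.append(current)
    return batches
```

Under this rule, long utterances end up nearly alone in a batch while short ones pack up to the `max_seqs` limit, which matches the roughly constant step counts (~4160-4234) per epoch seen below.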
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={ | |
'dev_score': 1.3712836585087085, | |
'devtrain_score': 1.3461227879494617, | |
'train_score': 1.3859369810006912, | |
}), 2: EpochData(learningRate=1.539861111111111e-05, error={ | |
'dev_score': 1.2479231324111937, | |
'devtrain_score': 1.217953189493425, | |
'train_score': 1.2705600519813665, | |
}), 3: EpochData(learningRate=1.754722222222222e-05, error={ | |
'dev_score': 1.191105520505297, | |
'devtrain_score': 1.152482158154539, | |
'train_score': 1.2071181430771236, | |
}), ..., 388: EpochData(learningRate=1e-05, error={ | |
'dev_score': 0.08985679090525922, | |
'devtrain_score': 0.026802025918303496, | |
'train_score': 0.08481302595337009, | |
}), 389: EpochData(learningRate=1e-05, error={ | |
'dev_score': 0.09035191899876244, | |
'devtrain_score': 0.026659155496610833, | |
'train_score': 0.08407203565283547, | |
}), 390: EpochData(learningRate=1e-05, error={ | |
'dev_score': 0.09086543904481646, | |
'devtrain_score': 0.026506946784649415, | |
'train_score': 0.0845128287906251, | |
}), error key: dev_score | |
pretrain: None | |
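The `NewbobMultiEpoch` line above configures learning-rate control: the error key `dev_score` is tracked over `num_epochs=6`, and the rate is kept while the relative improvement stays below `relative_error_threshold=-0.01`. A hedged sketch of that decision (the exact averaging window is an assumption; parameter names follow the log):

```python
# Assumed Newbob-style check: compare the mean dev_score of the last num_epochs
# epochs against the window shifted by one epoch; negative relative error means
# improvement, and the LR is kept only if it beats the threshold.
def newbob_keep_lr(scores, num_epochs=6, relative_error_threshold=-0.01):
    """scores: dev_score per epoch, oldest first. True = keep current LR."""
    if len(scores) < num_epochs + 1:
        return True  # not enough history to judge yet
    old = sum(scores[-num_epochs - 1:-1]) / num_epochs
    new = sum(scores[-num_epochs:]) / num_epochs
    relative_error = (new - old) / old
    return relative_error < relative_error_threshold
```

With the nearly flat dev scores around 0.090 in epochs 388-390, such a rule would no longer see enough relative improvement, consistent with the learning rate staying pinned at the `1e-05` floor below.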
[2023-10-11 21:39:50,273] INFO: [32mRun time: 0:00:55 CPU: 0.60% RSS: 2.91GB VMS: 24.96GB[0m | |
start epoch 391 with learning rate 1e-05 ... | |
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}. | |
Initialize optimizer (default) with slots ['m', 'v']. | |
These additional variables were created by the optimizer: [<tf.Variable 'optimize/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/beta2_power:0' shape=() dtype=float32>]. | |
[2023-10-11 21:40:20,386] INFO: [32mRun time: 0:01:25 CPU: 0.20% RSS: 3.34GB VMS: 25.45GB[0m | |
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z' | |
[2023-10-11 21:40:50,498] INFO: [32mRun time: 0:01:55 CPU: 0.40% RSS: 3.77GB VMS: 26.64GB[0m | |
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory) | |
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z' | |
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory) | |
2023-10-11 21:40:59.767405: I tensorflow/stream_executor/cuda/cuda_dnn.cc:379] Loaded cuDNN version 8400 | |
[2023-10-11 21:41:00,546] INFO: [32mRun time: 0:02:05 CPU: 0.60% RSS: 4.51GB VMS: 29.10GB[0m | |
2023-10-11 21:41:03.973206: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory | |
[2023-10-11 21:41:10,596] INFO: [32mRun time: 0:02:15 CPU: 0.60% RSS: 5.12GB VMS: 29.71GB[0m | |
[2023-10-11 21:44:06,421] INFO: [32mRun time: 0:05:11 CPU: 0.40% RSS: 5.64GB VMS: 30.46GB[0m | |
[2023-10-11 21:48:42,650] INFO: [32mRun time: 0:09:47 CPU: 0.40% RSS: 6.20GB VMS: 31.22GB[0m | |
[2023-10-11 22:02:00,986] INFO: [32mRun time: 0:23:05 CPU: 0.40% RSS: 6.82GB VMS: 31.99GB[0m | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.3GB, std_dev=166.6MB, min=388.8MB, max=4.4GB, num_seqs=4178, avg_data_len=1) | |
train epoch 391, finished after 4178 steps, 0:22:30 elapsed (97.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.391 | |
epoch 391 score: 0.08400366136566557 error: None elapsed: 0:22:30 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.4GB, std_dev=0.0B, min=4.4GB, max=4.4GB, num_seqs=31, avg_data_len=1) | |
epoch 391 'dev' eval, finished after 31 steps, 0:00:08 elapsed (81.0% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.4GB, std_dev=0.0B, min=4.4GB, max=4.4GB, num_seqs=29, avg_data_len=1) | |
epoch 391 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.5% computing time) | |
dev: score 0.08993440706559858 error None devtrain: score 0.026880376000358423 error None | |
15 epochs stored so far and keeping all. | |
start epoch 392 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=57.8MB, min=4.4GB, max=4.6GB, num_seqs=4180, avg_data_len=1) | |
train epoch 392, finished after 4180 steps, 0:19:16 elapsed (99.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.392 | |
epoch 392 score: 0.08421701323548142 error: None elapsed: 0:19:16 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=0.0B, min=4.6GB, max=4.6GB, num_seqs=31, avg_data_len=1) | |
epoch 392 'dev' eval, finished after 31 steps, 0:00:04 elapsed (60.0% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=0.0B, min=4.6GB, max=4.6GB, num_seqs=29, avg_data_len=1) | |
epoch 392 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (55.4% computing time) | |
dev: score 0.09141929587738695 error None devtrain: score 0.02739781722813435 error None | |
16 epochs stored so far and keeping all. | |
start epoch 393 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=10.7MB, min=4.6GB, max=4.7GB, num_seqs=4182, avg_data_len=1) | |
train epoch 393, finished after 4182 steps, 0:19:11 elapsed (99.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.393 | |
epoch 393 score: 0.08388911349128969 error: None elapsed: 0:19:11 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=0.0B, min=4.7GB, max=4.7GB, num_seqs=31, avg_data_len=1) | |
epoch 393 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.0% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=0.0B, min=4.7GB, max=4.7GB, num_seqs=29, avg_data_len=1) | |
epoch 393 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time) | |
dev: score 0.09072998857164565 error None devtrain: score 0.026826840835708805 error None | |
We have stored models for epochs [359, 364, 366, ..., 391, 392, 393] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 389, 390, 391, 392, 393]. | |
We will delete the models of epochs [373, 388]. | |
Deleted 587.4MB. | |
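The keep/delete messages above reflect a checkpoint-cleanup policy that retains a mix of recent epochs and epochs with good dev scores. A minimal sketch of one such policy, assuming a simple "last N plus best M by dev_score" split (the split itself is an assumption for illustration, not the exact RETURNN rule):

```python
# Hypothetical cleanup policy: keep the most recent checkpoints plus the best
# ones by dev_score; everything else is eligible for deletion.
def select_checkpoints(dev_scores, keep_last=4, keep_best=10):
    """dev_scores: {epoch: dev_score}. Returns (kept, deleted) epoch lists."""
    epochs = sorted(dev_scores)
    recent = set(epochs[-keep_last:])
    best = set(sorted(epochs, key=lambda e: dev_scores[e])[:keep_best])
    kept = sorted(recent | best)
    deleted = sorted(set(epochs) - set(kept))
    return kept, deleted
```

This explains why epochs like 373 and 388 get deleted while older epochs such as 359 and 364 survive: the survivors had better dev scores at the time of cleanup.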
start epoch 394 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.2GB, std_dev=373.1MB, min=4.7GB, max=5.5GB, num_seqs=4168, avg_data_len=1) | |
train epoch 394, finished after 4168 steps, 0:19:07 elapsed (99.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.394 | |
epoch 394 score: 0.08400428946445164 error: None elapsed: 0:19:07 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 394 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.2% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 394 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time) | |
dev: score 0.09077804731220876 error None devtrain: score 0.025894683877561105 error None | |
We have stored models for epochs [359, 364, 366, ..., 392, 393, 394] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 390, 391, 392, 393, 394]. | |
We will delete the models of epochs [389]. | |
Deleted 293.7MB. | |
start epoch 395 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4177, avg_data_len=1) | |
train epoch 395, finished after 4177 steps, 0:19:12 elapsed (99.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.395 | |
[2023-10-11 23:20:25,640] INFO: [32mRun time: 1:41:30 CPU: 0.40% RSS: 7.56GB VMS: 32.86GB[0m | |
epoch 395 score: 0.08496213777813798 error: None elapsed: 0:19:12 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 395 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 395 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (56.1% computing time) | |
dev: score 0.09076227162597035 error None devtrain: score 0.025947693735172528 error None | |
16 epochs stored so far and keeping all. | |
start epoch 396 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4234, avg_data_len=1) | |
train epoch 396, finished after 4234 steps, 0:19:27 elapsed (99.4% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.396 | |
epoch 396 score: 0.08434961924249695 error: None elapsed: 0:19:27 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 396 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.5% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 396 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (56.1% computing time) | |
dev: score 0.0904330411207085 error None devtrain: score 0.025514320532723372 error None | |
We have stored models for epochs [359, 364, 366, ..., 394, 395, 396] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 391, 392, 393, 394, 395, 396]. | |
We will delete the models of epochs [390]. | |
Deleted 293.7MB. | |
start epoch 397 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 397, finished after 4181 steps, 0:19:08 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.397
epoch 397 score: 0.08391337878236499 error: None elapsed: 0:19:08
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 397 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 397 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09013096193402975 error None devtrain: score 0.026373680437931465 error None
We have stored models for epochs [359, 364, 366, ..., 395, 396, 397] and keep epochs [359, 364, 366, 370, 377, 378, 379, 386, 387, 393, 394, 395, 396, 397].
We will delete the models of epochs [383, 391, 392].
Deleted 0.9GB.
start epoch 398 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4176, avg_data_len=1)
train epoch 398, finished after 4176 steps, 0:19:12 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.398
epoch 398 score: 0.08265076995881551 error: None elapsed: 0:19:12
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 398 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 398 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09101703816939097 error None devtrain: score 0.02660280731553671 error None
15 epochs stored so far and keeping all.
start epoch 399 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4193, avg_data_len=1)
train epoch 399, finished after 4193 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.399
epoch 399 score: 0.08235212145190463 error: None elapsed: 0:19:43
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 399 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 399 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.4% computing time)
dev: score 0.09018240329777154 error None devtrain: score 0.026346119970342592 error None
We have stored models for epochs [359, 364, 366, ..., 397, 398, 399] and keep epochs [359, 364, 366, 370, 378, 379, 386, 387, 394, 395, 396, 397, 398, 399].
We will delete the models of epochs [377, 393].
Deleted 587.4MB.
start epoch 400 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=261.6KB, min=5.5GB, max=5.5GB, num_seqs=4160, avg_data_len=1)
train epoch 400, finished after 4160 steps, 0:19:32 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.400
epoch 400 score: 0.08339132824813998 error: None elapsed: 0:19:32
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 400 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 400 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09022391463791925 error None devtrain: score 0.025841081156994548 error None
We have stored models for epochs [359, 364, 366, ..., 398, 399, 400] and keep epochs [359, 364, 366, 370, 378, 386, 387, 394, 395, 396, 397, 398, 399, 400].
We will delete the models of epochs [379].
Deleted 293.7MB.
start epoch 401 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4191, avg_data_len=1)
train epoch 401, finished after 4191 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.401
epoch 401 score: 0.08321606418107716 error: None elapsed: 0:19:42
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 401 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 401 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time)
dev: score 0.09055862691293622 error None devtrain: score 0.02612697981518672 error None
We have stored models for epochs [359, 364, 366, ..., 399, 400, 401] and keep epochs [359, 364, 366, 370, 378, 387, 394, 395, 396, 397, 398, 399, 400, 401].
We will delete the models of epochs [386].
Deleted 293.7MB.
start epoch 402 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4219, avg_data_len=1)
train epoch 402, finished after 4219 steps, 0:19:50 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.402
epoch 402 score: 0.08450456713209396 error: None elapsed: 0:19:50
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 402 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 402 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09092887729701023 error None devtrain: score 0.025816170981262557 error None
We have stored models for epochs [359, 364, 366, ..., 400, 401, 402] and keep epochs [359, 364, 366, 370, 378, 387, 394, 395, 396, 398, 399, 400, 401, 402].
We will delete the models of epochs [397].
Deleted 293.7MB.
start epoch 403 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4210, avg_data_len=1)
train epoch 403, finished after 4210 steps, 0:19:47 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.403
epoch 403 score: 0.08293471938524258 error: None elapsed: 0:19:47
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 403 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.1% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 403 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.0901411941256409 error None devtrain: score 0.026532771124956972 error None
We have stored models for epochs [359, 364, 366, ..., 401, 402, 403] and keep epochs [359, 364, 366, 370, 378, 394, 395, 396, 398, 399, 400, 401, 402, 403].
We will delete the models of epochs [387].
Deleted 293.7MB.
start epoch 404 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4196, avg_data_len=1)
train epoch 404, finished after 4196 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.404
epoch 404 score: 0.08249238263906251 error: None elapsed: 0:19:43
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 404 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 404 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.2% computing time)
dev: score 0.0899801640639724 error None devtrain: score 0.02644721011386587 error None
15 epochs stored so far and keeping all.
start epoch 405 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4186, avg_data_len=1)
train epoch 405, finished after 4186 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.405
epoch 405 score: 0.08245617300247964 error: None elapsed: 0:19:40
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 405 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 405 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09077911388905734 error None devtrain: score 0.026696995568451804 error None
16 epochs stored so far and keeping all.
start epoch 406 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4167, avg_data_len=1)
train epoch 406, finished after 4167 steps, 0:19:31 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.406
epoch 406 score: 0.08399680986622006 error: None elapsed: 0:19:31
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 406 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.7% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 406 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.08966737057491372 error None devtrain: score 0.026257957092432414 error None
We have stored models for epochs [359, 364, 366, ..., 404, 405, 406] and keep epochs [359, 364, 366, 378, 394, 395, 396, 398, 399, 400, 402, 403, 404, 405, 406].
We will delete the models of epochs [370, 401].
Deleted 587.4MB.
start epoch 407 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4159, avg_data_len=1)
train epoch 407, finished after 4159 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.407
epoch 407 score: 0.08353985838337341 error: None elapsed: 0:19:33
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 407 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 407 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09027761015614455 error None devtrain: score 0.02561776016492116 error None
We have stored models for epochs [359, 364, 366, ..., 405, 406, 407] and keep epochs [359, 364, 366, 378, 394, 396, 398, 399, 400, 402, 403, 404, 405, 406, 407].
We will delete the models of epochs [395].
Deleted 293.7MB.
start epoch 408 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4200, avg_data_len=1)
train epoch 408, finished after 4200 steps, 0:19:46 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.408
epoch 408 score: 0.08314991566424393 error: None elapsed: 0:19:46
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 408 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 408 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09049524025583396 error None devtrain: score 0.025993217321075256 error None
16 epochs stored so far and keeping all.
start epoch 409 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4205, avg_data_len=1)
train epoch 409, finished after 4205 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.409
epoch 409 score: 0.08251877963602931 error: None elapsed: 0:19:45
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 409 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 409 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.1% computing time)
dev: score 0.09067290124664933 error None devtrain: score 0.026320297886880933 error None
We have stored models for epochs [359, 364, 366, ..., 407, 408, 409] and keep epochs [359, 364, 366, 378, 394, 396, 398, 399, 400, 402, 404, 405, 406, 407, 408, 409].
We will delete the models of epochs [403].
Deleted 293.7MB.
start epoch 410 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4197, avg_data_len=1)
train epoch 410, finished after 4197 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.410
[2023-10-12 04:18:38,364] INFO: Run time: 6:39:43 CPU: 0.60% RSS: 8.37GB VMS: 33.83GB
epoch 410 score: 0.08337167311160373 error: None elapsed: 0:19:43
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 410 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 410 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09183007896461402 error None devtrain: score 0.02651194302297897 error None
17 epochs stored so far and keeping all.
start epoch 411 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4171, avg_data_len=1)
train epoch 411, finished after 4171 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.411
epoch 411 score: 0.08249471195955657 error: None elapsed: 0:19:33
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 411 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 411 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09186992857547137 error None devtrain: score 0.02616978679383357 error None
We have stored models for epochs [359, 364, 366, ..., 409, 410, 411] and keep epochs [359, 364, 366, 378, 394, 396, 399, 400, 402, 404, 405, 406, 407, 408, 409, 410, 411].
We will delete the models of epochs [398].
Deleted 293.7MB.
start epoch 412 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 412, finished after 4172 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.412
epoch 412 score: 0.08319460719168133 error: None elapsed: 0:19:36
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 412 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 412 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09019979357085962 error None devtrain: score 0.0257979828135639 error None
We have stored models for epochs [359, 364, 366, ..., 410, 411, 412] and keep epochs [359, 364, 366, 378, 396, 399, 400, 402, 404, 405, 406, 407, 408, 409, 410, 411, 412].
We will delete the models of epochs [394].
Deleted 293.7MB.
start epoch 413 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4197, avg_data_len=1)
train epoch 413, finished after 4197 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.413
epoch 413 score: 0.08306754462751167 error: None elapsed: 0:19:43
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 413 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.3% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 413 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09074148754241838 error None devtrain: score 0.025274968671020553 error None
We have stored models for epochs [359, 364, 366, ..., 411, 412, 413] and keep epochs [359, 364, 366, 378, 396, 399, 402, 404, 405, 406, 407, 409, 410, 411, 412, 413].
We will delete the models of epochs [400, 408].
Deleted 587.4MB.
start epoch 414 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 414, finished after 4172 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.414
epoch 414 score: 0.0836884931439581 error: None elapsed: 0:19:35
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 414 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 414 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09173472828561914 error None devtrain: score 0.02538882494733823 error None
We have stored models for epochs [359, 364, 366, ..., 412, 413, 414] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414].
We will delete the models of epochs [402].
Deleted 293.7MB.
start epoch 415 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4183, avg_data_len=1)
train epoch 415, finished after 4183 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.415
epoch 415 score: 0.08181178922788339 error: None elapsed: 0:19:40
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 415 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 415 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.1% computing time)
dev: score 0.09110214819673813 error None devtrain: score 0.026366395757861632 error None
We have stored models for epochs [359, 364, 366, ..., 413, 414, 415] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 411, 412, 413, 414, 415].
We will delete the models of epochs [409, 410].
Deleted 587.4MB.
start epoch 416 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4174, avg_data_len=1)
train epoch 416, finished after 4174 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.416
epoch 416 score: 0.0827154119004278 error: None elapsed: 0:19:36
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 416 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.1% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 416 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.3% computing time)
dev: score 0.09127620120458463 error None devtrain: score 0.025958318614427975 error None
16 epochs stored so far and keeping all.
start epoch 417 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4194, avg_data_len=1)
train epoch 417, finished after 4194 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.417
epoch 417 score: 0.08388887599570766 error: None elapsed: 0:19:42
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 417 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 417 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time)
dev: score 0.09071111033396588 error None devtrain: score 0.025879979203816565 error None
17 epochs stored so far and keeping all.
start epoch 418 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4173, avg_data_len=1)
train epoch 418, finished after 4173 steps, 0:19:34 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.418
epoch 418 score: 0.08209063794468534 error: None elapsed: 0:19:34
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 418 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 418 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09147636589270317 error None devtrain: score 0.025095803548978883 error None
We have stored models for epochs [359, 364, 366, ..., 416, 417, 418] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 413, 414, 415, 416, 417, 418].
We will delete the models of epochs [411, 412].
Deleted 587.4MB.
start epoch 419 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4194, avg_data_len=1)
train epoch 419, finished after 4194 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.419
epoch 419 score: 0.0828518131020664 error: None elapsed: 0:19:42
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 419 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 419 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.8% computing time)
dev: score 0.09164404081218212 error None devtrain: score 0.02526741039696012 error None
We have stored models for epochs [359, 364, 366, ..., 417, 418, 419] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 413, 414, 415, 416, 417, 418, 419].
We will delete the models of epochs [407].
Deleted 293.7MB.
start epoch 420 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4193, avg_data_len=1)
train epoch 420, finished after 4193 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.420
epoch 420 score: 0.08323626809524225 error: None elapsed: 0:19:42
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 420 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 420 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09096238227003942 error None devtrain: score 0.0251657299686666 error None
We have stored models for epochs [359, 364, 366, ..., 418, 419, 420] and keep epochs [359, 364, 366, 378, 399, 404, 405, 406, 413, 414, 415, 416, 417, 418, 419, 420].
We will delete the models of epochs [396].
Deleted 293.7MB.
start epoch 421 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4208, avg_data_len=1)
train epoch 421, finished after 4208 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.421
epoch 421 score: 0.08222371817884308 error: None elapsed: 0:19:45
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 421 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 421 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09096776702766454 error None devtrain: score 0.02527897756168886 error None
We have stored models for epochs [359, 364, 366, ..., 419, 420, 421] and keep epochs [359, 364, 366, 378, 399, 405, 406, 413, 415, 417, 418, 419, 420, 421].
We will delete the models of epochs [404, 414, 416].
Deleted 0.9GB.
start epoch 422 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4144, avg_data_len=1)
train epoch 422, finished after 4144 steps, 0:19:25 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.422
epoch 422 score: 0.08255526070388984 error: None elapsed: 0:19:25
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 422 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 422 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09177574777482436 error None devtrain: score 0.025661581341717046 error None
We have stored models for epochs [359, 364, 366, ..., 420, 421, 422] and keep epochs [359, 364, 366, 378, 399, 405, 406, 413, 415, 418, 419, 420, 421, 422].
We will delete the models of epochs [417].
Deleted 293.7MB.
start epoch 423 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4173, avg_data_len=1)
train epoch 423, finished after 4173 steps, 0:19:34 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.423
epoch 423 score: 0.08306380528087216 error: None elapsed: 0:19:34
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 423 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.1% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 423 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09055247913793295 error None devtrain: score 0.02531839647658727 error None
15 epochs stored so far and keeping all.
start epoch 424 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4222, avg_data_len=1)
train epoch 424, finished after 4222 steps, 0:19:49 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.424
epoch 424 score: 0.08247824282325938 error: None elapsed: 0:19:49
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 424 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 424 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09065868220714168 error None devtrain: score 0.02492753470312136 error None
16 epochs stored so far and keeping all.
start epoch 425 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4195, avg_data_len=1)
train epoch 425, finished after 4195 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.425
epoch 425 score: 0.08196086737207589 error: None elapsed: 0:19:42
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 425 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 425 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time)
dev: score 0.09140785595989587 error None devtrain: score 0.02425309385319315 error None
We have stored models for epochs [359, 364, 366, ..., 423, 424, 425] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 419, 420, 421, 422, 423, 424, 425].
We will delete the models of epochs [405, 413].
Deleted 587.4MB.
start epoch 426 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4183, avg_data_len=1)
train epoch 426, finished after 4183 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.426
epoch 426 score: 0.08286235069177315 error: None elapsed: 0:19:38
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 426 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.5% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 426 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09141815789115065 error None devtrain: score 0.024283266888936613 error None
We have stored models for epochs [359, 364, 366, ..., 424, 425, 426] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 420, 421, 422, 423, 424, 425, 426].
We will delete the models of epochs [419].
Deleted 293.7MB.
start epoch 427 with learning rate 1e-05 ...
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4157, avg_data_len=1)
train epoch 427, finished after 4157 steps, 0:19:30 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.427
epoch 427 score: 0.08310631056093992 error: None elapsed: 0:19:30
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 427 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
  mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 427 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09138641472073929 error None devtrain: score 0.025806944188148745 error None
We have stored models for epochs [359, 364, 366, ..., 425, 426, 427] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 420, 421, 423, 424, 425, 426, 427].
We will delete the models of epochs [422].
Deleted 293.7MB. | |
start epoch 428 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4207, avg_data_len=1) | |
train epoch 428, finished after 4207 steps, 0:19:46 elapsed (99.3% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.428 | |
epoch 428 score: 0.08160235774561941 error: None elapsed: 0:19:46 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 428 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 428 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time) | |
dev: score 0.09227596118518427 error None devtrain: score 0.025816427641435925 error None | |
We have stored models for epochs [359, 364, 366, ..., 426, 427, 428] and keep epochs [359, 364, 366, 378, 406, 415, 418, 420, 421, 424, 425, 426, 427, 428]. | |
We will delete the models of epochs [399, 423]. | |
Deleted 587.4MB. | |
start epoch 429 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1) | |
train epoch 429, finished after 4188 steps, 0:19:41 elapsed (99.3% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.429 | |
epoch 429 score: 0.08170135714184865 error: None elapsed: 0:19:41 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 429 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.0% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 429 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.3% computing time) | |
dev: score 0.0912771197864512 error None devtrain: score 0.02508745336861734 error None | |
We have stored models for epochs [359, 364, 366, ..., 427, 428, 429] and keep epochs [359, 364, 366, 378, 406, 415, 418, 424, 425, 426, 427, 428, 429]. | |
We will delete the models of epochs [420, 421]. | |
Deleted 587.4MB. | |
start epoch 430 with learning rate 1e-05 ... | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4195, avg_data_len=1) | |
train epoch 430, finished after 4195 steps, 0:19:42 elapsed (99.3% computing time) | |
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.430 | |
epoch 430 score: 0.08222845890154948 error: None elapsed: 0:19:42 | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1) | |
epoch 430 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.5% computing time) | |
Stats: | |
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1) | |
epoch 430 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time) | |
dev: score 0.09160382502645201 error None devtrain: score 0.024624127270048743 error None | |
14 epochs stored so far and keeping all. | |
start epoch 431 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4189, avg_data_len=1)
train epoch 431, finished after 4189 steps, 0:19:41 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.431
epoch 431 score: 0.08291845272060938 error: None elapsed: 0:19:41
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 431 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 431 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.1% computing time)
dev: score 0.08979805604379705 error None devtrain: score 0.024595885205106763 error None
15 epochs stored so far and keeping all.
start epoch 432 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4179, avg_data_len=1)
train epoch 432, finished after 4179 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.432
epoch 432 score: 0.08164206453017459 error: None elapsed: 0:19:36
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 432 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.7% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 432 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09212745776610048 error None devtrain: score 0.02475850586067128 error None
We have stored models for epochs [359, 364, 366, ..., 430, 431, 432] and keep epochs [359, 364, 366, 378, 406, 415, 425, 426, 428, 429, 430, 431, 432].
We will delete the models of epochs [418, 424, 427].
Deleted 0.9GB.
start epoch 433 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4185, avg_data_len=1)
train epoch 433, finished after 4185 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.433
epoch 433 score: 0.0813635706666328 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 433 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 433 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.2% computing time)
dev: score 0.09102702123518808 error None devtrain: score 0.02498445704970546 error None
14 epochs stored so far and keeping all.
start epoch 434 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4192, avg_data_len=1)
train epoch 434, finished after 4192 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.434
epoch 434 score: 0.08190310761877713 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 434 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.6% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 434 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09116143886034037 error None devtrain: score 0.02524214008242845 error None
15 epochs stored so far and keeping all.
start epoch 435 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 435, finished after 4172 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.435
[2023-10-12 12:36:50,307] INFO: Run time: 14:57:55 CPU: 0.80% RSS: 9.22GB VMS: 34.81GB
epoch 435 score: 0.08109052435187306 error: None elapsed: 0:19:35
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 435 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 435 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09162276287399478 error None devtrain: score 0.025338723087977005 error None
We have stored models for epochs [359, 364, 366, ..., 433, 434, 435] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 429, 430, 431, 432, 433, 434, 435].
We will delete the models of epochs [415].
Deleted 293.7MB.
start epoch 436 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4206, avg_data_len=1)
train epoch 436, finished after 4206 steps, 0:19:47 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.436
epoch 436 score: 0.08151726521685654 error: None elapsed: 0:19:47
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 436 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 436 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (57.4% computing time)
dev: score 0.09123822666065062 error None devtrain: score 0.024714037350225735 error None
We have stored models for epochs [359, 364, 366, ..., 434, 435, 436] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 430, 431, 432, 433, 434, 435, 436].
We will delete the models of epochs [429].
Deleted 293.7MB.
start epoch 437 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4164, avg_data_len=1)
train epoch 437, finished after 4164 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.437
epoch 437 score: 0.08122536741517272 error: None elapsed: 0:19:33
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 437 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.9% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 437 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.3% computing time)
dev: score 0.09141868853581352 error None devtrain: score 0.024465649144805472 error None
We have stored models for epochs [359, 364, 366, ..., 435, 436, 437] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 430, 431, 433, 434, 435, 436, 437].
We will delete the models of epochs [432].
Deleted 293.7MB.
start epoch 438 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4201, avg_data_len=1)
train epoch 438, finished after 4201 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.438
epoch 438 score: 0.08049115664195709 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 438 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.7% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 438 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09252951738679259 error None devtrain: score 0.024193224574030614 error None
We have stored models for epochs [359, 364, 366, ..., 436, 437, 438] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 434, 435, 436, 437, 438].
We will delete the models of epochs [428, 430].
Deleted 587.4MB.
start epoch 439 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 439, finished after 4181 steps, 0:19:39 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.439
epoch 439 score: 0.08105641147729524 error: None elapsed: 0:19:39
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 439 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 439 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09247350170353517 error None devtrain: score 0.024670380220718414 error None
We have stored models for epochs [359, 364, 366, ..., 437, 438, 439] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 435, 436, 437, 438, 439].
We will delete the models of epochs [434].
Deleted 293.7MB.
start epoch 440 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4163, avg_data_len=1)
train epoch 440, finished after 4163 steps, 0:19:31 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.440
epoch 440 score: 0.08158685601826442 error: None elapsed: 0:19:31
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 440 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 440 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09227849658022595 error None devtrain: score 0.024860019932519344 error None
15 epochs stored so far and keeping all.
start epoch 441 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4186, avg_data_len=1)
train epoch 441, finished after 4186 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.441
epoch 441 score: 0.08145152559881456 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 441 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 441 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09168181760276106 error None devtrain: score 0.024747965844799987 error None
We have stored models for epochs [359, 364, 366, ..., 439, 440, 441] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 435, 437, 438, 439, 440, 441].
We will delete the models of epochs [436].
Deleted 293.7MB.
start epoch 442 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 442, finished after 4181 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.442
epoch 442 score: 0.08088496131179494 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 442 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 442 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09188674846398792 error None devtrain: score 0.024465029976891536 error None
We have stored models for epochs [359, 364, 366, ..., 440, 441, 442] and keep epochs [359, 364, 366, 378, 406, 425, 426, 435, 437, 438, 439, 440, 441, 442].
We will delete the models of epochs [431, 433].
Deleted 587.4MB.
start epoch 443 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4210, avg_data_len=1)
train epoch 443, finished after 4210 steps, 0:19:49 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.443
epoch 443 score: 0.08175147431729346 error: None elapsed: 0:19:49
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 443 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 443 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.0918915893859721 error None devtrain: score 0.023667121564226266 error None
15 epochs stored so far and keeping all.
start epoch 444 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4199, avg_data_len=1)
train epoch 444, finished after 4199 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.444
epoch 444 score: 0.08247715732985156 error: None elapsed: 0:19:45
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 444 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 444 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09163948981342522 error None devtrain: score 0.023713764680208253 error None
16 epochs stored so far and keeping all.
start epoch 445 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4184, avg_data_len=1)
train epoch 445, finished after 4184 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.445
epoch 445 score: 0.08020359040302102 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 445 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 445 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09256254497999448 error None devtrain: score 0.02486299441057453 error None
We have stored models for epochs [359, 364, 366, ..., 443, 444, 445] and keep epochs [359, 364, 366, 378, 406, 425, 426, 435, 438, 439, 441, 442, 443, 444, 445].
We will delete the models of epochs [437, 440].
Deleted 587.4MB.
start epoch 446 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4204, avg_data_len=1)
train epoch 446, finished after 4204 steps, 0:19:46 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.446
epoch 446 score: 0.07994435010516164 error: None elapsed: 0:19:46
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 446 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 446 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09147745484412217 error None devtrain: score 0.024438725137651756 error None
We have stored models for epochs [359, 364, 366, ..., 444, 445, 446] and keep epochs [359, 364, 366, 378, 406, 425, 426, 438, 439, 442, 443, 444, 445, 446].
We will delete the models of epochs [435, 441].
Deleted 587.4MB.
start epoch 447 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4169, avg_data_len=1)
train epoch 447, finished after 4169 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.447
epoch 447 score: 0.08131296808776979 error: None elapsed: 0:19:35
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 447 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 447 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09146317625040964 error None devtrain: score 0.02413896989402761 error None
We have stored models for epochs [359, 364, 366, ..., 445, 446, 447] and keep epochs [359, 364, 366, 378, 406, 425, 438, 439, 442, 443, 444, 445, 446, 447].
We will delete the models of epochs [426].
Deleted 293.7MB.
start epoch 448 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1)
train epoch 448, finished after 4188 steps, 0:19:39 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.448
epoch 448 score: 0.08064634796145521 error: None elapsed: 0:19:39
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 448 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 448 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09170052168613516 error None devtrain: score 0.023905761854374333 error None
We have stored models for epochs [359, 364, 366, ..., 446, 447, 448] and keep epochs [359, 364, 366, 378, 406, 438, 442, 443, 444, 445, 446, 447, 448].
We will delete the models of epochs [425, 439].
Deleted 587.4MB.
start epoch 449 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1)
train epoch 449, finished after 4188 steps, 0:19:41 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.449
epoch 449 score: 0.08224647954197745 error: None elapsed: 0:19:41
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 449 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 449 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09155218936015956 error None devtrain: score 0.02332821860086142 error None
14 epochs stored so far and keeping all.
start epoch 450 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4182, avg_data_len=1)
train epoch 450, finished after 4182 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.450
epoch 450 score: 0.08191543829412272 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 450 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.1% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 450 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09324229272269091 error None devtrain: score 0.02383414714207614 error None
15 epochs stored so far and keeping all.
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.450
Finished training in epoch 450.
elapsed: 19:57:14.0412
Quitting
[2023-10-12 17:36:19,697] INFO: Run time: 19:57:24 CPU: 0.40% RSS: 7.32GB VMS: 30.96GB
[2023-10-12 17:36:29,741] INFO: Run time: 19:57:34 CPU: 0.20% RSS: 77MB VMS: 294MB
[2023-10-12 17:36:32,219] INFO: Max resources: Run time: 19:57:37 CPU: 158.8% RSS: 9.86GB VMS: 36.00GB
[2023-10-12 17:36:32,221] INFO: Job finished successfully
--------------------- Slurm Task Epilog ------------------------
Job ID: 2550071
Time: Do 12. Okt 17:36:32 CEST 2023
Elapsed Time: 19:57:39
Billing per second for TRES: billing=88,cpu=3,gres/gpu=1,mem=16G,node=1
Show resource usage with e.g.:
sacct -j 2550071 -o Elapsed,TotalCPU,UserCPU,SystemCPU,MaxRSS,ReqTRES%60,MaxDiskRead,MaxDiskWrite
--------------------- Slurm Task Epilog ------------------------