@Max-Ryujin
Created October 31, 2023 10:08
Log Mel training
--------------------- Slurm Task Prolog ------------------------
Job ID: 2550071
Job name: ReturnnTrainingJob.UP1hAn0FnLY7.run
Host: cn-264
Date: Wed Oct 11 21:38:53 CEST 2023
User: maximilian.kannen
Slurm account: hlt
Slurm partition: gpu_11gb
Work dir:
------------------
Node usage:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2540953_1 gpu_11gb crnn.cus hwu R 11:13:21 1 cn-264
2550071_1 gpu_11gb ReturnnT maximili R 0:00 1 cn-264
------------------
Show launch script with:
sacct -B -j
------------------
--------------------- Slurm Task Prolog ------------------------
[2023-10-11 21:38:54,511] INFO: Generating grammar tables from /usr/local/lib/python3.8/dist-packages/blib2to3/Grammar.txt
[2023-10-11 21:38:54,524] INFO: Writing grammar tables to /u/maximilian.kannen/.cache/black/22.3.0/Grammar3.8.10.final.0.pickle
[2023-10-11 21:38:54,524] INFO: Writing failed: [Errno 2] No such file or directory: '/u/maximilian.kannen/.cache/black/22.3.0/tmpc54qbzqq'
[2023-10-11 21:38:54,525] INFO: Generating grammar tables from /usr/local/lib/python3.8/dist-packages/blib2to3/PatternGrammar.txt
[2023-10-11 21:38:54,526] INFO: Writing grammar tables to /u/maximilian.kannen/.cache/black/22.3.0/PatternGrammar3.8.10.final.0.pickle
[2023-10-11 21:38:54,526] INFO: Writing failed: [Errno 2] No such file or directory: '/u/maximilian.kannen/.cache/black/22.3.0/tmpiiu1gfw1'
[2023-10-11 21:38:55,054] INFO: Start Job: Job<alias/experiments/switchboard/ctc/feat/train_nn/conformer_bs5k_lgm80_baseline work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7> Task: run
[2023-10-11 21:38:55,054] INFO: Inputs:
[2023-10-11 21:38:55,054] INFO: /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
[2023-10-11 21:38:55,054] INFO: /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
[2023-10-11 21:38:55,054] INFO: /usr/bin/python3
[2023-10-11 21:38:55,054] INFO: /work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard
[2023-10-11 21:38:55,054] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.Fzh6DWEkIA5y/output/segments.1
[2023-10-11 21:38:55,055] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.SVlbt6fqP4Jn/output/segments.1
[2023-10-11 21:38:55,055] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
[2023-10-11 21:38:55,057] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
[2023-10-11 21:38:55,059] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/oggzip/BlissToOggZipJob.lAFM8R9mzLpI/output/out.ogg.zip
[2023-10-11 21:38:55,060] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/text/processing/TailJob.RiSM6fe2XipO/output/out.gz
[2023-10-11 21:38:55,061] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository
[2023-10-11 21:38:55,062] INFO: /u/maximilian.kannen/setups/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
Uname: uname_result(system='Linux', node='cn-264', release='5.15.0-39-generic', version='#42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022', machine='x86_64', processor='x86_64')
Load: (2.06, 3.44, 6.7)
[2023-10-11 21:38:55,063] INFO: ------------------------------------------------------------
[2023-10-11 21:38:55,063] INFO: Starting subtask for arg id: 0 args: []
[2023-10-11 21:38:55,063] INFO: ------------------------------------------------------------
[2023-10-11 21:38:55,068] INFO: Run time: 0:00:00 CPU: 158.80% RSS: 79MB VMS: 294MB
RETURNN starting up, version 1.20231002.092600+git.97a829f9, date/time 2023-10-11-21-38-55 (UTC+0200), pid 487565, cwd /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/work, Python /usr/bin/python3
RETURNN command line options: ['/u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/returnn.config']
Hostname: cn-264
[2023-10-11 21:39:00,084] INFO: Run time: 0:00:05 CPU: 0.40% RSS: 386MB VMS: 1.56GB
TensorFlow: 2.8.0 (unknown) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
2023-10-11 21:39:00.976696: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2023-10-11 21:39:03.572081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9795765555376902247
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10742726656
locality {
bus_id: 1
links {
}
}
incarnation: 1539930335961771747
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
xla_global_id: 416903419
Using gpu device 0: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-264', GPU 0, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.0GB
LOG: connected to ('10.6.100.1', 10321)
LOG: destination: /var/tmp/maximilian.kannen/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/returnn/oggzip/BlissToOggZipJob.lAFM8R9mzLpI/output/out.ogg.zip
LOG: using existing file
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 0, frames: unknown
Dev data:
OggZipDataset, sequences: 0, frames: unknown
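For reference, a minimal RETURNN extern_data entry that would be consistent with the data spec printed above (raw waveform of dim 1, sparse 256-dim 'orth' targets, and a string 'raw' stream); this is an assumption for illustration, the actual values live in the generating returnn.config and may differ:

```python
# Hypothetical extern_data matching the "Train data" dump above
# ('data': [1, 2] = dense dim 1, 'orth': [256, 1] = sparse dim 256).
extern_data = {
    "data": {"dim": 1},                        # raw audio samples, [B, T, 1]
    "orth": {"dim": 256, "sparse": True},      # byte-level targets, [B, T_orth]
    "raw": {"dtype": "string", "shape": ()},   # raw transcription string
}
```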
[2023-10-11 21:39:05,102] INFO: Run time: 0:00:10 CPU: 0.20% RSS: 1.05GB VMS: 12.70GB
Learning-rate-control: loading file learning_rates
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-10-11 21:39:08.844346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'wave_norm': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'stft': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] complex64
layer /features/'abs': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] float32
layer /features/'power': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'stft:channel'(129)] float32
layer /features/'mel_filterbank': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'log': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'log10': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /features/'output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
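The /features subnetwork dumped above is a standard log-mel pipeline. Below is a minimal stand-alone NumPy sketch, assuming 8 kHz audio with a 200-sample (25 ms) window, an 80-sample (10 ms) shift, and a 256-point FFT, which matches the 129 = 256/2 + 1 STFT channels and the ⌈(T-199)/80⌉ time axis in the log; the normalization details and epsilons are assumptions:

```python
# Minimal NumPy sketch of the /features pipeline above
# (wave_norm -> stft -> abs -> power -> mel_filterbank -> log/log10).
# Window/shift/FFT sizes are inferred from the logged shapes; the
# normalization and the epsilon constants are assumptions.
import numpy as np

def log_mel_features(wave, sample_rate=8000, win_len=200, hop=80,
                     n_fft=256, n_mels=80):
    # wave_norm: per-utterance mean/variance normalization
    wave = (wave - wave.mean()) / (wave.std() + 1e-8)
    # stft: framing, Hann window, FFT -> complex [frames, 129]
    n_frames = 1 + (len(wave) - win_len) // hop
    window = np.hanning(win_len)
    frames = np.stack([wave[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, n=n_fft)
    # abs -> power: squared magnitude spectrum
    power = np.abs(spec) ** 2
    # mel_filterbank: 80 triangular filters, linearly spaced on the mel scale
    def mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def inv_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = inv_mel(np.linspace(0.0, mel(sample_rate / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * freqs / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # log -> log10: safe log10 of the mel energies
    return np.log10(power @ fbank.T + 1e-10)
```

Applied to a 1-D waveform array this yields the [T', 80] log-mel features that then pass through specaug and the conv frontend below.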
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/util/basic.py:1725: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'features': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /'specaug': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F|F'mel_filterbank:feature-dense'(80)] float32
layer /'conv_source': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F'mel_filterbank:feature-dense'(80),F|F'conv_source_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /'conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],F'mel_filterbank:feature-dense'(80),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈(-99+time:var:extern_data:data+-100)/80⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/160⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'(mel_filterbank:feature-dense)//2'(40),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'((mel_filterbank:feature-dense)//2)*conv_3:channel'(2560)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /'input_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
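Before the conformer stack, the VGG-style frontend above downsamples: conv_1_pool halves the feature axis (80 -> 40), and conv_2 and conv_3 each stride 2 in time, taking the time axis from /80 to /320 samples, i.e. 40 ms per frame; the merged 40 x 64 = 2560 features are then projected to the 512-dim model width. A hedged Keras re-statement just to make the shape bookkeeping explicit (kernel sizes, activations, bias usage, and the dropout rate are assumptions; the channel counts and stride pattern follow the log):

```python
# Hypothetical Keras sketch of the conv frontend logged above; only the
# channel counts (32/64/64), the pooling of the feature axis, and the
# two time strides are taken from the log.
import tensorflow as tf

frontend = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(None, 80, 1)),  # conv_source
    tf.keras.layers.Conv2D(32, 3, padding="same",
                           activation="relu"),              # conv_1
    tf.keras.layers.MaxPool2D(pool_size=(1, 2)),            # conv_1_pool: 80 -> 40
    tf.keras.layers.Conv2D(64, 3, strides=(2, 1), padding="same",
                           activation="relu"),              # conv_2: T/80 -> T/160
    tf.keras.layers.Conv2D(64, 3, strides=(2, 1), padding="same",
                           activation="relu"),              # conv_3: T/160 -> T/320
    tf.keras.layers.Reshape((-1, 40 * 64)),                 # conv_merged: 2560
    tf.keras.layers.Dense(512, use_bias=False),             # input_linear
    tf.keras.layers.Dropout(0.1),                           # input_dropout
])
frontend.summary()
```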
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
[2023-10-11 21:39:10,125] INFO: Run time: 0:00:15 CPU: 0.40% RSS: 1.46GB VMS: 13.12GB
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
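Each of the twelve blocks dumped here has the same Macaron structure: two half-step feed-forward modules (the 0.5-scaled *_half_res_add) sandwiching a convolution module and relative-positional multi-head self-attention. A hedged functional sketch follows, assuming 8 heads (512/8 = 64 matches the logged rel_pos_enc_feat dim), kernel size 32 for the depthwise conv, 0.1 dropout, and plain self-attention standing in for the relative positional encoding; the 512/2048 widths, GLU, batch norm, and swish follow the layer names above:

```python
# Hypothetical sketch of one conformer block from the stack above; heads,
# kernel size, dropout, and the final LayerNorm are assumptions.
import tensorflow as tf

def ff_module(x, dim=512, ff_dim=2048, dropout=0.1):
    # ffmod_*: LN -> Linear(2048)+swish -> Linear(512) -> dropout,
    # added back with the 0.5 factor seen as *_half_res_add
    y = tf.keras.layers.LayerNormalization()(x)
    y = tf.keras.layers.Dense(ff_dim, activation="swish")(y)
    y = tf.keras.layers.Dense(dim)(y)
    y = tf.keras.layers.Dropout(dropout)(y)
    return x + 0.5 * y

def conv_module(x, dim=512, kernel=32, dropout=0.1):
    # conv_mod: LN -> pointwise conv (1024) -> GLU (512) -> depthwise conv
    # -> batch norm -> swish -> pointwise conv -> dropout, plus residual
    y = tf.keras.layers.LayerNormalization()(x)
    y = tf.keras.layers.Dense(2 * dim)(y)                   # pointwise_conv_1
    a, b = tf.split(y, 2, axis=-1)
    y = a * tf.sigmoid(b)                                   # glu
    y = tf.keras.layers.Conv1D(dim, kernel, padding="same",
                               groups=dim)(y)               # depthwise_conv
    y = tf.keras.layers.BatchNormalization()(y)             # bn
    y = tf.keras.layers.Activation("swish")(y)              # swish
    y = tf.keras.layers.Dense(dim)(y)                       # pointwise_conv_2
    y = tf.keras.layers.Dropout(dropout)(y)
    return x + y                                            # conv_mod_res_add

def conformer_block(x, dim=512, heads=8, dropout=0.1):
    x = ff_module(x)                                        # ffmod_1
    x = conv_module(x)                                      # conv_mod
    y = tf.keras.layers.LayerNormalization()(x)             # mhsa_mod_ln
    # the log uses relative positional encoding (rel_pos_enc_feat = 64);
    # plain self-attention is substituted here for brevity
    y = tf.keras.layers.MultiHeadAttention(heads, dim // heads)(y, y)
    y = tf.keras.layers.Dense(dim, use_bias=False)(y)       # mhsa_mod_att_linear
    y = tf.keras.layers.Dropout(dropout)(y)
    x = x + y                                               # mhsa_mod_res_add
    x = ff_module(x)                                        # ffmod_2
    return tf.keras.layers.LayerNormalization()(x)          # conformer_*_output (assumed LN)
```

Blocks 2 through 12 below repeat this structure verbatim, only with the layer-name prefix incremented.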
layer /'conformer_2_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_2_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_2_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_2_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_2_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_2_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_2_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_2_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_2_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_2_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_2_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_2_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_2_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_2_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_2_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_3_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_3_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_3_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_3_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_3_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_3_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_3_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_3_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_3_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_3_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_3_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_3_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_3_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_3_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_4_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_4_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_4_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_4_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_4_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_4_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_4_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_4_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_4_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_4_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_4_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_4_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_4_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_4_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_5_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_5_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_5_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_5_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_5_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_5_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_5_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_5_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_5_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_5_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_6_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_6_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_6_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_6_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_6_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_6_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_6_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_6_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_6_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_6_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_7_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_7_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_7_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_7_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_7_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_7_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_7_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_7_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_7_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_7_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_8_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_8_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_8_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_8_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_8_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_8_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_8_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_8_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_8_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_8_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_9_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_9_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_9_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_9_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_9_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_9_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_9_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_9_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_9_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_9_ffmod_2_dropout_linear:feature-dense'(512)] float32
[2023-10-11 21:39:15,148] INFO: Run time: 0:00:20 CPU: 0.40% RSS: 1.63GB VMS: 13.30GB
layer /'conformer_10_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_10_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_10_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_10_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_10_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_10_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_10_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_10_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_10_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_10_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_10_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_11_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_11_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_11_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_11_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_11_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_11_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_11_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_11_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_11_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_11_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_12_ffmod_1_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_1_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_conv_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_conv_mod_pointwise_conv_1': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_12_conv_mod_glu': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'(conformer_12_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_12_conv_mod_depthwise_conv': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_bn': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_pointwise_conv_2': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_conv_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_mhsa_mod_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_12_mhsa_mod_relpos_encoding': [T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_12_mhsa_mod_self_attention': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_att_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_mhsa_mod_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_ffmod_2_ln': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_12_ffmod_2_linear_swish': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_12_ffmod_2_dropout_linear': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_2_dropout': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_ffmod_2_half_res_add': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_12_output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'conformer_12_ffmod_2_dropout_linear:feature-dense'(512)] float32
2023-10-11 21:39:17.064803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10245 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
layer /'output': [B,T|'⌈(-99+time:var:extern_data:data+-100)/320⌉'[B],F|F'output:feature-dense'(88)] float32
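Editor's note: the dim tag '⌈(-99+time:var:extern_data:data+-100)/320⌉' printed for every encoder layer encodes how the time axis is derived from the raw waveform length: T input samples yield ⌈(T-199)/320⌉ frames. The 320-sample effective shift is consistent with an 80-sample (10 ms at 8 kHz) STFT shift followed by 4x convolutional subsampling, though those exact values are an assumption. A minimal sketch:

    import math

    def encoder_frames(num_samples):
        # Matches the dim tag above: -99 and -100 are boundary offsets,
        # 320 is the effective frame shift in samples (assumed: 80 * 4).
        return math.ceil((num_samples - 199) / 320)

    print(encoder_frames(16000))  # 2 s of 8 kHz audio -> 50 encoder frames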
WARNING:tensorflow:From /work/asr3/vieting/hiwis/kannen/sisyphus_work_dirs/swb/i6_core/tools/git/CloneGitRepositoryJob.0CzeTYuWDN93/output/repository/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
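Editor's note: for reference, the two replacements the warning recommends look like this (illustrative snippet, not part of this setup's code):

    import tensorflow as tf

    def double_np(x):     # receives numpy arrays; not differentiable
        return x * 2

    def double_eager(x):  # receives eager tensors; differentiable, can run on GPU
        return x * 2

    x = tf.constant([1.0, 2.0])
    y_np = tf.numpy_function(double_np, [x], tf.float32)
    y_eager = tf.py_function(double_eager, [x], tf.float32)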
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer batch_norm 'conformer_10_conv_mod_bn' #: 512
layer conv 'conformer_10_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_10_conv_mod_dropout' #: 512
layer gating 'conformer_10_conv_mod_glu' #: 512
layer layer_norm 'conformer_10_conv_mod_ln' #: 512
layer linear 'conformer_10_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_10_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_10_conv_mod_res_add' #: 512
layer activation 'conformer_10_conv_mod_swish' #: 512
layer copy 'conformer_10_ffmod_1_dropout' #: 512
layer linear 'conformer_10_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_10_ffmod_1_half_res_add' #: 512
layer linear 'conformer_10_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_10_ffmod_1_ln' #: 512
layer copy 'conformer_10_ffmod_2_dropout' #: 512
layer linear 'conformer_10_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_10_ffmod_2_half_res_add' #: 512
layer linear 'conformer_10_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_10_ffmod_2_ln' #: 512
layer linear 'conformer_10_mhsa_mod_att_linear' #: 512
layer copy 'conformer_10_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_10_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_10_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_10_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_10_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_10_output' #: 512
layer batch_norm 'conformer_11_conv_mod_bn' #: 512
layer conv 'conformer_11_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_11_conv_mod_dropout' #: 512
layer gating 'conformer_11_conv_mod_glu' #: 512
layer layer_norm 'conformer_11_conv_mod_ln' #: 512
layer linear 'conformer_11_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_11_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_11_conv_mod_res_add' #: 512
layer activation 'conformer_11_conv_mod_swish' #: 512
layer copy 'conformer_11_ffmod_1_dropout' #: 512
layer linear 'conformer_11_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_11_ffmod_1_half_res_add' #: 512
layer linear 'conformer_11_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_11_ffmod_1_ln' #: 512
layer copy 'conformer_11_ffmod_2_dropout' #: 512
layer linear 'conformer_11_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_11_ffmod_2_half_res_add' #: 512
layer linear 'conformer_11_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_11_ffmod_2_ln' #: 512
layer linear 'conformer_11_mhsa_mod_att_linear' #: 512
layer copy 'conformer_11_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_11_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_11_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_11_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_11_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_11_output' #: 512
layer batch_norm 'conformer_12_conv_mod_bn' #: 512
layer conv 'conformer_12_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_12_conv_mod_dropout' #: 512
layer gating 'conformer_12_conv_mod_glu' #: 512
layer layer_norm 'conformer_12_conv_mod_ln' #: 512
layer linear 'conformer_12_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_12_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_12_conv_mod_res_add' #: 512
layer activation 'conformer_12_conv_mod_swish' #: 512
layer copy 'conformer_12_ffmod_1_dropout' #: 512
layer linear 'conformer_12_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_12_ffmod_1_half_res_add' #: 512
layer linear 'conformer_12_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_12_ffmod_1_ln' #: 512
layer copy 'conformer_12_ffmod_2_dropout' #: 512
layer linear 'conformer_12_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_12_ffmod_2_half_res_add' #: 512
layer linear 'conformer_12_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_12_ffmod_2_ln' #: 512
layer linear 'conformer_12_mhsa_mod_att_linear' #: 512
layer copy 'conformer_12_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_12_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_12_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_12_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_12_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_12_output' #: 512
layer batch_norm 'conformer_1_conv_mod_bn' #: 512
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_1_conv_mod_dropout' #: 512
layer gating 'conformer_1_conv_mod_glu' #: 512
layer layer_norm 'conformer_1_conv_mod_ln' #: 512
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_1_conv_mod_res_add' #: 512
layer activation 'conformer_1_conv_mod_swish' #: 512
layer copy 'conformer_1_ffmod_1_dropout' #: 512
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
layer copy 'conformer_1_ffmod_2_dropout' #: 512
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
layer copy 'conformer_1_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_1_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_1_output' #: 512
layer batch_norm 'conformer_2_conv_mod_bn' #: 512
layer conv 'conformer_2_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_2_conv_mod_dropout' #: 512
layer gating 'conformer_2_conv_mod_glu' #: 512
layer layer_norm 'conformer_2_conv_mod_ln' #: 512
layer linear 'conformer_2_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_2_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_2_conv_mod_res_add' #: 512
layer activation 'conformer_2_conv_mod_swish' #: 512
layer copy 'conformer_2_ffmod_1_dropout' #: 512
layer linear 'conformer_2_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_2_ffmod_1_half_res_add' #: 512
layer linear 'conformer_2_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_2_ffmod_1_ln' #: 512
layer copy 'conformer_2_ffmod_2_dropout' #: 512
layer linear 'conformer_2_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_2_ffmod_2_half_res_add' #: 512
layer linear 'conformer_2_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_2_ffmod_2_ln' #: 512
layer linear 'conformer_2_mhsa_mod_att_linear' #: 512
layer copy 'conformer_2_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_2_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_2_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_2_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_2_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_2_output' #: 512
layer batch_norm 'conformer_3_conv_mod_bn' #: 512
layer conv 'conformer_3_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_3_conv_mod_dropout' #: 512
layer gating 'conformer_3_conv_mod_glu' #: 512
layer layer_norm 'conformer_3_conv_mod_ln' #: 512
layer linear 'conformer_3_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_3_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_3_conv_mod_res_add' #: 512
layer activation 'conformer_3_conv_mod_swish' #: 512
layer copy 'conformer_3_ffmod_1_dropout' #: 512
layer linear 'conformer_3_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_3_ffmod_1_half_res_add' #: 512
layer linear 'conformer_3_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_3_ffmod_1_ln' #: 512
layer copy 'conformer_3_ffmod_2_dropout' #: 512
layer linear 'conformer_3_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_3_ffmod_2_half_res_add' #: 512
layer linear 'conformer_3_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_3_ffmod_2_ln' #: 512
layer linear 'conformer_3_mhsa_mod_att_linear' #: 512
layer copy 'conformer_3_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_3_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_3_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_3_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_3_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_3_output' #: 512
layer batch_norm 'conformer_4_conv_mod_bn' #: 512
layer conv 'conformer_4_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_4_conv_mod_dropout' #: 512
layer gating 'conformer_4_conv_mod_glu' #: 512
layer layer_norm 'conformer_4_conv_mod_ln' #: 512
layer linear 'conformer_4_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_4_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_4_conv_mod_res_add' #: 512
layer activation 'conformer_4_conv_mod_swish' #: 512
layer copy 'conformer_4_ffmod_1_dropout' #: 512
layer linear 'conformer_4_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_4_ffmod_1_half_res_add' #: 512
layer linear 'conformer_4_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_4_ffmod_1_ln' #: 512
layer copy 'conformer_4_ffmod_2_dropout' #: 512
layer linear 'conformer_4_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_4_ffmod_2_half_res_add' #: 512
layer linear 'conformer_4_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_4_ffmod_2_ln' #: 512
layer linear 'conformer_4_mhsa_mod_att_linear' #: 512
layer copy 'conformer_4_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_4_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_4_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_4_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_4_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_4_output' #: 512
layer batch_norm 'conformer_5_conv_mod_bn' #: 512
layer conv 'conformer_5_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_5_conv_mod_dropout' #: 512
layer gating 'conformer_5_conv_mod_glu' #: 512
layer layer_norm 'conformer_5_conv_mod_ln' #: 512
layer linear 'conformer_5_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_5_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_5_conv_mod_res_add' #: 512
layer activation 'conformer_5_conv_mod_swish' #: 512
layer copy 'conformer_5_ffmod_1_dropout' #: 512
layer linear 'conformer_5_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_5_ffmod_1_half_res_add' #: 512
layer linear 'conformer_5_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_5_ffmod_1_ln' #: 512
layer copy 'conformer_5_ffmod_2_dropout' #: 512
layer linear 'conformer_5_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_5_ffmod_2_half_res_add' #: 512
layer linear 'conformer_5_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_5_ffmod_2_ln' #: 512
layer linear 'conformer_5_mhsa_mod_att_linear' #: 512
layer copy 'conformer_5_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_5_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_5_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_5_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_5_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_5_output' #: 512
layer batch_norm 'conformer_6_conv_mod_bn' #: 512
layer conv 'conformer_6_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_6_conv_mod_dropout' #: 512
layer gating 'conformer_6_conv_mod_glu' #: 512
layer layer_norm 'conformer_6_conv_mod_ln' #: 512
layer linear 'conformer_6_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_6_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_6_conv_mod_res_add' #: 512
layer activation 'conformer_6_conv_mod_swish' #: 512
layer copy 'conformer_6_ffmod_1_dropout' #: 512
layer linear 'conformer_6_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_6_ffmod_1_half_res_add' #: 512
layer linear 'conformer_6_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_6_ffmod_1_ln' #: 512
layer copy 'conformer_6_ffmod_2_dropout' #: 512
layer linear 'conformer_6_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_6_ffmod_2_half_res_add' #: 512
layer linear 'conformer_6_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_6_ffmod_2_ln' #: 512
layer linear 'conformer_6_mhsa_mod_att_linear' #: 512
layer copy 'conformer_6_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_6_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_6_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_6_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_6_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_6_output' #: 512
layer batch_norm 'conformer_7_conv_mod_bn' #: 512
layer conv 'conformer_7_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_7_conv_mod_dropout' #: 512
layer gating 'conformer_7_conv_mod_glu' #: 512
layer layer_norm 'conformer_7_conv_mod_ln' #: 512
layer linear 'conformer_7_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_7_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_7_conv_mod_res_add' #: 512
layer activation 'conformer_7_conv_mod_swish' #: 512
layer copy 'conformer_7_ffmod_1_dropout' #: 512
layer linear 'conformer_7_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_7_ffmod_1_half_res_add' #: 512
layer linear 'conformer_7_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_7_ffmod_1_ln' #: 512
layer copy 'conformer_7_ffmod_2_dropout' #: 512
layer linear 'conformer_7_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_7_ffmod_2_half_res_add' #: 512
layer linear 'conformer_7_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_7_ffmod_2_ln' #: 512
layer linear 'conformer_7_mhsa_mod_att_linear' #: 512
layer copy 'conformer_7_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_7_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_7_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_7_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_7_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_7_output' #: 512
layer batch_norm 'conformer_8_conv_mod_bn' #: 512
layer conv 'conformer_8_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_8_conv_mod_dropout' #: 512
layer gating 'conformer_8_conv_mod_glu' #: 512
layer layer_norm 'conformer_8_conv_mod_ln' #: 512
layer linear 'conformer_8_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_8_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_8_conv_mod_res_add' #: 512
layer activation 'conformer_8_conv_mod_swish' #: 512
layer copy 'conformer_8_ffmod_1_dropout' #: 512
layer linear 'conformer_8_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_8_ffmod_1_half_res_add' #: 512
layer linear 'conformer_8_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_8_ffmod_1_ln' #: 512
layer copy 'conformer_8_ffmod_2_dropout' #: 512
layer linear 'conformer_8_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_8_ffmod_2_half_res_add' #: 512
layer linear 'conformer_8_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_8_ffmod_2_ln' #: 512
layer linear 'conformer_8_mhsa_mod_att_linear' #: 512
layer copy 'conformer_8_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_8_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_8_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_8_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_8_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_8_output' #: 512
layer batch_norm 'conformer_9_conv_mod_bn' #: 512
layer conv 'conformer_9_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_9_conv_mod_dropout' #: 512
layer gating 'conformer_9_conv_mod_glu' #: 512
layer layer_norm 'conformer_9_conv_mod_ln' #: 512
layer linear 'conformer_9_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_9_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_9_conv_mod_res_add' #: 512
layer activation 'conformer_9_conv_mod_swish' #: 512
layer copy 'conformer_9_ffmod_1_dropout' #: 512
layer linear 'conformer_9_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_9_ffmod_1_half_res_add' #: 512
layer linear 'conformer_9_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_9_ffmod_1_ln' #: 512
layer copy 'conformer_9_ffmod_2_dropout' #: 512
layer linear 'conformer_9_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_9_ffmod_2_half_res_add' #: 512
layer linear 'conformer_9_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_9_ffmod_2_ln' #: 512
layer linear 'conformer_9_mhsa_mod_att_linear' #: 512
layer copy 'conformer_9_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_9_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_9_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_9_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_9_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_9_output' #: 512
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 2560
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 80
layer activation 'features/abs' #: 129
layer activation 'features/log' #: 80
layer eval 'features/log10' #: 80
layer mel_filterbank 'features/mel_filterbank' #: 80
layer batch_norm 'features/output' #: 80
layer eval 'features/power' #: 129
layer stft 'features/stft' #: 129
layer norm 'features/wave_norm' #: 1
layer copy 'input_dropout' #: 512
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer eval 'specaug' #: 80
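Editor's note: the features/* layers in this list form the log-mel frontend that gives the experiment its name: wave_norm -> stft (129 bins, i.e. a 256-point FFT) -> power/abs -> mel_filterbank (80) -> log/log10 -> batch_norm output. A rough numpy sketch of that pipeline, assuming a 200-sample window and 80-sample shift for 8 kHz audio (values not confirmed by this log):

    import numpy as np

    def log_mel(wave, win=200, shift=80, n_fft=256, n_mels=80):
        # wave: 1-D float array of raw samples
        wave = (wave - wave.mean()) / (wave.std() + 1e-8)                  # wave_norm
        frames = np.lib.stride_tricks.sliding_window_view(wave, win)[::shift]
        spec = np.abs(np.fft.rfft(frames * np.hanning(win), n_fft)) ** 2  # 129 bins
        mel_fb = np.ones((n_fft // 2 + 1, n_mels)) / n_mels               # placeholder for a triangular mel filterbank
        return np.log10(spec @ mel_fb + 1e-10)                            # (frames, 80) log-mel features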
net params #: 74182234
net trainable params: [<tf.Variable 'conformer_10_conv_mod_bn/batch_norm/conformer_10_conv_mod_bn_conformer_10_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_bn/batch_norm/conformer_10_conv_mod_bn_conformer_10_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_10_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_10_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_10_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_10_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_bn/batch_norm/conformer_11_conv_mod_bn_conformer_11_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_bn/batch_norm/conformer_11_conv_mod_bn_conformer_11_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_11_conv_mod_pointwise_conv_2/b:0' shape=(512,) 
dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_11_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_11_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_11_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_bn/batch_norm/conformer_12_conv_mod_bn_conformer_12_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_bn/batch_norm/conformer_12_conv_mod_bn_conformer_12_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_12_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_ffmod_2_ln/scale:0' shape=(512,) 
dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_12_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_12_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_12_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_bn/batch_norm/conformer_2_conv_mod_bn_conformer_2_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_bn/batch_norm/conformer_2_conv_mod_bn_conformer_2_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_depthwise_conv/bias:0' 
shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_2_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_2_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_2_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_2_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_bn/batch_norm/conformer_3_conv_mod_bn_conformer_3_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_bn/batch_norm/conformer_3_conv_mod_bn_conformer_3_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_3_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_3_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_3_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_3_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_3_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_bn/batch_norm/conformer_4_conv_mod_bn_conformer_4_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_bn/batch_norm/conformer_4_conv_mod_bn_conformer_4_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_4_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_4_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_4_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_4_output/scale:0' shape=(512,) 
dtype=float32>, <tf.Variable 'conformer_5_conv_mod_bn/batch_norm/conformer_5_conv_mod_bn_conformer_5_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_bn/batch_norm/conformer_5_conv_mod_bn_conformer_5_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_5_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_5_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_5_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_5_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_bn/batch_norm/conformer_6_conv_mod_bn_conformer_6_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_bn/batch_norm/conformer_6_conv_mod_bn_conformer_6_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_6_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_6_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_6_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_6_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_6_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_bn/batch_norm/conformer_7_conv_mod_bn_conformer_7_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_bn/batch_norm/conformer_7_conv_mod_bn_conformer_7_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_7_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_att_linear/W:0' shape=(512, 512) 
dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_7_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_7_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_7_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_bn/batch_norm/conformer_8_conv_mod_bn_conformer_8_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_bn/batch_norm/conformer_8_conv_mod_bn_conformer_8_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_8_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_8_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_8_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_8_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_bn/batch_norm/conformer_9_conv_mod_bn_conformer_9_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_bn/batch_norm/conformer_9_conv_mod_bn_conformer_9_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_ln/bias:0' shape=(512,) 
dtype=float32>, <tf.Variable 'conformer_9_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_9_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_9_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_9_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_9_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/output/batch_norm/output_output_output_beta:0' shape=(1, 1, 80) dtype=float32>, <tf.Variable 'features/output/batch_norm/output_output_output_gamma:0' shape=(1, 1, 80) dtype=float32>, <tf.Variable 'features/wave_norm/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'features/wave_norm/scale:0' shape=(1,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(2560, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
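Editor's note: the printed total of 74182234 parameters can be checked by summing the element counts of the variables listed above:

    import numpy as np
    # tf_vars: the list of tf.Variable objects printed above
    total = sum(int(np.prod(v.shape)) for v in tf_vars)
    print(total)  # the log reports net params #: 74182234 (which may also count non-trainable variables)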
loading weights from /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.390
[2023-10-11 21:39:40,225] INFO: Run time: 0:00:45 CPU: 0.40% RSS: 2.05GB VMS: 24.15GB
[2023-10-11 21:39:45,251] INFO: Run time: 0:00:50 CPU: 0.20% RSS: 2.35GB VMS: 24.41GB
start training at epoch 391
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
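Editor's note: the batch-size dict limits each batch per data key: at most 400000 'data' frames (raw samples here) and 5000 'classes' frames, with at most 128 sequences. A much-simplified sketch of such a batching rule (illustrative only, not RETURNN's actual implementation, which also accounts for padding):

    def fill_batch(seqs, limits, max_seqs):
        """Greedily add sequences until a per-key frame limit or max_seqs is hit."""
        batch, used = [], {k: 0 for k in limits}
        for seq in seqs:  # each seq: dict of per-key lengths, e.g. {'data': 32000, 'classes': 40}
            if len(batch) >= max_seqs or any(used[k] + seq[k] > limits[k] for k in limits):
                break
            batch.append(seq)
            for k in limits:
                used[k] += seq[k]
        return batch

    # usage with the limits from the log line above:
    # fill_batch(seq_iter, limits={'classes': 5000, 'data': 400000}, max_seqs=128)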
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={
'dev_score': 1.3712836585087085,
'devtrain_score': 1.3461227879494617,
'train_score': 1.3859369810006912,
}), 2: EpochData(learningRate=1.539861111111111e-05, error={
'dev_score': 1.2479231324111937,
'devtrain_score': 1.217953189493425,
'train_score': 1.2705600519813665,
}), 3: EpochData(learningRate=1.754722222222222e-05, error={
'dev_score': 1.191105520505297,
'devtrain_score': 1.152482158154539,
'train_score': 1.2071181430771236,
}), ..., 388: EpochData(learningRate=1e-05, error={
'dev_score': 0.08985679090525922,
'devtrain_score': 0.026802025918303496,
'train_score': 0.08481302595337009,
}), 389: EpochData(learningRate=1e-05, error={
'dev_score': 0.09035191899876244,
'devtrain_score': 0.026659155496610833,
'train_score': 0.08407203565283547,
}), 390: EpochData(learningRate=1e-05, error={
'dev_score': 0.09086543904481646,
'devtrain_score': 0.026506946784649415,
'train_score': 0.0845128287906251,
}), error key: dev_score
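Editor's note: Newbob-style control as configured here compares the relative change of dev_score against relative_error_threshold=-0.01 (averaged over num_epochs=6) and decays the learning rate when improvement stalls. A much-simplified single-epoch sketch; the decay factor is an assumption, not taken from this log:

    def newbob_step(lr, prev_score, cur_score, rel_threshold=-0.01, decay=0.9):
        rel_error = (cur_score - prev_score) / abs(prev_score)  # negative = improvement
        if rel_error > rel_threshold:  # improved by less than 1%, or got worse
            lr *= decay                # decay factor assumed for illustration
        return lr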
pretrain: None
[2023-10-11 21:39:50,273] INFO: Run time: 0:00:55 CPU: 0.60% RSS: 2.91GB VMS: 24.96GB
start epoch 391 with learning rate 1e-05 ...
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variables were created by the optimizer: [<tf.Variable 'optimize/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/beta2_power:0' shape=() dtype=float32>].
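
[editor's note] returnn.tf.updater.NadamOptimizer is RETURNN's Nadam, i.e. Adam with Nesterov momentum; the 'm' and 'v' slots are the per-variable first/second moment accumulators, and the beta1/beta2 power variables are the bias-correction counters. For comparison only, the stock Keras counterpart with the same options would be:

    import tensorflow as tf

    # comparison sketch, not the returnn.tf.updater.NadamOptimizer class itself
    opt = tf.keras.optimizers.Nadam(learning_rate=1e-5, epsilon=1e-8)
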
[2023-10-11 21:40:20,386] INFO: Run time: 0:01:25 CPU: 0.20% RSS: 3.34GB VMS: 25.45GB
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
[2023-10-11 21:40:50,498] INFO: Run time: 0:01:55 CPU: 0.40% RSS: 3.77GB VMS: 26.64GB
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
2023-10-11 21:40:59.767405: I tensorflow/stream_executor/cuda/cuda_dnn.cc:379] Loaded cuDNN version 8400
[2023-10-11 21:41:00,546] INFO: Run time: 0:02:05 CPU: 0.60% RSS: 4.51GB VMS: 29.10GB
2023-10-11 21:41:03.973206: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
[2023-10-11 21:41:10,596] INFO: Run time: 0:02:15 CPU: 0.60% RSS: 5.12GB VMS: 29.71GB
[2023-10-11 21:44:06,421] INFO: Run time: 0:05:11 CPU: 0.40% RSS: 5.64GB VMS: 30.46GB
[2023-10-11 21:48:42,650] INFO: Run time: 0:09:47 CPU: 0.40% RSS: 6.20GB VMS: 31.22GB
[2023-10-11 22:02:00,986] INFO: Run time: 0:23:05 CPU: 0.40% RSS: 6.82GB VMS: 31.99GB
Stats:
mem_usage:GPU:0: Stats(mean=4.3GB, std_dev=166.6MB, min=388.8MB, max=4.4GB, num_seqs=4178, avg_data_len=1)
train epoch 391, finished after 4178 steps, 0:22:30 elapsed (97.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.391
epoch 391 score: 0.08400366136566557 error: None elapsed: 0:22:30
Stats:
mem_usage:GPU:0: Stats(mean=4.4GB, std_dev=0.0B, min=4.4GB, max=4.4GB, num_seqs=31, avg_data_len=1)
epoch 391 'dev' eval, finished after 31 steps, 0:00:08 elapsed (81.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=4.4GB, std_dev=0.0B, min=4.4GB, max=4.4GB, num_seqs=29, avg_data_len=1)
epoch 391 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.5% computing time)
dev: score 0.08993440706559858 error None devtrain: score 0.026880376000358423 error None
15 epochs stored so far and keeping all.
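
[editor's note] The block above (train stats, checkpoint save, epoch score, dev/devtrain eval, checkpoint cleanup) repeats unchanged for every epoch through 444. To pull the learning curves out of a log like this one, a small parser written against the exact line formats above suffices ("train.log" is a hypothetical filename for this gist's content):

    import re

    epoch_re = re.compile(r"^start epoch (\d+) with learning rate")
    score_re = re.compile(r"^dev: score ([\d.]+) error \S+ devtrain: score ([\d.]+)")

    scores = {}  # epoch -> (dev_score, devtrain_score)
    epoch = None
    with open("train.log") as f:
        for line in f:
            m = epoch_re.match(line)
            if m:
                epoch = int(m.group(1))
                continue
            m = score_re.match(line)
            if m and epoch is not None:
                scores[epoch] = (float(m.group(1)), float(m.group(2)))

    best = min(scores, key=lambda e: scores[e][0])
    print(f"best dev score {scores[best][0]:.5f} at epoch {best}")
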
start epoch 392 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=57.8MB, min=4.4GB, max=4.6GB, num_seqs=4180, avg_data_len=1)
train epoch 392, finished after 4180 steps, 0:19:16 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.392
epoch 392 score: 0.08421701323548142 error: None elapsed: 0:19:16
Stats:
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=0.0B, min=4.6GB, max=4.6GB, num_seqs=31, avg_data_len=1)
epoch 392 'dev' eval, finished after 31 steps, 0:00:04 elapsed (60.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=4.6GB, std_dev=0.0B, min=4.6GB, max=4.6GB, num_seqs=29, avg_data_len=1)
epoch 392 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (55.4% computing time)
dev: score 0.09141929587738695 error None devtrain: score 0.02739781722813435 error None
16 epochs stored so far and keeping all.
start epoch 393 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=10.7MB, min=4.6GB, max=4.7GB, num_seqs=4182, avg_data_len=1)
train epoch 393, finished after 4182 steps, 0:19:11 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.393
epoch 393 score: 0.08388911349128969 error: None elapsed: 0:19:11
Stats:
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=0.0B, min=4.7GB, max=4.7GB, num_seqs=31, avg_data_len=1)
epoch 393 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=4.7GB, std_dev=0.0B, min=4.7GB, max=4.7GB, num_seqs=29, avg_data_len=1)
epoch 393 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time)
dev: score 0.09072998857164565 error None devtrain: score 0.026826840835708805 error None
We have stored models for epochs [359, 364, 366, ..., 391, 392, 393] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 389, 390, 391, 392, 393].
We will delete the models of epochs [373, 388].
Deleted 587.4MB.
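
[editor's note] The keep/delete lines reflect RETURNN's cleanup_old_models behavior: keep the best checkpoints by the error key (dev_score) plus the most recent ones, and delete the rest; that is why old epochs such as 359 and 364 survive every cleanup below. A sketch of such a selection rule (the keep_best_n/keep_last_n counts are assumptions, not read from this job's config):

    def select_keep(scores, keep_best_n=4, keep_last_n=10):
        # scores: dict epoch -> dev_score for all stored checkpoints
        by_score = sorted(scores, key=scores.get)  # best (lowest dev score) first
        by_epoch = sorted(scores)                  # oldest first
        keep = set(by_score[:keep_best_n]) | set(by_epoch[-keep_last_n:])
        delete = sorted(set(scores) - keep)
        return sorted(keep), delete
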
start epoch 394 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.2GB, std_dev=373.1MB, min=4.7GB, max=5.5GB, num_seqs=4168, avg_data_len=1)
train epoch 394, finished after 4168 steps, 0:19:07 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.394
epoch 394 score: 0.08400428946445164 error: None elapsed: 0:19:07
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 394 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 394 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time)
dev: score 0.09077804731220876 error None devtrain: score 0.025894683877561105 error None
We have stored models for epochs [359, 364, 366, ..., 392, 393, 394] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 390, 391, 392, 393, 394].
We will delete the models of epochs [389].
Deleted 293.7MB.
start epoch 395 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4177, avg_data_len=1)
train epoch 395, finished after 4177 steps, 0:19:12 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.395
[2023-10-11 23:20:25,640] INFO: Run time: 1:41:30 CPU: 0.40% RSS: 7.56GB VMS: 32.86GB
epoch 395 score: 0.08496213777813798 error: None elapsed: 0:19:12
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 395 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 395 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (56.1% computing time)
dev: score 0.09076227162597035 error None devtrain: score 0.025947693735172528 error None
16 epochs stored so far and keeping all.
start epoch 396 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4234, avg_data_len=1)
train epoch 396, finished after 4234 steps, 0:19:27 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.396
epoch 396 score: 0.08434961924249695 error: None elapsed: 0:19:27
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 396 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 396 'devtrain' eval, finished after 29 steps, 0:00:03 elapsed (56.1% computing time)
dev: score 0.0904330411207085 error None devtrain: score 0.025514320532723372 error None
We have stored models for epochs [359, 364, 366, ..., 394, 395, 396] and keep epochs [359, 364, 366, 370, 377, 378, 379, 383, 386, 387, 391, 392, 393, 394, 395, 396].
We will delete the models of epochs [390].
Deleted 293.7MB.
start epoch 397 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 397, finished after 4181 steps, 0:19:08 elapsed (99.4% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.397
epoch 397 score: 0.08391337878236499 error: None elapsed: 0:19:08
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 397 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 397 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09013096193402975 error None devtrain: score 0.026373680437931465 error None
We have stored models for epochs [359, 364, 366, ..., 395, 396, 397] and keep epochs [359, 364, 366, 370, 377, 378, 379, 386, 387, 393, 394, 395, 396, 397].
We will delete the models of epochs [383, 391, 392].
Deleted 0.9GB.
start epoch 398 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4176, avg_data_len=1)
train epoch 398, finished after 4176 steps, 0:19:12 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.398
epoch 398 score: 0.08265076995881551 error: None elapsed: 0:19:12
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 398 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 398 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09101703816939097 error None devtrain: score 0.02660280731553671 error None
15 epochs stored so far and keeping all.
start epoch 399 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4193, avg_data_len=1)
train epoch 399, finished after 4193 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.399
epoch 399 score: 0.08235212145190463 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 399 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 399 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.4% computing time)
dev: score 0.09018240329777154 error None devtrain: score 0.026346119970342592 error None
We have stored models for epochs [359, 364, 366, ..., 397, 398, 399] and keep epochs [359, 364, 366, 370, 378, 379, 386, 387, 394, 395, 396, 397, 398, 399].
We will delete the models of epochs [377, 393].
Deleted 587.4MB.
start epoch 400 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=261.6KB, min=5.5GB, max=5.5GB, num_seqs=4160, avg_data_len=1)
train epoch 400, finished after 4160 steps, 0:19:32 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.400
epoch 400 score: 0.08339132824813998 error: None elapsed: 0:19:32
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 400 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 400 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09022391463791925 error None devtrain: score 0.025841081156994548 error None
We have stored models for epochs [359, 364, 366, ..., 398, 399, 400] and keep epochs [359, 364, 366, 370, 378, 386, 387, 394, 395, 396, 397, 398, 399, 400].
We will delete the models of epochs [379].
Deleted 293.7MB.
start epoch 401 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4191, avg_data_len=1)
train epoch 401, finished after 4191 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.401
epoch 401 score: 0.08321606418107716 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 401 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 401 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time)
dev: score 0.09055862691293622 error None devtrain: score 0.02612697981518672 error None
We have stored models for epochs [359, 364, 366, ..., 399, 400, 401] and keep epochs [359, 364, 366, 370, 378, 387, 394, 395, 396, 397, 398, 399, 400, 401].
We will delete the models of epochs [386].
Deleted 293.7MB.
start epoch 402 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4219, avg_data_len=1)
train epoch 402, finished after 4219 steps, 0:19:50 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.402
epoch 402 score: 0.08450456713209396 error: None elapsed: 0:19:50
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 402 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 402 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09092887729701023 error None devtrain: score 0.025816170981262557 error None
We have stored models for epochs [359, 364, 366, ..., 400, 401, 402] and keep epochs [359, 364, 366, 370, 378, 387, 394, 395, 396, 398, 399, 400, 401, 402].
We will delete the models of epochs [397].
Deleted 293.7MB.
start epoch 403 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4210, avg_data_len=1)
train epoch 403, finished after 4210 steps, 0:19:47 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.403
epoch 403 score: 0.08293471938524258 error: None elapsed: 0:19:47
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 403 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.1% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 403 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.0901411941256409 error None devtrain: score 0.026532771124956972 error None
We have stored models for epochs [359, 364, 366, ..., 401, 402, 403] and keep epochs [359, 364, 366, 370, 378, 394, 395, 396, 398, 399, 400, 401, 402, 403].
We will delete the models of epochs [387].
Deleted 293.7MB.
start epoch 404 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4196, avg_data_len=1)
train epoch 404, finished after 4196 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.404
epoch 404 score: 0.08249238263906251 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 404 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 404 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.2% computing time)
dev: score 0.0899801640639724 error None devtrain: score 0.02644721011386587 error None
15 epochs stored so far and keeping all.
start epoch 405 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4186, avg_data_len=1)
train epoch 405, finished after 4186 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.405
epoch 405 score: 0.08245617300247964 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 405 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.9% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 405 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09077911388905734 error None devtrain: score 0.026696995568451804 error None
16 epochs stored so far and keeping all.
start epoch 406 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4167, avg_data_len=1)
train epoch 406, finished after 4167 steps, 0:19:31 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.406
epoch 406 score: 0.08399680986622006 error: None elapsed: 0:19:31
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 406 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.7% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 406 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.08966737057491372 error None devtrain: score 0.026257957092432414 error None
We have stored models for epochs [359, 364, 366, ..., 404, 405, 406] and keep epochs [359, 364, 366, 378, 394, 395, 396, 398, 399, 400, 402, 403, 404, 405, 406].
We will delete the models of epochs [370, 401].
Deleted 587.4MB.
start epoch 407 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4159, avg_data_len=1)
train epoch 407, finished after 4159 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.407
epoch 407 score: 0.08353985838337341 error: None elapsed: 0:19:33
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 407 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 407 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09027761015614455 error None devtrain: score 0.02561776016492116 error None
We have stored models for epochs [359, 364, 366, ..., 405, 406, 407] and keep epochs [359, 364, 366, 378, 394, 396, 398, 399, 400, 402, 403, 404, 405, 406, 407].
We will delete the models of epochs [395].
Deleted 293.7MB.
start epoch 408 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4200, avg_data_len=1)
train epoch 408, finished after 4200 steps, 0:19:46 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.408
epoch 408 score: 0.08314991566424393 error: None elapsed: 0:19:46
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 408 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 408 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09049524025583396 error None devtrain: score 0.025993217321075256 error None
16 epochs stored so far and keeping all.
start epoch 409 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4205, avg_data_len=1)
train epoch 409, finished after 4205 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.409
epoch 409 score: 0.08251877963602931 error: None elapsed: 0:19:45
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 409 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 409 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.1% computing time)
dev: score 0.09067290124664933 error None devtrain: score 0.026320297886880933 error None
We have stored models for epochs [359, 364, 366, ..., 407, 408, 409] and keep epochs [359, 364, 366, 378, 394, 396, 398, 399, 400, 402, 404, 405, 406, 407, 408, 409].
We will delete the models of epochs [403].
Deleted 293.7MB.
start epoch 410 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4197, avg_data_len=1)
train epoch 410, finished after 4197 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.410
[2023-10-12 04:18:38,364] INFO: Run time: 6:39:43 CPU: 0.60% RSS: 8.37GB VMS: 33.83GB
epoch 410 score: 0.08337167311160373 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 410 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 410 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09183007896461402 error None devtrain: score 0.02651194302297897 error None
17 epochs stored so far and keeping all.
start epoch 411 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4171, avg_data_len=1)
train epoch 411, finished after 4171 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.411
epoch 411 score: 0.08249471195955657 error: None elapsed: 0:19:33
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 411 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 411 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09186992857547137 error None devtrain: score 0.02616978679383357 error None
We have stored models for epochs [359, 364, 366, ..., 409, 410, 411] and keep epochs [359, 364, 366, 378, 394, 396, 399, 400, 402, 404, 405, 406, 407, 408, 409, 410, 411].
We will delete the models of epochs [398].
Deleted 293.7MB.
start epoch 412 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 412, finished after 4172 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.412
epoch 412 score: 0.08319460719168133 error: None elapsed: 0:19:36
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 412 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 412 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09019979357085962 error None devtrain: score 0.0257979828135639 error None
We have stored models for epochs [359, 364, 366, ..., 410, 411, 412] and keep epochs [359, 364, 366, 378, 396, 399, 400, 402, 404, 405, 406, 407, 408, 409, 410, 411, 412].
We will delete the models of epochs [394].
Deleted 293.7MB.
start epoch 413 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4197, avg_data_len=1)
train epoch 413, finished after 4197 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.413
epoch 413 score: 0.08306754462751167 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 413 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 413 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09074148754241838 error None devtrain: score 0.025274968671020553 error None
We have stored models for epochs [359, 364, 366, ..., 411, 412, 413] and keep epochs [359, 364, 366, 378, 396, 399, 402, 404, 405, 406, 407, 409, 410, 411, 412, 413].
We will delete the models of epochs [400, 408].
Deleted 587.4MB.
start epoch 414 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 414, finished after 4172 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.414
epoch 414 score: 0.0836884931439581 error: None elapsed: 0:19:35
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 414 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 414 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09173472828561914 error None devtrain: score 0.02538882494733823 error None
We have stored models for epochs [359, 364, 366, ..., 412, 413, 414] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414].
We will delete the models of epochs [402].
Deleted 293.7MB.
start epoch 415 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4183, avg_data_len=1)
train epoch 415, finished after 4183 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.415
epoch 415 score: 0.08181178922788339 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 415 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 415 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.1% computing time)
dev: score 0.09110214819673813 error None devtrain: score 0.026366395757861632 error None
We have stored models for epochs [359, 364, 366, ..., 413, 414, 415] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 411, 412, 413, 414, 415].
We will delete the models of epochs [409, 410].
Deleted 587.4MB.
start epoch 416 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4174, avg_data_len=1)
train epoch 416, finished after 4174 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.416
epoch 416 score: 0.0827154119004278 error: None elapsed: 0:19:36
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 416 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.1% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 416 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.3% computing time)
dev: score 0.09127620120458463 error None devtrain: score 0.025958318614427975 error None
16 epochs stored so far and keeping all.
start epoch 417 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4194, avg_data_len=1)
train epoch 417, finished after 4194 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.417
epoch 417 score: 0.08388887599570766 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 417 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 417 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time)
dev: score 0.09071111033396588 error None devtrain: score 0.025879979203816565 error None
17 epochs stored so far and keeping all.
start epoch 418 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4173, avg_data_len=1)
train epoch 418, finished after 4173 steps, 0:19:34 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.418
epoch 418 score: 0.08209063794468534 error: None elapsed: 0:19:34
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 418 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 418 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09147636589270317 error None devtrain: score 0.025095803548978883 error None
We have stored models for epochs [359, 364, 366, ..., 416, 417, 418] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 407, 413, 414, 415, 416, 417, 418].
We will delete the models of epochs [411, 412].
Deleted 587.4MB.
start epoch 419 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4194, avg_data_len=1)
train epoch 419, finished after 4194 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.419
epoch 419 score: 0.0828518131020664 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 419 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 419 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.8% computing time)
dev: score 0.09164404081218212 error None devtrain: score 0.02526741039696012 error None
We have stored models for epochs [359, 364, 366, ..., 417, 418, 419] and keep epochs [359, 364, 366, 378, 396, 399, 404, 405, 406, 413, 414, 415, 416, 417, 418, 419].
We will delete the models of epochs [407].
Deleted 293.7MB.
start epoch 420 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4193, avg_data_len=1)
train epoch 420, finished after 4193 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.420
epoch 420 score: 0.08323626809524225 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 420 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 420 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09096238227003942 error None devtrain: score 0.0251657299686666 error None
We have stored models for epochs [359, 364, 366, ..., 418, 419, 420] and keep epochs [359, 364, 366, 378, 399, 404, 405, 406, 413, 414, 415, 416, 417, 418, 419, 420].
We will delete the models of epochs [396].
Deleted 293.7MB.
start epoch 421 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4208, avg_data_len=1)
train epoch 421, finished after 4208 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.421
epoch 421 score: 0.08222371817884308 error: None elapsed: 0:19:45
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 421 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 421 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09096776702766454 error None devtrain: score 0.02527897756168886 error None
We have stored models for epochs [359, 364, 366, ..., 419, 420, 421] and keep epochs [359, 364, 366, 378, 399, 405, 406, 413, 415, 417, 418, 419, 420, 421].
We will delete the models of epochs [404, 414, 416].
Deleted 0.9GB.
start epoch 422 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4144, avg_data_len=1)
train epoch 422, finished after 4144 steps, 0:19:25 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.422
epoch 422 score: 0.08255526070388984 error: None elapsed: 0:19:25
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 422 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 422 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09177574777482436 error None devtrain: score 0.025661581341717046 error None
We have stored models for epochs [359, 364, 366, ..., 420, 421, 422] and keep epochs [359, 364, 366, 378, 399, 405, 406, 413, 415, 418, 419, 420, 421, 422].
We will delete the models of epochs [417].
Deleted 293.7MB.
start epoch 423 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4173, avg_data_len=1)
train epoch 423, finished after 4173 steps, 0:19:34 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.423
epoch 423 score: 0.08306380528087216 error: None elapsed: 0:19:34
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 423 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.1% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 423 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09055247913793295 error None devtrain: score 0.02531839647658727 error None
15 epochs stored so far and keeping all.
start epoch 424 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4222, avg_data_len=1)
train epoch 424, finished after 4222 steps, 0:19:49 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.424
epoch 424 score: 0.08247824282325938 error: None elapsed: 0:19:49
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 424 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 424 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09065868220714168 error None devtrain: score 0.02492753470312136 error None
16 epochs stored so far and keeping all.
start epoch 425 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4195, avg_data_len=1)
train epoch 425, finished after 4195 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.425
epoch 425 score: 0.08196086737207589 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 425 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.6% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 425 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.0% computing time)
dev: score 0.09140785595989587 error None devtrain: score 0.02425309385319315 error None
We have stored models for epochs [359, 364, 366, ..., 423, 424, 425] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 419, 420, 421, 422, 423, 424, 425].
We will delete the models of epochs [405, 413].
Deleted 587.4MB.
start epoch 426 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4183, avg_data_len=1)
train epoch 426, finished after 4183 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.426
epoch 426 score: 0.08286235069177315 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 426 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 426 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09141815789115065 error None devtrain: score 0.024283266888936613 error None
We have stored models for epochs [359, 364, 366, ..., 424, 425, 426] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 420, 421, 422, 423, 424, 425, 426].
We will delete the models of epochs [419].
Deleted 293.7MB.
start epoch 427 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4157, avg_data_len=1)
train epoch 427, finished after 4157 steps, 0:19:30 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.427
epoch 427 score: 0.08310631056093992 error: None elapsed: 0:19:30
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 427 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 427 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09138641472073929 error None devtrain: score 0.025806944188148745 error None
We have stored models for epochs [359, 364, 366, ..., 425, 426, 427] and keep epochs [359, 364, 366, 378, 399, 406, 415, 418, 420, 421, 423, 424, 425, 426, 427].
We will delete the models of epochs [422].
Deleted 293.7MB.
start epoch 428 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4207, avg_data_len=1)
train epoch 428, finished after 4207 steps, 0:19:46 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.428
epoch 428 score: 0.08160235774561941 error: None elapsed: 0:19:46
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 428 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 428 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09227596118518427 error None devtrain: score 0.025816427641435925 error None
We have stored models for epochs [359, 364, 366, ..., 426, 427, 428] and keep epochs [359, 364, 366, 378, 406, 415, 418, 420, 421, 424, 425, 426, 427, 428].
We will delete the models of epochs [399, 423].
Deleted 587.4MB.
start epoch 429 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1)
train epoch 429, finished after 4188 steps, 0:19:41 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.429
epoch 429 score: 0.08170135714184865 error: None elapsed: 0:19:41
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 429 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 429 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.3% computing time)
dev: score 0.0912771197864512 error None devtrain: score 0.02508745336861734 error None
We have stored models for epochs [359, 364, 366, ..., 427, 428, 429] and keep epochs [359, 364, 366, 378, 406, 415, 418, 424, 425, 426, 427, 428, 429].
We will delete the models of epochs [420, 421].
Deleted 587.4MB.
start epoch 430 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4195, avg_data_len=1)
train epoch 430, finished after 4195 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.430
epoch 430 score: 0.08222845890154948 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 430 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.5% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 430 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.2% computing time)
dev: score 0.09160382502645201 error None devtrain: score 0.024624127270048743 error None
14 epochs stored so far and keeping all.
start epoch 431 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4189, avg_data_len=1)
train epoch 431, finished after 4189 steps, 0:19:41 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.431
epoch 431 score: 0.08291845272060938 error: None elapsed: 0:19:41
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 431 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 431 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.1% computing time)
dev: score 0.08979805604379705 error None devtrain: score 0.024595885205106763 error None
15 epochs stored so far and keeping all.
start epoch 432 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4179, avg_data_len=1)
train epoch 432, finished after 4179 steps, 0:19:36 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.432
epoch 432 score: 0.08164206453017459 error: None elapsed: 0:19:36
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 432 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.7% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 432 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09212745776610048 error None devtrain: score 0.02475850586067128 error None
We have stored models for epochs [359, 364, 366, ..., 430, 431, 432] and keep epochs [359, 364, 366, 378, 406, 415, 425, 426, 428, 429, 430, 431, 432].
We will delete the models of epochs [418, 424, 427].
Deleted 0.9GB.
start epoch 433 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4185, avg_data_len=1)
train epoch 433, finished after 4185 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.433
epoch 433 score: 0.0813635706666328 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 433 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 433 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.2% computing time)
dev: score 0.09102702123518808 error None devtrain: score 0.02498445704970546 error None
14 epochs stored so far and keeping all.
start epoch 434 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4192, avg_data_len=1)
train epoch 434, finished after 4192 steps, 0:19:42 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.434
epoch 434 score: 0.08190310761877713 error: None elapsed: 0:19:42
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 434 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.6% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 434 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (54.8% computing time)
dev: score 0.09116143886034037 error None devtrain: score 0.02524214008242845 error None
15 epochs stored so far and keeping all.
start epoch 435 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4172, avg_data_len=1)
train epoch 435, finished after 4172 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.435
[2023-10-12 12:36:50,307] INFO: Run time: 14:57:55 CPU: 0.80% RSS: 9.22GB VMS: 34.81GB
epoch 435 score: 0.08109052435187306 error: None elapsed: 0:19:35
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 435 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 435 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09162276287399478 error None devtrain: score 0.025338723087977005 error None
We have stored models for epochs [359, 364, 366, ..., 433, 434, 435] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 429, 430, 431, 432, 433, 434, 435].
We will delete the models of epochs [415].
Deleted 293.7MB.
start epoch 436 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4206, avg_data_len=1)
train epoch 436, finished after 4206 steps, 0:19:47 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.436
epoch 436 score: 0.08151726521685654 error: None elapsed: 0:19:47
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 436 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 436 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (57.4% computing time)
dev: score 0.09123822666065062 error None devtrain: score 0.024714037350225735 error None
We have stored models for epochs [359, 364, 366, ..., 434, 435, 436] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 430, 431, 432, 433, 434, 435, 436].
We will delete the models of epochs [429].
Deleted 293.7MB.
start epoch 437 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4164, avg_data_len=1)
train epoch 437, finished after 4164 steps, 0:19:33 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.437
epoch 437 score: 0.08122536741517272 error: None elapsed: 0:19:33
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 437 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.9% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 437 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.3% computing time)
dev: score 0.09141868853581352 error None devtrain: score 0.024465649144805472 error None
We have stored models for epochs [359, 364, 366, ..., 435, 436, 437] and keep epochs [359, 364, 366, 378, 406, 425, 426, 428, 430, 431, 433, 434, 435, 436, 437].
We will delete the models of epochs [432].
Deleted 293.7MB.
start epoch 438 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4201, avg_data_len=1)
train epoch 438, finished after 4201 steps, 0:19:43 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.438
epoch 438 score: 0.08049115664195709 error: None elapsed: 0:19:43
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 438 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.7% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 438 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09252951738679259 error None devtrain: score 0.024193224574030614 error None
We have stored models for epochs [359, 364, 366, ..., 436, 437, 438] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 434, 435, 436, 437, 438].
We will delete the models of epochs [428, 430].
Deleted 587.4MB.
start epoch 439 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 439, finished after 4181 steps, 0:19:39 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.439
epoch 439 score: 0.08105641147729524 error: None elapsed: 0:19:39
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 439 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 439 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09247350170353517 error None devtrain: score 0.024670380220718414 error None
We have stored models for epochs [359, 364, 366, ..., 437, 438, 439] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 435, 436, 437, 438, 439].
We will delete the models of epochs [434].
Deleted 293.7MB.
start epoch 440 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4163, avg_data_len=1)
train epoch 440, finished after 4163 steps, 0:19:31 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.440
epoch 440 score: 0.08158685601826442 error: None elapsed: 0:19:31
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 440 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 440 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.2% computing time)
dev: score 0.09227849658022595 error None devtrain: score 0.024860019932519344 error None
15 epochs stored so far and keeping all.
start epoch 441 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4186, avg_data_len=1)
train epoch 441, finished after 4186 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.441
epoch 441 score: 0.08145152559881456 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 441 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 441 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09168181760276106 error None devtrain: score 0.024747965844799987 error None
We have stored models for epochs [359, 364, 366, ..., 439, 440, 441] and keep epochs [359, 364, 366, 378, 406, 425, 426, 431, 433, 435, 437, 438, 439, 440, 441].
We will delete the models of epochs [436].
Deleted 293.7MB.
start epoch 442 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4181, avg_data_len=1)
train epoch 442, finished after 4181 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.442
epoch 442 score: 0.08088496131179494 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 442 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.8% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 442 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.6% computing time)
dev: score 0.09188674846398792 error None devtrain: score 0.024465029976891536 error None
We have stored models for epochs [359, 364, 366, ..., 440, 441, 442] and keep epochs [359, 364, 366, 378, 406, 425, 426, 435, 437, 438, 439, 440, 441, 442].
We will delete the models of epochs [431, 433].
Deleted 587.4MB.
start epoch 443 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4210, avg_data_len=1)
train epoch 443, finished after 4210 steps, 0:19:49 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.443
epoch 443 score: 0.08175147431729346 error: None elapsed: 0:19:49
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 443 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 443 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.0918915893859721 error None devtrain: score 0.023667121564226266 error None
15 epochs stored so far and keeping all.
start epoch 444 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4199, avg_data_len=1)
train epoch 444, finished after 4199 steps, 0:19:45 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.444
epoch 444 score: 0.08247715732985156 error: None elapsed: 0:19:45
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 444 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 444 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09163948981342522 error None devtrain: score 0.023713764680208253 error None
16 epochs stored so far and keeping all.
start epoch 445 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4184, avg_data_len=1)
train epoch 445, finished after 4184 steps, 0:19:40 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.445
epoch 445 score: 0.08020359040302102 error: None elapsed: 0:19:40
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 445 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.3% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 445 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.3% computing time)
dev: score 0.09256254497999448 error None devtrain: score 0.02486299441057453 error None
We have stored models for epochs [359, 364, 366, ..., 443, 444, 445] and keep epochs [359, 364, 366, 378, 406, 425, 426, 435, 438, 439, 441, 442, 443, 444, 445].
We will delete the models of epochs [437, 440].
Deleted 587.4MB.
start epoch 446 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4204, avg_data_len=1)
train epoch 446, finished after 4204 steps, 0:19:46 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.446
epoch 446 score: 0.07994435010516164 error: None elapsed: 0:19:46
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 446 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 446 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09147745484412217 error None devtrain: score 0.024438725137651756 error None
We have stored models for epochs [359, 364, 366, ..., 444, 445, 446] and keep epochs [359, 364, 366, 378, 406, 425, 426, 438, 439, 442, 443, 444, 445, 446].
We will delete the models of epochs [435, 441].
Deleted 587.4MB.
start epoch 447 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4169, avg_data_len=1)
train epoch 447, finished after 4169 steps, 0:19:35 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.447
epoch 447 score: 0.08131296808776979 error: None elapsed: 0:19:35
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 447 'dev' eval, finished after 31 steps, 0:00:04 elapsed (57.4% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 447 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.4% computing time)
dev: score 0.09146317625040964 error None devtrain: score 0.02413896989402761 error None
We have stored models for epochs [359, 364, 366, ..., 445, 446, 447] and keep epochs [359, 364, 366, 378, 406, 425, 438, 439, 442, 443, 444, 445, 446, 447].
We will delete the models of epochs [426].
Deleted 293.7MB.
start epoch 448 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1)
train epoch 448, finished after 4188 steps, 0:19:39 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.448
epoch 448 score: 0.08064634796145521 error: None elapsed: 0:19:39
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 448 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.0% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 448 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.5% computing time)
dev: score 0.09170052168613516 error None devtrain: score 0.023905761854374333 error None
We have stored models for epochs [359, 364, 366, ..., 446, 447, 448] and keep epochs [359, 364, 366, 378, 406, 438, 442, 443, 444, 445, 446, 447, 448].
We will delete the models of epochs [425, 439].
Deleted 587.4MB.
start epoch 449 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4188, avg_data_len=1)
train epoch 449, finished after 4188 steps, 0:19:41 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.449
epoch 449 score: 0.08224647954197745 error: None elapsed: 0:19:41
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 449 'dev' eval, finished after 31 steps, 0:00:04 elapsed (58.2% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 449 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (56.0% computing time)
dev: score 0.09155218936015956 error None devtrain: score 0.02332821860086142 error None
14 epochs stored so far and keeping all.
start epoch 450 with learning rate 1e-05 ...
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=4182, avg_data_len=1)
train epoch 450, finished after 4182 steps, 0:19:38 elapsed (99.3% computing time)
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.450
epoch 450 score: 0.08191543829412272 error: None elapsed: 0:19:38
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=31, avg_data_len=1)
epoch 450 'dev' eval, finished after 31 steps, 0:00:04 elapsed (59.1% computing time)
Stats:
mem_usage:GPU:0: Stats(mean=5.5GB, std_dev=0.0B, min=5.5GB, max=5.5GB, num_seqs=29, avg_data_len=1)
epoch 450 'devtrain' eval, finished after 29 steps, 0:00:04 elapsed (55.9% computing time)
dev: score 0.09324229272269091 error None devtrain: score 0.02383414714207614 error None
15 epochs stored so far and keeping all.
Save model under /u/maximilian.kannen/setups/20230406_feat/work/i6_core/returnn/training/ReturnnTrainingJob.UP1hAn0FnLY7/output/models/epoch.450
Finished training in epoch 450.
elapsed: 19:57:14.0412
Quitting
[2023-10-12 17:36:19,697] INFO: Run time: 19:57:24 CPU: 0.40% RSS: 7.32GB VMS: 30.96GB
[2023-10-12 17:36:29,741] INFO: Run time: 19:57:34 CPU: 0.20% RSS: 77MB VMS: 294MB
[2023-10-12 17:36:32,219] INFO: Max resources: Run time: 19:57:37 CPU: 158.8% RSS: 9.86GB VMS: 36.00GB
[2023-10-12 17:36:32,221] INFO: Job finished successfully
--------------------- Slurm Task Epilog ------------------------
Job ID: 2550071
Time: Thu 12 Oct 17:36:32 CEST 2023
Elapsed Time: 19:57:39
Billing per second for TRES: billing=88,cpu=3,gres/gpu=1,mem=16G,node=1
Show resource usage with e.g.:
sacct -j 2550071 -o Elapsed,TotalCPU,UserCPU,SystemCPU,MaxRSS,ReqTRES%60,MaxDiskRead,MaxDiskWrite
--------------------- Slurm Task Epilog ------------------------
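(For quick inspection, the per-epoch train/dev/devtrain scores in a log like the above can be extracted with a few lines of Python; the file name is a placeholder:

    import re

    # Match "epoch N score: X ..." (train) and the combined dev/devtrain line.
    train_re = re.compile(r"epoch (\d+) score: ([0-9.]+)")
    eval_re = re.compile(r"dev: score ([0-9.]+) error \S+ devtrain: score ([0-9.]+)")

    rows = []
    with open("training.log") as f:  # placeholder path for this log
        current = None
        for line in f:
            m = train_re.search(line)
            if m:
                current = (int(m.group(1)), float(m.group(2)))
                continue
            m = eval_re.search(line)
            if m and current is not None:
                rows.append((*current, float(m.group(1)), float(m.group(2))))
                current = None

    for epoch, train, dev, devtrain in rows:
        print(f"epoch {epoch}: train {train:.4f}  dev {dev:.4f}  devtrain {devtrain:.4f}")
)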