
Robin Schmitt (robin-p-schmitt)

robin-p-schmitt / learning_rates
Last active May 16, 2022 10:36
Convergence issue for segmental length model (Issue #1042)
{
1: EpochData(learningRate=0.0001, error={
'dev_error_ctc': 0.9583588315669095,
'dev_error_label_model/label_prob': 0.6887621695767638,
'dev_error_label_model/length_model': 0.9999999998499519,
'dev_score_ctc': 0.0,
'dev_score_label_model/label_prob': 46.05170047498763,
'dev_score_label_model/length_model': float('nan'),
'devtrain_error_ctc': 0.9618647864437795,
'devtrain_error_label_model/label_prob': 0.6914862408234509,
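The preview breaks off mid-dict, but epoch 1 already shows the problem: the length model's dev error saturates near 1.0 and its dev score is NaN. RETURNN stores this learning-rate-control data as a single Python dict literal mapping epochs to EpochData entries, so it can be inspected directly. A minimal sketch, assuming a stand-in EpochData class rather than RETURNN's own loader:

```python
import math

class EpochData:
    # stand-in for RETURNN's EpochData, just enough to eval the file
    def __init__(self, learningRate, error):
        self.learning_rate = learningRate
        self.error = error

def load_learning_rates(path):
    # the file is one Python dict literal: {epoch: EpochData(...), ...}
    with open(path) as f:
        return eval(f.read(), {"EpochData": EpochData,
                               "nan": float("nan"), "inf": float("inf")})

def nan_scores(data):
    # yield (epoch, key) for every NaN score/error entry
    for epoch, ep in sorted(data.items()):
        for key, value in ep.error.items():
            if isinstance(value, float) and math.isnan(value):
                yield epoch, key

for epoch, key in nan_scores(load_learning_rates("learning_rates")):
    print(f"epoch {epoch}: {key} is NaN")
```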
robin-p-schmitt / error.log
Last active May 16, 2022 10:22
RETURNN error with state vector and pooling over segment (Issue #1041)
RETURNN starting up, version 1.20220502.144101+git.2273d36, date/time 2022-05-16-11-34-14 (UTC+0200), pid 16843, cwd /work/asr3/zeyer/schmitt/sisyphus_work_dirs/transducer/i6_core/returnn/training/ReturnnTrainingJob.CdLKzpChhtbs/work, Python /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/bin/python
RETURNN command line options: ['/u/schmitt/experiments/transducer/work/i6_core/returnn/training/ReturnnTrainingJob.CdLKzpChhtbs/output/returnn.config']
Hostname: cluster-cn-211
TensorFlow: 2.3.0 (v2.3.0-2-gee598066c4) (<site-package> in /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow)
Use num_threads=4 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 4, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 4, 'inter_op_parallelism_threads': 4}.
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
1/4: name: "
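The preview cuts off inside the device list. For orientation, a rough sketch of what this startup phase amounts to in plain TF 2.x calls (RETURNN's actual setup code differs in detail):

```python
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # matches the log above
import tensorflow as tf

# mirror the logged session opts: 4 intra-op and 4 inter-op threads
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(4)

# enumerate local devices, as in "Local devices available to TensorFlow:"
from tensorflow.python.client import device_lib
for i, dev in enumerate(device_lib.list_local_devices(), start=1):
    print(f"{i}: name: {dev.name!r} type: {dev.device_type}")
```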
robin-p-schmitt / error log
Created May 2, 2022 08:27
Error when setting attention weights layer as output layer
Train data:
input: 0 x 1
output: {'alignment': [1031, 1], 'data': (40, 2)}
MetaDataset, sequences: 37841, frames: unknown
Dev data:
MetaDataset, sequences: 3000, frames: unknown
Device not set explicitly, and we found a GPU, which we will use.
Reading sequence list for MetaDataset 'devtrain' from sub-dataset 'devtrain_align'
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data:alignment': [B,T|'output-len'[B]] int32 sparse_dim=Dim{F'alignment:sparse-dim'(1031)}
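The "Reading sequence list ... from sub-dataset 'devtrain_align'" line reflects MetaDataset's seq_order_control_dataset option. A sketch of a MetaDataset config matching the data keys in this log ('data': 40-dim features; 'alignment': sparse, dim 1031); the sub-dataset classes and file paths are placeholders, not from the gist:

```python
train = {
    "class": "MetaDataset",
    "datasets": {
        # placeholder sub-datasets and paths, for illustration only
        "features": {"class": "HDFDataset", "files": ["features.hdf"]},
        "devtrain_align": {"class": "HDFDataset", "files": ["align.hdf"]},
    },
    "data_map": {
        # meta key -> (sub-dataset name, key within that sub-dataset)
        "data": ("features", "data"),
        "alignment": ("devtrain_align", "data"),
    },
    # the sub-dataset that defines the sequence order, as in the log line
    "seq_order_control_dataset": "devtrain_align",
}
```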
robin-p-schmitt / error_log
Created April 8, 2022 12:39
tf_compile_graph_error
Returnn compile-tf-graph starting up.
RETURNN starting up, version 1.20220407.140523+git.a3fe10c, date/time 2022-04-08-14-35-20 (UTC+0200), pid 28980, cwd /work/asr3/zeyer/schmitt/sisyphus_work_dirs/transducer/i6_private/users/schmitt/returnn/tools/CompileTFGraphJob.n6PriwSUjQ1a/work, Python /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/bin/python
Hostname: cluster-cn-214
TensorFlow: 2.3.0 (v2.3.0-2-gee598066c4) (<site-package> in /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow)
Use num_threads=4 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 4, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 4, 'inter_op_parallelism_threads': 4}.
2022-04-08 14:35:20.727432: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical
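The CompileTFGraphJob above wraps RETURNN's tools/compile_tf_graph.py. A minimal sketch of an equivalent manual invocation; the exact flags may vary between RETURNN versions, so treat the command line as an assumption:

```python
from subprocess import CalledProcessError, check_output

# Assumed invocation of RETURNN's graph-compilation tool; flag names may
# differ between RETURNN versions.
cmd = [
    "python3", "returnn/tools/compile_tf_graph.py",
    "returnn.config",               # the training config to build the graph from
    "--output_file", "graph.meta",  # serialized MetaGraphDef
]
try:
    print(check_output(cmd).decode())
except CalledProcessError as exc:
    print("compile_tf_graph failed with exit code", exc.returncode)
```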
robin-p-schmitt / error_log
Created April 1, 2022 09:56
compile_tf_graph problem
Create graph...
Loading network, train flag False, eval flag False, search flag False
layer /'data:bpe': [B,T|'time:var:extern_data:bpe'[B]] int32 sparse_dim=Dim{F'bpe:sparse-dim'(1030)}
[2022-04-01 11:50:46,204] INFO: Run time: 0:00:15 CPU: 0.40% RSS: 922MB VMS: 12.95GB
layer /'data': [B,T|'time'[B],F|F'feature:data'(40)] float32
layer /'source_stddev': [B,T|'time'[B],F|F'feature:data'(40)] float32
layer /'source': [B,T|'time'[B],F|F'feature:data'(40)] float32
layer /'source0': [B,T|'time'[B],F'feature:data'(40),F|F'source0_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer <network via test_MaskedComputationLayer_dyn_size_none>/'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(20)] float32
layer <network via test_MaskedComputationLayer_dyn_size_none>/'rec_loop': [T|'time:var:extern_data:data'[B&Beam{'rec_loop/output'}(4)],B&Beam{'rec_loop/output'}(4)] int32 sparse_dim=Dim{F'classes:sparse-dim'(20)}
Rec layer 'rec_loop' (search True, train False) sub net:
Input layers moved out of loop: (#: 0)
None
Output layers moved out of loop: (#: 0)
None
Layers in loop: (#: 2)
output
lin
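With nothing moved out of the loop, both layers run inside the rec cell at every step. A sketch of a rec network with the shape the log describes (two in-loop layers named lin and output, beam size 4, sparse output dim 20); the concrete layer options are assumptions, not the test's actual config:

```python
network = {
    "rec_loop": {
        "class": "rec",
        "from": "data",
        "unit": {
            # 'lin' and 'output' are the two in-loop layers named in the log
            "lin": {"class": "linear", "activation": "log_softmax", "n_out": 20,
                    "from": "prev:output"},
            "output": {"class": "choice", "target": "classes", "beam_size": 4,
                       "from": "lin", "input_type": "log_prob"},
        },
    },
}
```

The Beam{'rec_loop/output'}(4) tag in the log comes from such a choice layer's beam search. The dump below is the beginning of a MetaGraphDef in text form: its meta_info_def with the stripped op list.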
meta_info_def {
  stripped_op_list {
    op {
      name: "Add"
      input_arg {
        name: "x"
        type_attr: "T"
      }
      input_arg {
        name: "y"
#!rnn.py
from returnn.tf.util.data import Dim
import os
import numpy as np
from subprocess import check_output, CalledProcessError
def _mask(x, batch_axis, axis, pos, max_amount, mask_value=0.0):
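The config preview stops at the _mask helper, a SpecAugment-style masking function that appears in many RETURNN configs. A sketch of a typical body, assuming the widely used variant (the gist's actual implementation may differ):

```python
import tensorflow as tf

def _mask(x, batch_axis, axis, pos, max_amount, mask_value=0.0):
    """Mask a random-length window of x along `axis`, starting at `pos` per batch entry."""
    n_batch = tf.shape(x)[batch_axis]
    dim = tf.shape(x)[axis]
    # random window length in [1, max_amount], drawn independently per sequence
    amount = tf.random.uniform(shape=(n_batch,), minval=1, maxval=max_amount + 1,
                               dtype=tf.int32)
    pos2 = tf.minimum(pos + amount, dim)
    idxs = tf.expand_dims(tf.range(0, dim), 0)  # (1, dim)
    cond = tf.logical_and(idxs >= tf.expand_dims(pos, 1),
                          idxs < tf.expand_dims(pos2, 1))  # (batch, dim)
    # broadcast the (batch, dim) mask to the full rank of x
    shape = [1] * x.get_shape().ndims
    shape[batch_axis] = n_batch
    shape[axis] = dim
    cond = tf.reshape(cond, shape)
    return tf.where(cond, tf.fill(tf.shape(x), tf.cast(mask_value, x.dtype)), x)
```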
TypeError creating layer /'lstm0_pool' of class PoolLayer with opts:
{'_name': 'lstm0_pool',
'_network': <TFNetwork '' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'mode': 'max',
'name': 'lstm0_pool',
'network': <TFNetwork '' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'padding': 'same',
'pool_size': (6,),
'sources': [<RecLayer 'lstm0_fw' out_type=Data{[T|'time'[B],B,F|F'lstm0_fw:feature'(512)]}>,
<RecLayer 'lstm0_bw' out_type=Data{[T|'time'[B],B,F|F'lstm0_bw:feature'(512)]}>]}
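For context, a sketch of network-config entries that would produce exactly these PoolLayer opts (layer names, pool settings, and LSTM sizes are taken from the log; the remaining options are assumptions):

```python
network = {
    "lstm0_fw": {"class": "rec", "unit": "nativelstm2", "direction": 1,
                 "n_out": 512, "from": "source"},
    "lstm0_bw": {"class": "rec", "unit": "nativelstm2", "direction": -1,
                 "n_out": 512, "from": "source"},
    # max-pooling over time with pool size 6, as in the opts above
    "lstm0_pool": {"class": "pool", "mode": "max", "padding": "same",
                   "pool_size": (6,), "from": ["lstm0_fw", "lstm0_bw"]},
}
```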
robin-p-schmitt / error_log
Created February 16, 2022 13:57
Compile Graph Error
Create graph...
Loading network, train flag False, eval flag False, search flag False
DEPRECATION WARNING: Missing "from" in layer definition: root/source
This will be disallowed with behavior_version 1.
layer root/'data' output: Data{'data', [B,T|'time'[B],F|F'feature:data'(40)]}
layer root/'source' output: Data{'data', [B,T|'time'[B],F|F'feature:data'(40)]}
layer root/'source0' output: Data{'source0_output', [B,T|'time'[B],F'feature:data'(40),F|F'source0_split_dims1'(1)]}
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer root/'conv0' output: Data{'conv0_output', [B,T|'time'[B],F'feature:data'(40),F|F'conv0:channel'(32)]}
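The source0/conv0 pair is the common RETURNN pattern of splitting a trailing channel axis off the 40-dim features and then convolving over time and frequency; specifying in_spatial_dims explicitly is what the deprecation warning asks for. A hedged sketch (the dim descriptors are assumptions):

```python
network = {
    # add a trailing channel axis: (B, T, 40) -> (B, T, 40, 1)
    "source0": {"class": "split_dims", "axis": "F", "dims": (-1, 1),
                "from": "source"},
    # 2D convolution over time and the 40 feature bins, 32 output channels
    "conv0": {"class": "conv", "n_out": 32, "filter_size": (3, 3),
              "padding": "same", "from": "source0",
              "in_spatial_dims": ["T", "dim:40"]},
}
```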