manish-kumar-garg/returnn_distributed_error.txt

## returnn_distributed_error.txt
[ip-10-1-21-241:76137] Warning: could not find environment variable "HOROVOD_TIMELINE"
[ip-10-1-21-241:76137] Warning: could not find environment variable "DEBUG"
Horovod initialized. Hostname ip-10-1-21-241, pid 76142, rank 0 / size 2, local rank 0 / local size 2.
Horovod initialized. Hostname ip-10-1-21-241, pid 76143, rank 1 / size 2, local rank 1 / local size 2.
RETURNN starting up, version 20191217.234858--git-09b41c6f-dirty, date/time 2019-12-21-20-15-27 (UTC+0000), pid 76142, cwd /home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention, Python /home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python3
RETURNN starting up, version 20191217.234858--git-09b41c6f-dirty, date/time 2019-12-21-20-15-27 (UTC+0000), pid 76143, cwd /home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention, Python /home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python3
RETURNN command line options: ['returnn-distributed.config']
Hostname: ip-10-1-21-241
RETURNN command line options: ['returnn-distributed.config']
Hostname: ip-10-1-21-241
TensorFlow: 1.15.0 (v1.15.0-rc3-22-g590d6ee) (<site-package> in /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow)
TensorFlow: 1.15.0 (v1.15.0-rc3-22-g590d6ee) (<site-package> in /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow)
Horovod: Hostname ip-10-1-21-241, pid 76142, using GPU 0.
Horovod: Reduce type: grad
Horovod: Hostname ip-10-1-21-241, pid 76143, using GPU 1.
Setup TF inter and intra global thread pools, num_threads None, session opts {'gpu_options': {'visible_device_list': '0'}, 'log_device_placement': False, 'device_count': {'GPU': 0}}.
Setup TF inter and intra global thread pools, num_threads None, session opts {'gpu_options': {'visible_device_list': '1'}, 'log_device_placement': False, 'device_count': {'GPU': 0}}.
CUDA_VISIBLE_DEVICES is not set.
CUDA_VISIBLE_DEVICES is not set.
TF session gpu_options.visible_device_list is set to '1'.
Collecting TensorFlow device list...
TF session gpu_options.visible_device_list is set to '0'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/4: name: '/job:localhost/replica:0/task:0/device:CPU:0'
       device_type: 'CPU'
       memory_limit_bytes: 268435456
       physical_device_desc: ''
  2/4: name: '/job:localhost/replica:0/task:0/device:XLA_CPU:0'
       device_type: 'XLA_CPU'
       memory_limit_bytes: 17179869184
       physical_device_desc: None
  3/4: name: '/job:localhost/replica:0/task:0/device:XLA_GPU:0'
       device_type: 'XLA_GPU'
       memory_limit_bytes: 17179869184
       physical_device_desc: None
  4/4: name: '/job:localhost/replica:0/task:0/device:GPU:0'
       device_type: 'GPU'
       memory_limit_bytes: 11321150669
       physical_device_desc: 'device: 0, name: Tesla K80, pci bus id: 0000:00:17.0, compute capability: 3.7'
Using gpu device 0: Tesla K80
Local devices available to TensorFlow:
  1/4: name: '/job:localhost/replica:0/task:0/device:CPU:0'
       device_type: 'CPU'
       memory_limit_bytes: 268435456
       physical_device_desc: ''
  2/4: name: '/job:localhost/replica:0/task:0/device:XLA_CPU:0'
       device_type: 'XLA_CPU'
       memory_limit_bytes: 17179869184
       physical_device_desc: None
  3/4: name: '/job:localhost/replica:0/task:0/device:XLA_GPU:1'
       device_type: 'XLA_GPU'
       memory_limit_bytes: 17179869184
       physical_device_desc: None
  4/4: name: '/job:localhost/replica:0/task:0/device:GPU:0'
       device_type: 'GPU'
       memory_limit_bytes: 11321150669
       physical_device_desc: 'device: 1, name: Tesla K80, pci bus id: 0000:00:18.0, compute capability: 3.7'
Using gpu device 1: Tesla K80
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old mean seq len (transcription) is 183.267376, new is 63.708029, requested max is 75.000000. Old num seqs is 6575, new num seqs is 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old num seqs 14063, new num seqs 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old mean seq len (transcription) is 183.267376, new is 63.708029, requested max is 75.000000. Old num seqs is 6575, new num seqs is 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old num seqs 14063, new num seqs 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old mean seq len (transcription) is 183.267376, new is 63.708029, requested max is 75.000000. Old num seqs is 6575, new num seqs is 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old num seqs 14063, new num seqs 822.
Train data:
  input: 40 x 1
  output: {'classes': [10025, 1], 'raw': {'dtype': 'string', 'shape': ()}, 'data': [40, 2]}
  LibriSpeechCorpus, sequences: 822, frames: unknown
Dev data:
  LibriSpeechCorpus, sequences: 3000, frames: unknown
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old mean seq len (transcription) is 183.267376, new is 63.708029, requested max is 75.000000. Old num seqs is 6575, new num seqs is 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old num seqs 14063, new num seqs 822.
Train data:
  input: 40 x 1
  output: {'classes': [10025, 1], 'raw': {'dtype': 'string', 'shape': ()}, 'data': [40, 2]}
  LibriSpeechCorpus, sequences: 822, frames: unknown
Dev data:
  LibriSpeechCorpus, sequences: 3000, frames: unknown
Learning-rate-control: file data/exp-returnn-distributed/train-scores.data does not exist yet
Learning-rate-control: file data/exp-returnn-distributed/train-scores.data does not exist yet
Update config key 'max_seq_length' for epoch 1: {'classes': 75} -> {'classes': 60}
Setup tf.Session with options {'gpu_options': {'visible_device_list': '0'}, 'log_device_placement': False, 'device_count': {'GPU': 1}} ...
Update config key 'max_seq_length' for epoch 1: {'classes': 75} -> {'classes': 60}
Setup tf.Session with options {'gpu_options': {'visible_device_list': '1'}, 'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer root/'data' output: Data(name='data', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'source' output: Data(name='source_output', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'lstm0_fw' output: Data(name='lstm0_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'data' output: Data(name='data', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'source' output: Data(name='source_output', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'lstm0_fw' output: Data(name='lstm0_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'lstm0_bw' output: Data(name='lstm0_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'lstm0_bw' output: Data(name='lstm0_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'lstm0_pool' output: Data(name='lstm0_pool_output', shape=(None, 2048), batch_shape_meta=[B,T|?,F|2048])
layer root/'lstm5_fw' output: Data(name='lstm5_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'lstm0_pool' output: Data(name='lstm0_pool_output', shape=(None, 2048), batch_shape_meta=[B,T|?,F|2048])
layer root/'lstm5_fw' output: Data(name='lstm5_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'lstm5_bw' output: Data(name='lstm5_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'lstm5_bw' output: Data(name='lstm5_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'encoder' output: Data(name='encoder_output', shape=(None, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|2048])
layer root/'ctc' output: Data(name='ctc_output', shape=(None, 10026), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|10026])
layer root/'enc_ctx' output: Data(name='enc_ctx_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'encoder' output: Data(name='encoder_output', shape=(None, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|2048])
layer root/'ctc' output: Data(name='ctc_output', shape=(None, 10026), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|10026])
layer root/'inv_fertility' output: Data(name='inv_fertility_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/'enc_ctx' output: Data(name='enc_ctx_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'enc_value' output: Data(name='enc_value_output', shape=(None, 1, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,1,F|2048])
layer root/'inv_fertility' output: Data(name='inv_fertility_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B])
layer root/'enc_value' output: Data(name='enc_value_output', shape=(None, 1, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,1,F|2048])
layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B])
Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
  Input layers moved out of loop: (#: 2)
    output
    target_embed
  Output layers moved out of loop: (#: 3)
    output_prob
    readout
    readout_in
  Layers in loop: (#: 10)
    s
    att
    att0
    att_weights
    energy
    energy_tanh
    energy_in
    weight_feedback
    accum_att_weights
    s_transformed
  Unused layers: (#: 1)
    end
layer root/output:rec-subnet-input/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
layer root/output:rec-subnet-input/'target_embed' output: Data(name='target_embed_output', shape=(None, 621), batch_shape_meta=[B,T|'time:var:extern_data:classes',F|621])
Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
  Input layers moved out of loop: (#: 2)
    output
    target_embed
  Output layers moved out of loop: (#: 3)
    output_prob
    readout
    readout_in
  Layers in loop: (#: 10)
    s
    att
    att0
    att_weights
    energy
    energy_tanh
    energy_in
    weight_feedback
    accum_att_weights
    s_transformed
  Unused layers: (#: 1)
    end
layer root/output:rec-subnet-input/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
layer root/output:rec-subnet-input/'target_embed' output: Data(name='target_embed_output', shape=(None, 621), batch_shape_meta=[B,T|'time:var:extern_data:classes',F|621])
layer root/output:rec-subnet/'weight_feedback' output: Data(name='weight_feedback_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|?,B,F|1024])
layer root/output:rec-subnet/'prev:target_embed' output: Data(name='target_embed_output', shape=(621,), time_dim_axis=None, batch_shape_meta=[B,F|621])
layer root/output:rec-subnet/'s' output: Data(name='s_output', shape=(1000,), time_dim_axis=None, batch_shape_meta=[B,F|1000])
layer root/output:rec-subnet/'weight_feedback' output: Data(name='weight_feedback_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|?,B,F|1024])
layer root/output:rec-subnet/'s_transformed' output: Data(name='s_transformed_output', shape=(1024,), time_dim_axis=None, batch_shape_meta=[B,F|1024])
layer root/output:rec-subnet/'energy_in' output: Data(name='energy_in_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output:rec-subnet/'prev:target_embed' output: Data(name='target_embed_output', shape=(621,), time_dim_axis=None, batch_shape_meta=[B,F|621])
layer root/output:rec-subnet/'energy_tanh' output: Data(name='energy_tanh_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output:rec-subnet/'s' output: Data(name='s_output', shape=(1000,), time_dim_axis=None, batch_shape_meta=[B,F|1000])
layer root/output:rec-subnet/'energy' output: Data(name='energy_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output:rec-subnet/'att_weights' output: Data(name='att_weights_output', shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool'])
layer root/output:rec-subnet/'s_transformed' output: Data(name='s_transformed_output', shape=(1024,), time_dim_axis=None, batch_shape_meta=[B,F|1024])
layer root/output:rec-subnet/'energy_in' output: Data(name='energy_in_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output:rec-subnet/'energy_tanh' output: Data(name='energy_tanh_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output:rec-subnet/'energy' output: Data(name='energy_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output:rec-subnet/'att0' output: Data(name='att0_output', shape=(1, 2048), time_dim_axis=None, batch_shape_meta=[B,1,F|2048])
layer root/output:rec-subnet/'att_weights' output: Data(name='att_weights_output', shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool'])
layer root/output:rec-subnet/'att' output: Data(name='att_output', shape=(2048,), time_dim_axis=None, batch_shape_meta=[B,F|2048])
layer root/output:rec-subnet/'accum_att_weights' output: Data(name='accum_att_weights_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output:rec-subnet/'att0' output: Data(name='att0_output', shape=(1, 2048), time_dim_axis=None, batch_shape_meta=[B,1,F|2048])
layer root/output:rec-subnet-output/'s' output: Data(name='s_output', shape=(None, 1000), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|1000])
layer root/output:rec-subnet/'att' output: Data(name='att_output', shape=(2048,), time_dim_axis=None, batch_shape_meta=[B,F|2048])
layer root/output:rec-subnet-output/'prev:target_embed' output: Data(name='target_embed_output', shape=(None, 621), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|621])
layer root/output:rec-subnet/'accum_att_weights' output: Data(name='accum_att_weights_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output:rec-subnet-output/'att' output: Data(name='att_output', shape=(None, 2048), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|2048])
layer root/output:rec-subnet-output/'readout_in' output: Data(name='readout_in_output', shape=(None, 1000), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|1000])
layer root/output:rec-subnet-output/'s' output: Data(name='s_output', shape=(None, 1000), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|1000])
layer root/output:rec-subnet-output/'readout' output: Data(name='readout_output', shape=(None, 500), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|500])
layer root/output:rec-subnet-output/'prev:target_embed' output: Data(name='target_embed_output', shape=(None, 621), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|621])
layer root/output:rec-subnet-output/'output_prob' output: Data(name='output_prob_output', shape=(None, 10025), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|10025])
layer root/output:rec-subnet-output/'att' output: Data(name='att_output', shape=(None, 2048), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|2048])
layer root/output:rec-subnet-output/'readout_in' output: Data(name='readout_in_output', shape=(None, 1000), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|1000])
layer root/output:rec-subnet-output/'readout' output: Data(name='readout_output', shape=(None, 500), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|500])
layer root/'decision' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B])
layer root/output:rec-subnet-output/'output_prob' output: Data(name='output_prob_output', shape=(None, 10025), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|10025])
layer root/'decision' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=10025, batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B])
Network layer topology:
  extern data: classes: Data(shape=(None,), dtype='int32', sparse=True, dim=10025, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:classes']), data: Data(shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
  used data keys: ['classes', 'data']
  layers:
    layer softmax 'ctc' #: 10026
    layer source 'data' #: 40
    layer decide 'decision' #: 10025
    layer linear 'enc_ctx' #: 1024
    layer split_dims 'enc_value' #: 2048
    layer copy 'encoder' #: 2048
    layer linear 'inv_fertility' #: 1
    layer rec 'lstm0_bw' #: 1024
    layer rec 'lstm0_fw' #: 1024
    layer pool 'lstm0_pool' #: 2048
    layer rec 'lstm5_bw' #: 1024
    layer rec 'lstm5_fw' #: 1024
    layer rec 'output' #: 10025
    layer eval 'source' #: 40
net params #: 87166092
net trainable params: [<tf.Variable 'ctc/W:0' shape=(2048, 10026) dtype=float32_ref>, <tf.Variable 'ctc/b:0' shape=(10026,) dtype=float32_ref>, <tf.Variable 'enc_ctx/W:0' shape=(2048, 1024) dtype=float32_ref>, <tf.Variable 'enc_ctx/b:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'inv_fertility/W:0' shape=(2048, 1) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W:0' shape=(40, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W:0' shape=(40, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/W:0' shape=(2048, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/W:0' shape=(2048, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'output/rec/energy/W:0' shape=(1024, 1) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/W:0' shape=(500, 10025) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/b:0' shape=(10025,) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/W:0' shape=(3669, 1000) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/b:0' shape=(1000,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/bias:0' shape=(4000,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/kernel:0' shape=(3669, 4000) dtype=float32_ref>, <tf.Variable 'output/rec/s_transformed/W:0' shape=(1000, 1024) dtype=float32_ref>, <tf.Variable 'output/rec/target_embed/W:0' shape=(10025, 621) dtype=float32_ref>, <tf.Variable 'output/rec/weight_feedback/W:0' shape=(1, 1024) dtype=float32_ref>]
Network layer topology:
  extern data: classes: Data(shape=(None,), dtype='int32', sparse=True, dim=10025, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:classes']), data: Data(shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
  used data keys: ['classes', 'data']
  layers:
    layer softmax 'ctc' #: 10026
    layer source 'data' #: 40
    layer decide 'decision' #: 10025
    layer linear 'enc_ctx' #: 1024
    layer split_dims 'enc_value' #: 2048
    layer copy 'encoder' #: 2048
    layer linear 'inv_fertility' #: 1
    layer rec 'lstm0_bw' #: 1024
    layer rec 'lstm0_fw' #: 1024
    layer pool 'lstm0_pool' #: 2048
    layer rec 'lstm5_bw' #: 1024
    layer rec 'lstm5_fw' #: 1024
    layer rec 'output' #: 10025
    layer eval 'source' #: 40
net params #: 87166092
net trainable params: [<tf.Variable 'ctc/W:0' shape=(2048, 10026) dtype=float32_ref>, <tf.Variable 'ctc/b:0' shape=(10026,) dtype=float32_ref>, <tf.Variable 'enc_ctx/W:0' shape=(2048, 1024) dtype=float32_ref>, <tf.Variable 'enc_ctx/b:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'inv_fertility/W:0' shape=(2048, 1) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W:0' shape=(40, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W:0' shape=(40, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/W:0' shape=(2048, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_bw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/W:0' shape=(2048, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/W_re:0' shape=(1024, 4096) dtype=float32_ref>, <tf.Variable 'lstm5_fw/rec/b:0' shape=(4096,) dtype=float32_ref>, <tf.Variable 'output/rec/energy/W:0' shape=(1024, 1) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/W:0' shape=(500, 10025) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/b:0' shape=(10025,) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/W:0' shape=(3669, 1000) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/b:0' shape=(1000,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/bias:0' shape=(4000,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/kernel:0' shape=(3669, 4000) dtype=float32_ref>, <tf.Variable 'output/rec/s_transformed/W:0' shape=(1000, 1024) dtype=float32_ref>, <tf.Variable 'output/rec/target_embed/W:0' shape=(10025, 621) dtype=float32_ref>, <tf.Variable 'output/rec/weight_feedback/W:0' shape=(1, 1024) dtype=float32_ref>]