@manish-kumar-garg · Created December 26, 2019 06:47
Returnn - asr_local_attention
RETURNN starting up, version 20191217.234858--git-09b41c6f-dirty, date/time 2019-12-26-06-45-58 (UTC+0000), pid 20215, cwd /home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention, Python /home/ubuntu/tf1.13/bin/python3
RETURNN command line options: ['local-heuristic.argmax.win05.exp3.ctc.config']
Hostname: ip-10-1-21-241
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
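(Note: these FutureWarnings come from TensorFlow 1.13's dtypes.py passing the deprecated (type, 1) dtype spec to NumPy 1.17+; they are harmless for this run. If you want them silenced — an assumption about the environment, not something this log confirms — the usual workaround is to pin NumPy below 1.17 in the tf1.13 virtualenv, e.g. pip install "numpy<1.17".)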
TensorFlow: 1.13.1 (b'v1.13.1-0-g6612da8951') (<site-package> in /home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow)
Setup TF inter and intra global thread pools, num_threads None, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}}.
2019-12-26 06:45:59.202047: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-26 06:45:59.425681: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-26 06:45:59.429258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-26 06:45:59.430831: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55e4d8dd5850 executing computations on platform CUDA. Devices:
2019-12-26 06:45:59.430864: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-12-26 06:45:59.430881: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla K80, Compute Capability 3.7
2019-12-26 06:45:59.450542: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300090000 Hz
2019-12-26 06:45:59.452625: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55e4d9498d60 executing computations on platform Host. Devices:
2019-12-26 06:45:59.452659: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-12-26 06:45:59.452768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-26 06:45:59.452791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]
CUDA_VISIBLE_DEVICES is set to '1,2'.
Collecting TensorFlow device list...
2019-12-26 06:45:59.455892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:18.0
totalMemory: 11.17GiB freeMemory: 446.06MiB
2019-12-26 06:45:59.456040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:19.0
totalMemory: 11.17GiB freeMemory: 446.06MiB
2019-12-26 06:45:59.456482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-12-26 06:45:59.459962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-26 06:45:59.459989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2019-12-26 06:45:59.460007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y
2019-12-26 06:45:59.460016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N
2019-12-26 06:45:59.460292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 221 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:18.0, compute capability: 3.7)
2019-12-26 06:45:59.463035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 221 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:00:19.0, compute capability: 3.7)
Local devices available to TensorFlow:
1/6: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13382603103062640486
2/6: name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 13903808377023195374
physical_device_desc: "device: XLA_GPU device"
3/6: name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 10467231846590301634
physical_device_desc: "device: XLA_GPU device"
4/6: name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 2300646191363693310
physical_device_desc: "device: XLA_CPU device"
5/6: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 231800832
locality {
bus_id: 1
links {
link {
device_id: 1
type: "StreamExecutor"
strength: 1
}
}
}
incarnation: 15532672733655798053
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:18.0, compute capability: 3.7"
6/6: name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 231800832
locality {
bus_id: 1
links {
link {
type: "StreamExecutor"
strength: 1
}
}
}
incarnation: 4180441292150800506
physical_device_desc: "device: 1, name: Tesla K80, pci bus id: 0000:00:19.0, compute capability: 3.7"
Using gpu device 1: Tesla K80
Using gpu device 2: Tesla K80
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old mean seq len (transcription) is 183.267376, new is 63.708029, requested max is 75.000000. Old num seqs is 6575, new num seqs is 822.
<LibriSpeechCorpus 'train' epoch=1>, epoch 1. Old num seqs 14063, new num seqs 822.
Train data:
input: 40 x 1
output: {'classes': [10025, 1], 'raw': {'dtype': 'string', 'shape': ()}, 'data': [40, 2]}
LibriSpeechCorpus, sequences: 822, frames: unknown
Dev data:
LibriSpeechCorpus, sequences: 3000, frames: unknown
Learning-rate-control: file newbob.data does not exist yet
Update config key 'max_seq_length' for epoch 1: {'classes': 75} -> {'classes': 60}
Setup tf.Session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2019-12-26 06:46:05.591177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-12-26 06:46:05.591272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-26 06:46:05.591293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2019-12-26 06:46:05.591310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y
2019-12-26 06:46:05.591325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N
2019-12-26 06:46:05.591489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 221 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:18.0, compute capability: 3.7)
layer root/'data' output: Data(name='data', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'source' output: Data(name='source_output', shape=(None, 40), batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'lstm0_fw' output: Data(name='lstm0_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'lstm0_bw' output: Data(name='lstm0_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|1024])
layer root/'lstm0_pool' output: Data(name='lstm0_pool_output', shape=(None, 2048), batch_shape_meta=[B,T|?,F|2048])
layer root/'lstm5_fw' output: Data(name='lstm5_fw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'lstm5_bw' output: Data(name='lstm5_bw_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'encoder' output: Data(name='encoder_output', shape=(None, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|2048])
layer root/'ctc' output: Data(name='ctc_output', shape=(None, 10026), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|10026])
layer root/'enc_ctx' output: Data(name='enc_ctx_output', shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'inv_fertility' output: Data(name='inv_fertility_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/'enc_value' output: Data(name='enc_value_output', shape=(None, 1, 2048), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,1,F|2048])
<_SubnetworkRecCell of None>: exception constructing template network (for deps and data shapes)
Most recent construction stack:
<_TemplateLayer(EvalLayer)(:template:eval) 'output/p_t_in' out_type=Data(shape=(), time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'att_weights')>, kwargs:
{'eval': 'tf.squeeze(tf.argmax(source(0), axis=1, output_type=tf.int32), '
'axis=1)',
'name': 'p_t_in',
'network': <TFNetwork 'root/output:rec-subnet' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'out_type': {'batch_dim_axis': 0, 'dtype': 'float32', 'shape': ()},
'sources': [<_TemplateLayer(SoftmaxOverSpatialLayer)(:prev:softmax_over_spatial) 'output/prev:att_weights' out_type=Data(shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool']) (construction stack None)>]}
Template network so far:
{'accum_att_weights': <_TemplateLayer(EvalLayer)(:template:eval) 'output/accum_att_weights' out_type=Data(shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1]) (construction stack 'weight_feedback')>,
'att': <_TemplateLayer(MergeDimsLayer)(:template:merge_dims) 'output/att' out_type=Data(shape=(2048,), time_dim_axis=None, batch_shape_meta=[B,F|2048]) (construction stack 's')>,
'att0': <_TemplateLayer(GenericAttentionLayer)(:template:generic_attention) 'output/att0' out_type=Data(shape=(1, 2048), time_dim_axis=None, batch_shape_meta=[B,1,F|2048]) (construction stack 'att')>,
'att_weights': <_TemplateLayer(SoftmaxOverSpatialLayer)(:template:softmax_over_spatial) 'output/att_weights' out_type=Data(shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool']) (construction stack 'att0')>,
'end': <_TemplateLayer(CompareLayer)(:template:compare) 'output/end' out_type=Data(shape=(), dtype='bool', sparse=True, dim=2, time_dim_axis=None, batch_shape_meta=[B]) (construction stack None)>,
'energy': <_TemplateLayer(LinearLayer)(:template:linear) 'output/energy' out_type=Data(shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1]) (construction stack 'energy_reinterpreted')>,
'energy_in': <_TemplateLayer(CombineLayer)(:template:combine) 'output/energy_in' out_type=Data(shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024]) (construction stack 'energy_tanh')>,
'energy_reinterpreted': <_TemplateLayer(ReinterpretDataLayer)(:template:reinterpret_data) 'output/energy_reinterpreted' out_type=Data(shape=(None, 1), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|1]) (construction stack 'att_weights')>,
'energy_tanh': <_TemplateLayer(ActivationLayer)(:template:activation) 'output/energy_tanh' out_type=Data(shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024]) (construction stack 'energy')>,
'output': <_TemplateLayer(ChoiceLayer)(:template:choice) 'output/output' out_type=Data(shape=(), dtype='int32', sparse=True, dim=10025, time_dim_axis=None, batch_shape_meta=[B]) (construction stack None)>,
'output_prob': <_TemplateLayer(SoftmaxLayer)(:template:softmax) 'output/output_prob' out_type=Data(shape=(10025,), time_dim_axis=None, batch_shape_meta=[B,F|10025]) (construction stack None)>,
'p_t_in': <_TemplateLayer(EvalLayer)(:template:eval) 'output/p_t_in' out_type=Data(shape=(), time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'att_weights')>,
'readout': <_TemplateLayer(ReduceOutLayer)(:template:reduce_out) 'output/readout' out_type=Data(shape=(500,), time_dim_axis=None, batch_shape_meta=[B,F|500]) (construction stack 'output_prob')>,
'readout_in': <_TemplateLayer(LinearLayer)(:template:linear) 'output/readout_in' out_type=Data(shape=(1000,), time_dim_axis=None, batch_shape_meta=[B,F|1000]) (construction stack 'readout')>,
's': <_TemplateLayer(RnnCellLayer)(:template:rnn_cell) 'output/s' out_type=Data(shape=(1000,), time_dim_axis=None, batch_shape_meta=[B,F|1000]) (construction stack 'readout_in')>,
's_transformed': <_TemplateLayer(LinearLayer)(:template:linear) 'output/s_transformed' out_type=Data(shape=(1024,), time_dim_axis=None, batch_shape_meta=[B,F|1024]) (construction stack 'energy_in')>,
'target_embed': <_TemplateLayer(LinearLayer)(:template:linear) 'output/target_embed' out_type=Data(shape=(621,), time_dim_axis=None, batch_shape_meta=[B,F|621]) (construction stack 's')>,
'weight_feedback': <_TemplateLayer(LinearLayer)(:template:linear) 'output/weight_feedback' out_type=Data(shape=(None, 1024), batch_dim_axis=1, batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024]) (construction stack 'energy_in')>}
Collected (unique) exceptions during template construction:
(Note that many of these can be ignored, or are expected.)
EXCEPTION
NetworkConstructionDependencyLoopException: Error: There is a dependency loop on layer 'accum_att_weights'.
Construction stack (most recent first):
accum_att_weights
weight_feedback
energy_in
energy_tanh
energy
energy_reinterpreted
att_weights
att0
att
s
readout_in
readout
output_prob
EXCEPTION
CannotHandleUndefinedSourcesException: 's_transformed': cannot handle undefined sources without defined out_type.
{'activation': None,
'loss': None,
'n_out': 1024,
'network': <TFNetwork 'root/output:rec-subnet' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'size_target': None,
'sources': [None],
'target': None,
'with_bias': False}
Exception creating layer root/'output' of class RecLayer with opts:
{'_target_layers': {},
'cheating': False,
'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>,
'n_out': <class 'Util.NotSpecified'>,
'name': 'output',
'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'sources': [],
'target': 'classes',
'unit': {'accum_att_weights': {'class': 'eval',
'eval': 'source(0) + source(1) * source(2) * '
'0.5',
'from': ['prev:accum_att_weights',
'att_weights',
'base:inv_fertility'],
'out_type': {'dim': 1, 'shape': (None, 1)}},
'att': {'axes': 'except_batch',
'class': 'merge_dims',
'from': ['att0']},
'att0': {'base': 'base:enc_value',
'class': 'generic_attention',
'weights': 'att_weights'},
'att_weights': {'class': 'softmax_over_spatial',
'from': ['energy_reinterpreted'],
'window_size': 5,
'window_start': 'p_t_in'},
'end': {'class': 'compare', 'from': ['output'], 'value': 0},
'energy': {'activation': None,
'class': 'linear',
'from': ['energy_tanh'],
'n_out': 1,
'with_bias': False},
'energy_in': {'class': 'combine',
'from': ['base:enc_ctx',
'weight_feedback',
's_transformed'],
'kind': 'add',
'n_out': 1024},
'energy_reinterpreted': {'class': 'reinterpret_data',
'enforce_batch_major': True,
'from': 'energy',
'trainable': False},
'energy_tanh': {'activation': 'tanh',
'class': 'activation',
'from': ['energy_in']},
'output': {'beam_size': 12,
'cheating': False,
'class': 'choice',
'from': ['output_prob'],
'initial_output': 0,
'target': 'classes'},
'output_prob': {'class': 'softmax',
'dropout': 0.3,
'from': ['readout'],
'loss': 'ce',
'loss_only_on_non_search': True,
'loss_opts': {'label_smoothing': 0},
'target': 'classes'},
'p_t': {'class': 'eval',
'eval': 'tf.to_float(source(0))',
'from': 'p_t_in'},
'p_t_in': {'class': 'eval',
'eval': 'tf.squeeze(tf.argmax(source(0), axis=1, '
'output_type=tf.int32), axis=1)',
'from': 'prev:att_weights',
'out_type': {'batch_dim_axis': 0,
'dtype': 'float32',
'shape': ()}},
'readout': {'class': 'reduce_out',
'from': ['readout_in'],
'mode': 'max',
'num_pieces': 2},
'readout_in': {'activation': None,
'class': 'linear',
'from': ['s', 'prev:target_embed', 'att'],
'n_out': 1000},
's': {'class': 'rnn_cell',
'from': ['prev:target_embed', 'prev:att'],
'n_out': 1000,
'unit': 'LSTMBlock'},
's_transformed': {'activation': None,
'class': 'linear',
'from': ['s'],
'n_out': 1024,
'with_bias': False},
'target_embed': {'activation': None,
'class': 'linear',
'from': ['output'],
'initial_output': 0,
'n_out': 621,
'with_bias': False},
'weight_feedback': {'activation': None,
'class': 'linear',
'from': ['prev:accum_att_weights'],
'n_out': 1024,
'with_bias': False}}}
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140613301045056)>, proc 20215.
Thread current, main, <_MainThread(MainThread, started 140613301045056)>:
(Excluded thread.)
That were all threads.
EXCEPTION
Traceback (most recent call last):
File "./returnn/rnn.py", line 654, in <module>
line: main(sys.argv)
locals:
main = <local> <function main at 0x7fe30c132158>
sys = <local> <module 'sys' (built-in)>
sys.argv = <local> ['./returnn/rnn.py', 'local-heuristic.argmax.win05.exp3.ctc.config'], _[0]: {len = 16}
File "./returnn/rnn.py", line 642, in main
line: execute_main_task()
locals:
execute_main_task = <global> <function execute_main_task at 0x7fe30c132048>
File "./returnn/rnn.py", line 451, in execute_main_task
line: engine.init_train_from_config(config, train_data, dev_data, eval_data)
locals:
engine = <global> <TFEngine.Engine object at 0x7fe04dbe8748>
engine.init_train_from_config = <global> <bound method Engine.init_train_from_config of <TFEngine.Engine object at 0x7fe04dbe8748>>
config = <global> <Config.Config object at 0x7fe3147c69e8>
train_data = <global> <LibriSpeechCorpus 'train' epoch=1>
dev_data = <global> <LibriSpeechCorpus 'dev' epoch=1>
eval_data = <global> None
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFEngine.py", line 891, in init_train_from_config
line: self.init_network_from_config(config)
locals:
self = <local> <TFEngine.Engine object at 0x7fe04dbe8748>
self.init_network_from_config = <local> <bound method Engine.init_network_from_config of <TFEngine.Engine object at 0x7fe04dbe8748>>
config = <local> <Config.Config object at 0x7fe3147c69e8>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFEngine.py", line 934, in init_network_from_config
line: self._init_network(net_desc=net_dict, epoch=self.epoch)
locals:
self = <local> <TFEngine.Engine object at 0x7fe04dbe8748>
self._init_network = <local> <bound method Engine._init_network of <TFEngine.Engine object at 0x7fe04dbe8748>>
net_desc = <not found>
net_dict = <local> {'source': {'class': 'eval', 'eval': 'tf.clip_by_value(source(0), -3.0, 3.0)'}, 'lstm0_fw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': 1, 'from': ['source']}, 'lstm0_bw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': -1, 'from': ['source']}, 'lstm0_p..., len = 14
epoch = <local> None
self.epoch = <local> 1
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFEngine.py", line 1081, in _init_network
line: self.network, self.updater = self.create_network(
config=self.config,
rnd_seed=net_random_seed,
train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
initial_learning_rate=getattr(self, "initial_learning_rate", None),
net_dict=net_desc)
locals:
self = <local> <TFEngine.Engine object at 0x7fe04dbe8748>
self.network = <local> None
self.updater = <local> None
self.create_network = <local> <bound method Engine.create_network of <class 'TFEngine.Engine'>>
config = <not found>
self.config = <local> <Config.Config object at 0x7fe3147c69e8>
rnd_seed = <not found>
net_random_seed = <local> 1
train_flag = <local> <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>
eval_flag = <not found>
self.use_eval_flag = <local> True
search_flag = <not found>
self.use_search_flag = <local> False
initial_learning_rate = <not found>
getattr = <builtin> <built-in function getattr>
net_dict = <not found>
net_desc = <local> {'source': {'class': 'eval', 'eval': 'tf.clip_by_value(source(0), -3.0, 3.0)'}, 'lstm0_fw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': 1, 'from': ['source']}, 'lstm0_bw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': -1, 'from': ['source']}, 'lstm0_p..., len = 14
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFEngine.py", line 1113, in create_network
line: network.construct_from_dict(net_dict)
locals:
network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'source': {'class': 'eval', 'eval': 'tf.clip_by_value(source(0), -3.0, 3.0)'}, 'lstm0_fw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': 1, 'from': ['source']}, 'lstm0_bw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': -1, 'from': ['source']}, 'lstm0_p..., len = 14
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 460, in construct_from_dict
line: self.construct_layer(net_dict, name)
locals:
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'source': {'class': 'eval', 'eval': 'tf.clip_by_value(source(0), -3.0, 3.0)'}, 'lstm0_fw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': 1, 'from': ['source']}, 'lstm0_bw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': -1, 'from': ['source']}, 'lstm0_p..., len = 14
name = <local> 'decision', len = 8
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 652, in construct_layer
line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
locals:
layer_class = <local> <class 'TFNetworkRecLayer.DecideLayer'>
layer_class.transform_config_dict = <local> <bound method BaseChoiceLayer.transform_config_dict of <class 'TFNetworkRecLayer.DecideLayer'>>
layer_desc = <local> {'loss': 'edit_distance', 'target': 'classes', 'loss_only_on_non_search': False}
network = <not found>
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe3147289d8>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 4089, in transform_config_dict
line: super(BaseChoiceLayer, cls).transform_config_dict(d, network=network, get_layer=get_layer)
locals:
super = <builtin> <class 'super'>
BaseChoiceLayer = <global> <class 'TFNetworkRecLayer.BaseChoiceLayer'>
cls = <local> <class 'TFNetworkRecLayer.DecideLayer'>
transform_config_dict = <not found>
d = <local> {'loss': 'edit_distance', 'target': 'classes', 'loss_only_on_non_search': False}
network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe3147289d8>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py", line 448, in transform_config_dict
line: for src_name in src_names
locals:
src_name = <not found>
src_names = <local> ['output'], _[0]: {len = 6}
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py", line 449, in <listcomp>
line: d["sources"] = [
get_layer(src_name)
for src_name in src_names
if not src_name == "none"]
locals:
d = <not found>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe3147289d8>
src_name = <local> 'output', len = 6
src_names = <not found>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 607, in get_layer
line: return self.construct_layer(net_dict=net_dict, name=src_name) # set get_layer to wrap construct_layer
locals:
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'source': {'class': 'eval', 'eval': 'tf.clip_by_value(source(0), -3.0, 3.0)'}, 'lstm0_fw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': 1, 'from': ['source']}, 'lstm0_bw': {'class': 'rec', 'unit': 'nativelstm2', 'n_out': 1024, 'direction': -1, 'from': ['source']}, 'lstm0_p..., len = 14
name = <not found>
src_name = <local> 'output', len = 6
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 655, in construct_layer
line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'output', len = 6
layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
layer_desc = <local> {'cheating': False, 'unit': {'output': {'class': 'choice', 'target': 'classes', 'beam_size': 12, 'cheating': False, 'from': ['output_prob'], 'initial_output': 0}, 'end': {'class': 'compare', 'from': ['output'], 'value': 0}, 'target_embed': {'class': 'linear', 'activation': None, 'with_bias': Fals..., len = 7
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 760, in add_layer
line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
layer = <not found>
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'output', len = 6
layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
layer_desc = <local> {'cheating': False, 'unit': {'output': {'class': 'choice', 'target': 'classes', 'beam_size': 12, 'cheating': False, 'from': ['output_prob'], 'initial_output': 0}, 'end': {'class': 'compare', 'from': ['output'], 'value': 0}, 'target_embed': {'class': 'linear', 'activation': None, 'with_bias': Fals..., len = 7
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 701, in _create_layer
line: layer_desc["output"] = layer_class.get_out_data_from_opts(**layer_desc)
locals:
layer_desc = <local> {'cheating': False, 'unit': {'output': {'class': 'choice', 'target': 'classes', 'beam_size': 12, 'cheating': False, 'from': ['output_prob'], 'initial_output': 0}, 'end': {'class': 'compare', 'from': ['output'], 'value': 0}, 'target_embed': {'class': 'linear', 'activation': None, 'with_bias': Fals..., len = 9
layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
layer_class.get_out_data_from_opts = <local> <bound method RecLayer.get_out_data_from_opts of <class 'TFNetworkRecLayer.RecLayer'>>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 362, in get_out_data_from_opts
line: subnet = _SubnetworkRecCell(
parent_net=kwargs["network"], net_dict=unit, source_data=source_data, rec_layer_name=kwargs["name"])
locals:
subnet = <not found>
_SubnetworkRecCell = <global> <class 'TFNetworkRecLayer._SubnetworkRecCell'>
parent_net = <not found>
kwargs = <local> {'cheating': False, 'target': 'classes', 'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>, 'n_out': <class 'Util.NotSpecified'>, '_target_layers': {}, 'name': 'output', 'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>}, len = 7
net_dict = <not found>
unit = <local> {'output': {'class': 'choice', 'target': 'classes', 'beam_size': 12, 'cheating': False, 'from': ['output_prob'], 'initial_output': 0}, 'end': {'class': 'compare', 'from': ['output'], 'value': 0}, 'target_embed': {'class': 'linear', 'activation': None, 'with_bias': False, 'from': ['output'], 'n_ou..., len = 19
source_data = <local> None
rec_layer_name = <not found>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 917, in __init__
line: self._construct_template()
locals:
self = <local> <_SubnetworkRecCell of None>
self._construct_template = <local> <bound method _SubnetworkRecCell._construct_template of <_SubnetworkRecCell of None>>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 1229, in _construct_template
line: direct_get_layer.construct(layer.name)
locals:
direct_get_layer = <local> <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 1, parents None)
direct_get_layer.construct = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.construct of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 1, parents None)>
layer = <local> <_TemplateLayer(EvalLayer)(:template:eval) 'output/p_t_in' out_type=Data(shape=(), time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'att_weights')>
layer.name = <local> 'p_t_in', len = 6
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 1021, in construct
line: self.__call__(layer_name_)
locals:
self = <local> <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 1, parents None)
self.__call__ = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.__call__ of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 1, parents None)>
layer_name_ = <local> 'p_t_in', len = 6
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 1188, in __call__
line: self.net.construct_layer(
net_dict=self.net_dict, name=name,
get_layer=default_get_layer, add_layer=default_get_layer.add_templated_layer)
locals:
self = <local> <_SubnetworkRecCell of None>
self.net = <local> <TFNetwork 'root/output:rec-subnet' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output:rec-subnet' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <not found>
self.net_dict = <local> {'output': {'class': 'choice', 'target': 'classes', 'beam_size': 12, 'cheating': False, 'from': ['output_prob'], 'initial_output': 0}, 'end': {'class': 'compare', 'from': ['output'], 'value': 0}, 'target_embed': {'class': 'linear', 'activation': None, 'with_bias': False, 'from': ['output'], 'n_ou..., len = 19
name = <local> 'p_t_in', len = 6
get_layer = <not found>
default_get_layer = <local> <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 0, parents 'p_t_in')
add_layer = <not found>
default_get_layer.add_templated_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 0, parents 'p_t_in')>
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetwork.py", line 655, in construct_layer
line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 0, parents 'p_t_in')>
name = <local> 'p_t_in', len = 6
layer_class = <local> <class 'TFNetworkLayer.EvalLayer'>
layer_desc = <local> {'eval': 'tf.squeeze(tf.argmax(source(0), axis=1, output_type=tf.int32), axis=1)', 'out_type': {'shape': (), 'batch_dim_axis': 0, 'dtype': 'float32'}, 'sources': [<_TemplateLayer(SoftmaxOverSpatialLayer)(:prev:softmax_over_spatial) 'output/prev:att_weights' out_type=Data(shape=(1, None), time_dim...
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkRecLayer.py", line 1041, in add_templated_layer
line: output = layer_class.get_out_data_from_opts(**layer_desc)
locals:
output = <not found>
layer_class = <local> <class 'TFNetworkLayer.EvalLayer'>
layer_class.get_out_data_from_opts = <local> <bound method CombineLayer.get_out_data_from_opts of <class 'TFNetworkLayer.EvalLayer'>>
layer_desc = <local> {'eval': 'tf.squeeze(tf.argmax(source(0), axis=1, output_type=tf.int32), axis=1)', 'out_type': {'shape': (), 'batch_dim_axis': 0, 'dtype': 'float32'}, 'sources': [<_TemplateLayer(SoftmaxOverSpatialLayer)(:prev:softmax_over_spatial) 'output/prev:att_weights' out_type=Data(shape=(1, None), time_dim...
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py", line 6069, in get_out_data_from_opts
line: return super(CombineLayer, cls).get_out_data_from_opts(n_out=n_out, out_type=out_type_, sources=sources, **kwargs)
locals:
super = <builtin> <class 'super'>
CombineLayer = <global> <class 'TFNetworkLayer.CombineLayer'>
cls = <local> <class 'TFNetworkLayer.EvalLayer'>
get_out_data_from_opts = <not found>
n_out = <local> <class 'Util.NotSpecified'>
out_type = <local> {'shape': (), 'batch_dim_axis': 0, 'dtype': 'float32'}
out_type_ = <local> {'name': 'p_t_in_output', 'shape': (), 'dtype': 'float32', 'sparse': False, 'dim': 1, 'batch_dim_axis': 0, 'time_dim_axis': 2, 'feature_dim_axis': 1}, len = 8
sources = <local> [<_TemplateLayer(SoftmaxOverSpatialLayer)(:prev:softmax_over_spatial) 'output/prev:att_weights' out_type=Data(shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool']) (construction stack None)>]
kwargs = <local> {'eval': 'tf.squeeze(tf.argmax(source(0), axis=1, output_type=tf.int32), axis=1)', 'name': 'p_t_in', 'network': <TFNetwork 'root/output:rec-subnet' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=boo...
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py", line 227, in get_out_data_from_opts
line: return cls._base_get_out_data_from_opts(**kwargs)
locals:
cls = <local> <class 'TFNetworkLayer.EvalLayer'>
cls._base_get_out_data_from_opts = <local> <bound method LayerBase._base_get_out_data_from_opts of <class 'TFNetworkLayer.EvalLayer'>>
kwargs = <local> {'n_out': <class 'Util.NotSpecified'>, 'out_type': {'name': 'p_t_in_output', 'shape': (), 'dtype': 'float32', 'sparse': False, 'dim': 1, 'batch_dim_axis': 0, 'time_dim_axis': 2, 'feature_dim_axis': 1}, 'sources': [<_TemplateLayer(SoftmaxOverSpatialLayer)(:prev:softmax_over_spatial) 'output/prev:a..., len = 6
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py", line 324, in _base_get_out_data_from_opts
line: output = Data(**out_type)
locals:
output = <not found>
Data = <global> <class 'TFUtil.Data'>
out_type = <local> {'name': 'p_t_in_output', 'shape': (), 'dtype': 'float32', 'sparse': False, 'dim': 1, 'batch_dim_axis': 0, 'time_dim_axis': 2, 'feature_dim_axis': 1, 'beam': None}, len = 9
File "/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFUtil.py", line 559, in __init__
line: assert 0 <= feature_dim_axis < self.batch_ndim
locals:
feature_dim_axis = <local> 1
self = <local> !AttributeError: 'Data' object has no attribute 'time_dim_axis'
self.batch_ndim = <local> 1
AssertionError
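(How the assertion fires: Data is constructed from the merged out_type shown above as out_type_ — 'shape': (), 'batch_dim_axis': 0, plus 'time_dim_axis': 2 and 'feature_dim_axis': 1 inherited from the source — so batch_ndim is 1 while feature_dim_axis is 1, and 0 <= feature_dim_axis < self.batch_ndim fails. The "!AttributeError" on self is just the locals dump failing to repr a half-initialized Data, not a second error. A minimal standalone sketch of the check, illustrative only and not RETURNN's actual Data class:

    # Illustrative reconstruction of the failing check at TFUtil.py:559;
    # not the real TFUtil.Data implementation.
    def check_axes(shape, batch_dim_axis, feature_dim_axis):
        # batch_ndim = number of axes including the batch axis
        batch_ndim = len(shape) + (1 if batch_dim_axis is not None else 0)
        if feature_dim_axis is not None:
            assert 0 <= feature_dim_axis < batch_ndim

    check_axes(shape=(), batch_dim_axis=0, feature_dim_axis=None)  # passes: scalar per batch entry
    check_axes(shape=(), batch_dim_axis=0, feature_dim_axis=1)     # AssertionError, as in this log
)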