InvalidArgumentError 2
Created January 29, 2018 15:43
$ python run_job.py -n 5 -g 60 -c 12 --use_sync --name neptune_job_name clear^C
[prometheus][plghenrykm@login01 src]$ vim OpenAIGym/train.py
[prometheus][plghenrykm@login01 src]$ python run_job.py -n 5 -g 60 -c 12 --use_sync --name neptune_job_name
args.offline: False
('bash command: ', 'srun -A luna -N 5 -n 5 -c 12 -t 6:00:00 distributed_tensorpack_mkl.sh 17351 9236 Breakout-v0 adam 1 "3nodes 12cores" "neptune_job_name_1517240471.8" 0.00015 128 60 0 None 256 100 1 uniform normal False . True 1 False /net/archive/groups/plggluna/intel_2/logs/ 1e-08 0.9 0.999 0 False False False False 120 False')
SLURM_JOB_ID 9495449 ; SLURM_JOB_NAME distributed_tensorpack_mkl.sh ; SLURM_JOB_NODELIST p[1567-1568,1577,1580,1584] ; SLURMD_NODENAME p1584 ; SLURM_JOB_NUM_NODES 5
SLURM_JOB_ID 9495449 ; SLURM_JOB_NAME distributed_tensorpack_mkl.sh ; SLURM_JOB_NODELIST p[1567-1568,1577,1580,1584] ; SLURMD_NODENAME p1580 ; SLURM_JOB_NUM_NODES 5
mkdir: cannot create directory ‘/net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8’: File exists
SLURM_JOB_ID 9495449 ; SLURM_JOB_NAME distributed_tensorpack_mkl.sh ; SLURM_JOB_NODELIST p[1567-1568,1577,1580,1584] ; SLURMD_NODENAME p1577 ; SLURM_JOB_NUM_NODES 5
SLURM_JOB_ID 9495449 ; SLURM_JOB_NAME distributed_tensorpack_mkl.sh ; SLURM_JOB_NODELIST p[1567-1568,1577,1580,1584] ; SLURMD_NODENAME p1567 ; SLURM_JOB_NUM_NODES 5
SLURM_JOB_ID 9495449 ; SLURM_JOB_NAME distributed_tensorpack_mkl.sh ; SLURM_JOB_NODELIST p[1567-1568,1577,1580,1584] ; SLURMD_NODENAME p1568 ; SLURM_JOB_NUM_NODES 5
mkdir: cannot create directory ‘/net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8/models/’: File exists
mkdir: cannot create directory ‘/net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8/storage/’: File exists
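The SLURM lines above report the allocation in compressed nodelist form (p[1567-1568,1577,1580,1584]). As an illustration (not part of the job scripts), a small helper that expands such a list into individual hostnames, which is how per-host addresses for a TensorFlow cluster spec would typically be derived:

```python
import re

def expand_nodelist(nodelist):
    """Expand a compressed SLURM nodelist like 'p[1567-1568,1577]' into hosts."""
    m = re.match(r"^([^\[]+)\[([^\]]+)\]$", nodelist)
    if not m:
        return [nodelist]  # single host, nothing to expand
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)  # preserve zero-padding of the original numbers
            for n in range(int(lo), int(hi) + 1):
                hosts.append("%s%0*d" % (prefix, width, n))
        else:
            hosts.append(prefix + part)
    return hosts

hosts = expand_nodelist("p[1567-1568,1577,1580,1584]")
# hosts == ['p1567', 'p1568', 'p1577', 'p1580', 'p1584']
```

Note this handles only the simple single-bracket form shown in the log; real SLURM nodelists can be more complex.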
plgrid/libs/qt/5.4.1 loaded.
plgrid/libs/mkl/11.3.1 loaded.
plgrid/tools/gcc/4.9.2 loaded.
plgrid/tools/intel/15.0.2 loaded.
plgrid/tools/tcltk/8.5.19-threads loaded.
plgrid/tools/python/2.7.13 loaded.
plgrid/libs/mkl/11.3.1 unloaded.
plgrid/libs/mkl/2017.0.0 loaded.
The following have been reloaded with a version change:
  1) plgrid/libs/mkl/11.3.1 => plgrid/libs/mkl/2017.0.0
tools/gcc/6.2.0 loaded.
[... the same module load/unload messages, interleaved, from the remaining 4 nodes ...]
PROGRAM_ARGS: --mkl 0 --dummy 0 --sync 0 --cpu 1 --artificial_slowdown 0 --queue_size 1 --my_sim_master_queue 1 --train_log_path /net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8/storage//atari_trainlog/ --predict_batch_size 16 --dummy_predictor 0 --do_train 1 --simulator_procs 100 --env Breakout-v0 --nr_towers 1 --nr_predict_towers 3 --steps_per_epoch 1000 --fc_neurons 256 --batch_size 128 --learning_rate 0.00015 --port 17351 --tf_port 9236 --optimizer adam --use_sync_opt 1 --num_grad 60 --early_stopping None --ps 1 --fc_init uniform --conv_init normal --replace_with_conv True --fc_splits 1 --debug_charts False --epsilon 1e-08 --beta1 0.9 --beta2 0.999 --save_every 0 --models_dir /net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8/models/ --experiment_dir /net/archive/groups/plggluna/adam/experiments/neptune_job_name_1517240471.8 --adam_debug False --eval_node False --record_node False --schedule_hyper False
OFFLINE: False
[... identical PROGRAM_ARGS / OFFLINE output repeated by the remaining 4 nodes ...]
2018-01-29 16:41:21.632560: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-29 16:41:21.632599: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-29 16:41:21.632619: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-29 16:41:21.632627: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-29 16:41:21.632634: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-29 16:41:21.643060: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:9236}
2018-01-29 16:41:21.643096: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> p1568:9237, 1 -> p1577:9237}
2018-01-29 16:41:21.644721: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:9236
{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}
[ps:0] Starting the TF server
cluster_spec.as_dict(): {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']} tf.__version__: 1.2.1
[0129 16:41:21 @train.py:85] [ps:0] joining the server.
[... the same five cpu_feature_guard warnings (SSE4.1/SSE4.2/AVX/AVX2/FMA) from the next process ...]
2018-01-29 16:41:21.943589: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> p1567:9236}
2018-01-29 16:41:21.943628: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:9237, 1 -> p1577:9237}
2018-01-29 16:41:21.945305: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:9237
[2018-01-29 16:41:21,946] Making new env: Breakout-v0
{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}
[worker:0] Starting the TF server
cluster_spec.as_dict(): {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']} tf.__version__: 1.2.1
args.mkl == 0
using tensorflow convolution
[2018-01-29 16:41:22,056] Making new env: Breakout-v0
[... more 'Making new env: Breakout-v0' lines (timestamps 16:41:22,062 through 22,160) ...]
[... the same five cpu_feature_guard warnings from the next process, interleaved with more 'Making new env: Breakout-v0' lines ...]
2018-01-29 16:41:22.175896: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> p1567:9236}
2018-01-29 16:41:22.175933: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> p1568:9237, 1 -> localhost:9237}
2018-01-29 16:41:22.177694: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:9237
{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}
[worker:1] Starting the TF server
cluster_spec.as_dict(): {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']} tf.__version__: 1.2.1
args.mkl == 0
[... 'Making new env: Breakout-v0' lines continue (--simulator_procs 100, so each worker spawns 100 simulator processes) ...]
using tensorflow convolution
[... 'Making new env: Breakout-v0' lines continue (timestamps 16:41:22,282 through 22,585) ...]
[... the same five cpu_feature_guard warnings from the next process ...]
Traceback (most recent call last):
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
    server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
  File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
    self._server_def.SerializeToString(), status)
  File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 2 was not defined in job "worker"
[2018-01-29 16:41:22,592] Making new env: Breakout-v0
[... 'Making new env: Breakout-v0' lines continue (timestamps 16:41:22,592 through 22,695) ...]
[... the same five cpu_feature_guard warnings from the next process ...]
Traceback (most recent call last):
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
    server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
  File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
    self._server_def.SerializeToString(), status)
  File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 3 was not defined in job "worker"
[2018-01-29 16:41:22,697] Making new env: Breakout-v0
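The tracebacks above come from the nodes whose task index falls outside the cluster spec: `srun -N 5 -n 5` starts five tasks, but the spec printed by the workers defines only one ps host and two worker hosts. A minimal sketch of the mismatch (hypothetical helper, not the gist's train.py; `tf.train.Server` effectively performs this bounds check at startup):

```python
# Cluster spec as printed in the log: worker tasks 0 and 1 exist,
# but five SLURM tasks were launched, so indices 2 and 3 have no entry.
cluster_spec = {
    "ps": ["p1567:9236"],
    "worker": ["p1568:9237", "p1577:9237"],
}

def task_is_defined(cluster, job_name, task_index):
    """Return True iff task_index has a host:port entry in the given job."""
    return task_index < len(cluster.get(job_name, []))

assert task_is_defined(cluster_spec, "worker", 1)      # worker:1 starts fine
assert not task_is_defined(cluster_spec, "worker", 2)  # fails: InvalidArgumentError
assert not task_is_defined(cluster_spec, "worker", 3)  # fails: InvalidArgumentError
```

Either the job should request 3 nodes (matching the "3nodes 12cores" tag in the srun command) or the spec should list one worker entry per launched task.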
[... 'Making new env: Breakout-v0' lines continue (timestamps 16:41:22,704 through 22,820) ...]
None <type 'NoneType'>
worker host: grpc://localhost:9237
[2018-01-29 16:41:22,828] Making new env: Breakout-v0
[0129 16:41:22 @train.py:718] [BA3C] Train on gpu 0 and infer on gpu 0,0,0
[0129 16:41:22 @train.py:724] using async version
[2018-01-29 16:41:22,835] Making new env: Breakout-v0
DUMMY PREDICTOR 0
[2018-01-29 16:41:22,843] Making new env: Breakout-v0
MultiGPUTrainer __init__ dummy = 0
[0129 16:41:22 @multigpu.py:57] Training a model of 1 tower
[2018-01-29 16:41:22,851] Making new env: Breakout-v0
[0129 16:41:22 @multigpu.py:67] Building graph for training tower 0..., /cpu:0
===== [p1568] PRINTING BUILD GRAPH STACK AT 1517240482.85 ==============
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
    trainer.train()
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 137, in train
    grad_list = self._multi_tower_grads()
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 81, in _multi_tower_grads
    self.model.build_graph(model_inputs)
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
    self._build_graph(model_inputs)
  File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
    traceback.print_stack(file=sys.stderr)
[... 'Making new env: Breakout-v0' lines continue ...]
12
[... interleaved 'Making new env: Breakout-v0' lines omitted below ...]
[0129 16:41:22 @_common.py:61] conv0 input: [None, 84, 84, 16]
Tensor("tower0/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] conv0 output: [None, 80, 80, 32]
[0129 16:41:22 @_common.py:61] pool0 input: [None, 80, 80, 32]
Tensor("tower0/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] pool0 output: [None, 40, 40, 32]
[0129 16:41:22 @_common.py:61] conv1 input: [None, 40, 40, 32]
Tensor("tower0/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] conv1 output: [None, 36, 36, 32]
[0129 16:41:22 @_common.py:61] pool1 input: [None, 36, 36, 32]
Tensor("tower0/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] pool1 output: [None, 18, 18, 32]
[0129 16:41:22 @_common.py:61] conv2 input: [None, 18, 18, 32]
Tensor("tower0/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] conv2 output: [None, 14, 14, 64]
[0129 16:41:22 @_common.py:61] pool2 input: [None, 14, 14, 64]
Tensor("tower0/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] pool2 output: [None, 7, 7, 64]
[0129 16:41:22 @_common.py:61] conv3 input: [None, 7, 7, 64]
Tensor("tower0/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] conv3 output: [None, 5, 5, 64]
[0129 16:41:22 @_common.py:61] fc1_0 input: [None, 5, 5, 64]
Tensor("tower0/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:22 @_common.py:70] fc1_0 output: [None, 1, 1, 256]
[0129 16:41:23 @_common.py:61] fc-pi input: [None, 256]
[2018-01-29 16:41:23,021] Making new env: Breakout-v0 | |
Tensor("tower0/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:0/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] fc-pi output: [None, 6] | |
[0129 16:41:23 @_common.py:61] fc-v input: [None, 256] | |
None <type 'NoneType'> | |
worker host: grpc://localhost:9237 | |
[0129 16:41:23 @train.py:718] [BA3C] Train on gpu 0 and infer on gpu 0,0,0 | |
[0129 16:41:23 @train.py:724] using async version | |
DUMMY PREDICTOR 0 | |
MultiGPUTrainer __init__ dummy = 0 | |
[0129 16:41:23 @multigpu.py:57] Training a model of 1 tower | |
[0129 16:41:23 @multigpu.py:67] Building graph for training tower 0..., /cpu:0 | |
===== [p1577] PRINTING BUILD GRAPH STACK AT 1517240483.08============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module> | |
trainer.train() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 137, in train | |
grad_list = self._multi_tower_grads() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 81, in _multi_tower_grads | |
self.model.build_graph(model_inputs) | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph | |
self._build_graph(model_inputs) | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph | |
traceback.print_stack(file=sys.stderr) | |
12 | |
[0129 16:41:23 @_common.py:61] conv0 input: [None, 84, 84, 16] | |
Tensor("tower0/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] conv0 output: [None, 80, 80, 32] | |
[0129 16:41:23 @_common.py:61] pool0 input: [None, 80, 80, 32] | |
Tensor("tower0/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] pool0 output: [None, 40, 40, 32] | |
[0129 16:41:23 @_common.py:61] conv1 input: [None, 40, 40, 32] | |
Tensor("tower0/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] conv1 output: [None, 36, 36, 32] | |
[0129 16:41:23 @_common.py:61] pool1 input: [None, 36, 36, 32] | |
Tensor("tower0/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] pool1 output: [None, 18, 18, 32] | |
[0129 16:41:23 @_common.py:61] conv2 input: [None, 18, 18, 32] | |
Tensor("tower0/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] conv2 output: [None, 14, 14, 64] | |
[0129 16:41:23 @_common.py:61] pool2 input: [None, 14, 14, 64] | |
Tensor("tower0/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] pool2 output: [None, 7, 7, 64] | |
[0129 16:41:23 @_common.py:61] conv3 input: [None, 7, 7, 64] | |
Tensor("tower0/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] conv3 output: [None, 5, 5, 64] | |
[0129 16:41:23 @_common.py:61] fc1_0 input: [None, 5, 5, 64] | |
Tensor("tower0/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] fc1_0 output: [None, 1, 1, 256] | |
[0129 16:41:23 @_common.py:61] fc-pi input: [None, 256] | |
Tensor("tower0/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] fc-pi output: [None, 6] | |
[0129 16:41:23 @_common.py:61] fc-v input: [None, 256] | |
Traceback (most recent call last): | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module> | |
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index) | |
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__ | |
self._server_def.SerializeToString(), status) | |
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__ | |
self.gen.next() | |
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status | |
pywrap_tensorflow.TF_GetCode(status)) | |
InvalidArgumentError: Task 2 was not defined in job "worker" | |
Traceback (most recent call last): | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module> | |
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index) | |
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__ | |
self._server_def.SerializeToString(), status) | |
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__ | |
self.gen.next() | |
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status | |
pywrap_tensorflow.TF_GetCode(status)) | |
InvalidArgumentError: Task 3 was not defined in job "worker" | |
Tensor("tower0/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:0/device:CPU:0) | |
[0129 16:41:23 @_common.py:70] fc-v output: [None, 1] | |
Tensor("tower0/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:1/device:CPU:0) | |
[0129 16:41:24 @_common.py:70] fc-v output: [None, 1] | |
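The layer shapes logged above follow from VALID convolutions and 2x2 max-pooling: a kernel of size k shrinks each spatial side by k-1, and each pool halves it. A minimal sketch of that arithmetic (layer sizes taken from the log; the helper names are illustrative, not part of the codebase):

```python
# Reproduce the spatial sizes logged for tower0: a VALID conv with a k x k
# kernel shrinks a side by (k - 1); a 2x2 max-pool with stride 2 halves it.
def conv_valid(size, kernel):
    return size - (kernel - 1)

def pool2(size):
    return size // 2

size = 84                      # input: [None, 84, 84, 16]
size = conv_valid(size, 5)     # conv0 -> 80
size = pool2(size)             # pool0 -> 40
size = conv_valid(size, 5)     # conv1 -> 36
size = pool2(size)             # pool1 -> 18
size = conv_valid(size, 5)     # conv2 -> 14
size = pool2(size)             # pool2 -> 7
size = conv_valid(size, 3)     # conv3 -> 5
print(size)                    # 5, matching conv3 output [None, 5, 5, 64]
```

The kernel sizes match the weight shapes in the "Model Parameters" table further down (5x5 for conv0-conv2, 3x3 for conv3).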
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 2 was not defined in job "worker"
MOVING_SUMMARY_VARIABLES
[]
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 3 was not defined in job "worker"
[0129 16:41:24 @modelutils.py:22] Model Parameters:
conv0/W:0: shape=[5, 5, 16, 32], dim=12800
conv1/W:0: shape=[5, 5, 32, 32], dim=25600
conv2/W:0: shape=[5, 5, 32, 64], dim=51200
conv3/W:0: shape=[3, 3, 64, 64], dim=36864
fc1_0/W:0: shape=[5, 5, 64, 256], dim=409600
fc-pi/W:0: shape=[256, 6], dim=1536
fc-pi/b:0: shape=[6], dim=6
fc-v/W:0: shape=[256, 1], dim=256
fc-v/b:0: shape=[1], dim=1
Total param=537863 (2.051785 MB assuming all float32)
MOVING_SUMMARY_VARIABLES
[]
[0129 16:41:24 @modelutils.py:22] Model Parameters:
conv0/W:0: shape=[5, 5, 16, 32], dim=12800
conv1/W:0: shape=[5, 5, 32, 32], dim=25600
conv2/W:0: shape=[5, 5, 32, 64], dim=51200
conv3/W:0: shape=[3, 3, 64, 64], dim=36864
fc1_0/W:0: shape=[5, 5, 64, 256], dim=409600
fc-pi/W:0: shape=[256, 6], dim=1536
fc-pi/b:0: shape=[6], dim=6
fc-v/W:0: shape=[256, 1], dim=256
fc-v/b:0: shape=[1], dim=1
Total param=537863 (2.051785 MB assuming all float32)
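The parameter table above can be cross-checked independently: each `dim` is the product of the tensor's shape, and the reported total is their sum. A quick sketch (shapes copied from the log):

```python
# Cross-check of the "Model Parameters" table printed by modelutils.py:
# each dim is the product of the weight tensor's shape entries.
from functools import reduce
from operator import mul

shapes = {
    "conv0/W": [5, 5, 16, 32],
    "conv1/W": [5, 5, 32, 32],
    "conv2/W": [5, 5, 32, 64],
    "conv3/W": [3, 3, 64, 64],
    "fc1_0/W": [5, 5, 64, 256],
    "fc-pi/W": [256, 6],
    "fc-pi/b": [6],
    "fc-v/W":  [256, 1],
    "fc-v/b":  [1],
}
dims = {name: reduce(mul, shape, 1) for name, shape in shapes.items()}
total = sum(dims.values())
print(total)                  # 537863, matching the log
print(total * 4 / 2.0 ** 20)  # ~2.05 MB assuming all params are float32
```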
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 2 was not defined in job "worker"
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 3 was not defined in job "worker"
[0129 16:41:26 @multigpu.py:228] Setup callbacks ...
Creating Predictorfactor 0
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0...
===== [p1568] PRINTING BUILD GRAPH STACK AT 1517240486.25============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
trainer.train()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train
callbacks.setup_graph(self) # TODO use weakref instead?
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph
cb.setup_graph(self.trainer)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads),
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs
return [self.get_predict_func(input_names, output_names, k) for k in range(n)]
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func
return self.predictor_factory.get_predictor(input_names, output_names, tower)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor
self._build_predict_tower()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower
self.model, self.towers)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
12
Tensor("towerp0/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0...
===== [p1568] PRINTING BUILD GRAPH STACK AT 1517240486.38============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
trainer.train()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train
callbacks.setup_graph(self) # TODO use weakref instead?
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph
cb.setup_graph(self.trainer)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads),
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs
return [self.get_predict_func(input_names, output_names, k) for k in range(n)]
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func
return self.predictor_factory.get_predictor(input_names, output_names, tower)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor
self._build_predict_tower()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower
self.model, self.towers)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
12
Tensor("towerp0_1/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_1/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:0/device:CPU:0)
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0...
===== [p1568] PRINTING BUILD GRAPH STACK AT 1517240486.5============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
trainer.train()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train
callbacks.setup_graph(self) # TODO use weakref instead?
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph
cb.setup_graph(self.trainer)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads),
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs
return [self.get_predict_func(input_names, output_names, k) for k in range(n)]
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func
return self.predictor_factory.get_predictor(input_names, output_names, tower)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor
self._build_predict_tower()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower
self.model, self.towers)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
[0129 16:41:26 @multigpu.py:228] Setup callbacks ...
Creating Predictorfactor 0
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0...
12
===== [p1577] PRINTING BUILD GRAPH STACK AT 1517240486.51============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
trainer.train()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train
callbacks.setup_graph(self) # TODO use weakref instead?
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph
cb.setup_graph(self.trainer)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads),
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs
return [self.get_predict_func(input_names, output_names, k) for k in range(n)]
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func
return self.predictor_factory.get_predictor(input_names, output_names, tower)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor
self._build_predict_tower()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower
self.model, self.towers)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
Tensor("towerp0_2/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
12
Tensor("towerp0_2/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_2/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_2/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_2/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_2/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0_2/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:0/device:CPU:0)
Tensor("towerp0/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 2 was not defined in job "worker"
[0129 16:41:26 @base.py:177] =============================================================== | |
[0129 16:41:26 @base.py:179] CHIEF! | |
[0129 16:41:26 @base.py:180] [p1568] Creating the session | |
[0129 16:41:26 @base.py:181] =============================================================== | |
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0... | |
===== [p1577] PRINTING BUILD GRAPH STACK AT 1517240486.65============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module> | |
trainer.train() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train | |
callbacks.setup_graph(self) # TODO use weakref instead? | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph | |
self._setup_graph() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph | |
cb.setup_graph(self.trainer) | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph | |
self._setup_graph() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph | |
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads), | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs | |
return [self.get_predict_func(input_names, output_names, k) for k in range(n)] | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func | |
return self.predictor_factory.get_predictor(input_names, output_names, tower) | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor | |
self._build_predict_tower() | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower | |
self.model, self.towers) | |
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
12
Tensor("towerp0_1/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_1/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 3 was not defined in job "worker"
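The cause of this repeated `InvalidArgumentError` is visible further down in the log: `cluster_spec` lists only two worker tasks (`{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}`), yet the processes on p1580 and p1584 start as `[worker:2]` and `[worker:3]`. `tf.train.Server` rejects a `task_index` that has no entry in its job. A minimal pure-Python sketch of that consistency check (illustrative only; `check_task` is a hypothetical helper, not TensorFlow or Distributed-BA3C code):

```python
# Illustrative sketch: mimic the task_index validation that tf.train.Server
# performs against the cluster spec. The spec below is copied from this log.
cluster_spec = {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}

def check_task(cluster, job_name, task_index):
    """Raise the same complaint as the InvalidArgumentError above."""
    tasks = cluster.get(job_name, [])
    if not 0 <= task_index < len(tasks):
        raise ValueError('Task %d was not defined in job "%s"'
                         % (task_index, job_name))
    return tasks[task_index]

print(check_task(cluster_spec, 'worker', 1))  # ok: tasks 0 and 1 exist
try:
    check_task(cluster_spec, 'worker', 3)     # what worker:3 on p1584 attempts
except ValueError as exc:
    print(exc)  # Task 3 was not defined in job "worker"
```

In other words, the number of launched worker processes has to match the worker list in the cluster spec; here the SLURM job allocated five nodes (four workers plus one ps), but only two worker entries made it into `cluster_spec`.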
[0129 16:41:26 @base.py:132] Building graph for predictor tower 0...
===== [p1577] PRINTING BUILD GRAPH STACK AT 1517240486.77============== File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 728, in <module>
trainer.train()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/multigpu.py", line 230, in train
callbacks.setup_graph(self) # TODO use weakref instead?
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/group.py", line 66, in _setup_graph
cb.setup_graph(self.trainer)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/callbacks/base.py", line 40, in setup_graph
self._setup_graph()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 367, in _setup_graph
self.trainer.get_predict_funcs(['state'], ['logitsT', 'pred_value'], self.predictor_threads),
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 331, in get_predict_funcs
return [self.get_predict_func(input_names, output_names, k) for k in range(n)]
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 328, in get_predict_func
return self.predictor_factory.get_predictor(input_names, output_names, tower)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 54, in get_predictor
self._build_predict_tower()
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/train/trainer.py", line 71, in _build_predict_tower
self.model, self.towers)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/predict/base.py", line 134, in build_multi_tower_prediction_graph
model.build_graph(input_vars)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/tensorpack_cpu/tensorpack/models/model_desc.py", line 140, in build_graph
self._build_graph(model_inputs)
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 279, in _build_graph
traceback.print_stack(file=sys.stderr)
12
Tensor("towerp0_2/conv0/output:0", shape=(?, 80, 80, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/pool0/MaxPool:0", shape=(?, 40, 40, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/conv1/output:0", shape=(?, 36, 36, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/pool1/MaxPool:0", shape=(?, 18, 18, 32), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/conv2/output:0", shape=(?, 14, 14, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/conv3/output:0", shape=(?, 5, 5, 64), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc1_0/output:0", shape=(?, 1, 1, 256), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc-pi/output:0", shape=(?, 6), dtype=float32, device=/job:worker/task:1/device:CPU:0)
Tensor("towerp0_2/fc-v/output:0", shape=(?, 1), dtype=float32, device=/job:worker/task:1/device:CPU:0)
[0129 16:41:26 @base.py:177] ===============================================================
[0129 16:41:26 @base.py:180] [p1577] Creating the session
[0129 16:41:26 @base.py:181] ===============================================================
2018-01-29 16:41:27.514947: I tensorflow/core/distributed_runtime/master_session.cc:999] Start master session bb8c9e5e6f8d6e2a with config:
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 2 was not defined in job "worker"
[0129 16:41:27 @base.py:189] ===============================================================
[0129 16:41:27 @base.py:190] [p1568] Session created
[0129 16:41:27 @base.py:191] ===============================================================
[0129 16:41:27 @base.py:112] [p1568] Initializing graph variables ...
[0129 16:41:27 @base.py:119] [p1568] Starting concurrency...
[0129 16:41:27 @base.py:198] Starting all threads & procs ...
[0129 16:41:27 @base.py:122] [p1568] Setting default session
[0129 16:41:27 @base.py:125] [p1568] Getting global step
[0129 16:41:27 @base.py:127] [p1568] Start training with global_step=0
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 76, in <module>
server = tf.train.Server(cluster_spec, job_name=my_job_name, task_index=my_task_index)
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/training/server_lib.py", line 145, in __init__
self._server_def.SerializeToString(), status)
File "/net/software/local/python/2.7.9/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/net/archive/groups/plggluna/adam/a3c_virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
InvalidArgumentError: Task 3 was not defined in job "worker"
2018-01-29 16:41:27.738938: I tensorflow/core/distributed_runtime/master_session.cc:999] Start master session 403ed8c384cb098e with config:
[0129 16:41:27 @base.py:189] ===============================================================
[0129 16:41:27 @base.py:190] [p1577] Session created
[0129 16:41:27 @base.py:191] ===============================================================
[0129 16:41:27 @base.py:112] [p1577] Initializing graph variables ...
[0129 16:41:27 @base.py:119] [p1577] Starting concurrency...
[0129 16:41:27 @base.py:198] Starting all threads & procs ...
[0129 16:41:27 @base.py:122] [p1577] Setting default session
[0129 16:41:27 @base.py:125] [p1577] Getting global step
[0129 16:41:27 @base.py:127] [p1577] Start training with global_step=0
server main loop
before socket bind... tcp://*:17351
receiving
[0129 16:41:31 @multigpu.py:323] ERR [p1577] step: count(1), step_time 6416.33, mean_step_time 6416.33, it/s 0.16
[2018-01-29 16:41:32,629] Making new env: Breakout-v0
[2018-01-29 16:41:32,747] Making new env: Breakout-v0
{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}
[worker:2] Starting the TF server
cluster_spec.as_dict(): {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']} tf.__version__: 1.2.1
========= EXCEPTION WHILE STARTING TF SERVER [p1580] =====
args.mkl == 0
using tensorflow convolution
{'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']}
[worker:3] Starting the TF server
cluster_spec.as_dict(): {'ps': ['p1567:9236'], 'worker': ['p1568:9237', 'p1577:9237']} tf.__version__: 1.2.1
========= EXCEPTION WHILE STARTING TF SERVER [p1584] =====
args.mkl == 0
using tensorflow convolution
[0129 16:41:33 @multigpu.py:323] ERR [p1577] step: count(2), step_time 2272.23, mean_step_time 4344.28, it/s 0.23
[2018-01-29 16:41:33,917] Making new env: Breakout-v0
[2018-01-29 16:41:33,919] Making new env: Breakout-v0
[2018-01-29 16:41:33,922] Making new env: Breakout-v0
[2018-01-29 16:41:33,924] Making new env: Breakout-v0
[2018-01-29 16:41:33,928] Making new env: Breakout-v0
[2018-01-29 16:41:33,931] Making new env: Breakout-v0
[2018-01-29 16:41:33,934] Making new env: Breakout-v0
[2018-01-29 16:41:33,936] Making new env: Breakout-v0
[2018-01-29 16:41:33,940] Making new env: Breakout-v0
[2018-01-29 16:41:33,943] Making new env: Breakout-v0
[2018-01-29 16:41:33,945] Making new env: Breakout-v0
[2018-01-29 16:41:33,952] Making new env: Breakout-v0
[2018-01-29 16:41:33,958] Making new env: Breakout-v0
[2018-01-29 16:41:33,959] Making new env: Breakout-v0
[2018-01-29 16:41:33,958] Making new env: Breakout-v0
[2018-01-29 16:41:33,964] Making new env: Breakout-v0
[2018-01-29 16:41:33,967] Making new env: Breakout-v0
[2018-01-29 16:41:33,970] Making new env: Breakout-v0
[2018-01-29 16:41:33,972] Making new env: Breakout-v0
[2018-01-29 16:41:33,976] Making new env: Breakout-v0
[2018-01-29 16:41:33,979] Making new env: Breakout-v0
[2018-01-29 16:41:33,983] Making new env: Breakout-v0
[2018-01-29 16:41:33,985] Making new env: Breakout-v0
[2018-01-29 16:41:33,989] Making new env: Breakout-v0
[2018-01-29 16:41:33,991] Making new env: Breakout-v0
[2018-01-29 16:41:33,995] Making new env: Breakout-v0
[2018-01-29 16:41:33,998] Making new env: Breakout-v0
[2018-01-29 16:41:34,002] Making new env: Breakout-v0
[2018-01-29 16:41:34,006] Making new env: Breakout-v0
[2018-01-29 16:41:34,008] Making new env: Breakout-v0
[2018-01-29 16:41:34,013] Making new env: Breakout-v0
[2018-01-29 16:41:34,015] Making new env: Breakout-v0
[2018-01-29 16:41:34,019] Making new env: Breakout-v0
[2018-01-29 16:41:34,021] Making new env: Breakout-v0
[2018-01-29 16:41:34,026] Making new env: Breakout-v0
[2018-01-29 16:41:34,027] Making new env: Breakout-v0
[2018-01-29 16:41:34,032] Making new env: Breakout-v0
[2018-01-29 16:41:34,034] Making new env: Breakout-v0
[2018-01-29 16:41:34,039] Making new env: Breakout-v0
[2018-01-29 16:41:34,043] Making new env: Breakout-v0
[2018-01-29 16:41:34,045] Making new env: Breakout-v0
[2018-01-29 16:41:34,047] Making new env: Breakout-v0
[2018-01-29 16:41:34,052] Making new env: Breakout-v0
[2018-01-29 16:41:34,056] Making new env: Breakout-v0
[2018-01-29 16:41:34,058] Making new env: Breakout-v0
[2018-01-29 16:41:34,061] Making new env: Breakout-v0
[2018-01-29 16:41:34,065] Making new env: Breakout-v0
[2018-01-29 16:41:34,068] Making new env: Breakout-v0
[2018-01-29 16:41:34,070] Making new env: Breakout-v0
[2018-01-29 16:41:34,075] Making new env: Breakout-v0
[2018-01-29 16:41:34,077] Making new env: Breakout-v0
[2018-01-29 16:41:34,082] Making new env: Breakout-v0
[2018-01-29 16:41:34,088] Making new env: Breakout-v0
[2018-01-29 16:41:34,089] Making new env: Breakout-v0
[2018-01-29 16:41:34,090] Making new env: Breakout-v0
[2018-01-29 16:41:34,095] Making new env: Breakout-v0
[2018-01-29 16:41:34,096] Making new env: Breakout-v0
[2018-01-29 16:41:34,101] Making new env: Breakout-v0
[2018-01-29 16:41:34,103] Making new env: Breakout-v0
[2018-01-29 16:41:34,109] Making new env: Breakout-v0
[2018-01-29 16:41:34,109] Making new env: Breakout-v0
[2018-01-29 16:41:34,115] Making new env: Breakout-v0
[2018-01-29 16:41:34,116] Making new env: Breakout-v0
[2018-01-29 16:41:34,122] Making new env: Breakout-v0
[2018-01-29 16:41:34,123] Making new env: Breakout-v0
[2018-01-29 16:41:34,128] Making new env: Breakout-v0
[2018-01-29 16:41:34,130] Making new env: Breakout-v0
[2018-01-29 16:41:34,135] Making new env: Breakout-v0
[2018-01-29 16:41:34,137] Making new env: Breakout-v0
[2018-01-29 16:41:34,141] Making new env: Breakout-v0
[2018-01-29 16:41:34,145] Making new env: Breakout-v0
[2018-01-29 16:41:34,148] Making new env: Breakout-v0
[2018-01-29 16:41:34,151] Making new env: Breakout-v0
[2018-01-29 16:41:34,154] Making new env: Breakout-v0
[2018-01-29 16:41:34,158] Making new env: Breakout-v0
[2018-01-29 16:41:34,161] Making new env: Breakout-v0
[2018-01-29 16:41:34,165] Making new env: Breakout-v0
[2018-01-29 16:41:34,168] Making new env: Breakout-v0
[2018-01-29 16:41:34,172] Making new env: Breakout-v0
[2018-01-29 16:41:34,174] Making new env: Breakout-v0
[2018-01-29 16:41:34,178] Making new env: Breakout-v0
[2018-01-29 16:41:34,180] Making new env: Breakout-v0
[2018-01-29 16:41:34,185] Making new env: Breakout-v0
[2018-01-29 16:41:34,187] Making new env: Breakout-v0
[2018-01-29 16:41:34,192] Making new env: Breakout-v0
[2018-01-29 16:41:34,194] Making new env: Breakout-v0
[2018-01-29 16:41:34,199] Making new env: Breakout-v0
[2018-01-29 16:41:34,200] Making new env: Breakout-v0
[2018-01-29 16:41:34,206] Making new env: Breakout-v0
[2018-01-29 16:41:34,207] Making new env: Breakout-v0
[2018-01-29 16:41:34,213] Making new env: Breakout-v0
[2018-01-29 16:41:34,214] Making new env: Breakout-v0
[2018-01-29 16:41:34,220] Making new env: Breakout-v0
[2018-01-29 16:41:34,221] Making new env: Breakout-v0
[2018-01-29 16:41:34,227] Making new env: Breakout-v0
[2018-01-29 16:41:34,227] Making new env: Breakout-v0
[2018-01-29 16:41:34,234] Making new env: Breakout-v0
[2018-01-29 16:41:34,234] Making new env: Breakout-v0
[2018-01-29 16:41:34,241] Making new env: Breakout-v0
[2018-01-29 16:41:34,241] Making new env: Breakout-v0
[2018-01-29 16:41:34,247] Making new env: Breakout-v0
[2018-01-29 16:41:34,248] Making new env: Breakout-v0
[2018-01-29 16:41:34,254] Making new env: Breakout-v0
[2018-01-29 16:41:34,255] Making new env: Breakout-v0
[2018-01-29 16:41:34,261] Making new env: Breakout-v0
[2018-01-29 16:41:34,262] Making new env: Breakout-v0
[2018-01-29 16:41:34,268] Making new env: Breakout-v0
[2018-01-29 16:41:34,269] Making new env: Breakout-v0
[2018-01-29 16:41:34,274] Making new env: Breakout-v0
[2018-01-29 16:41:34,276] Making new env: Breakout-v0
[2018-01-29 16:41:34,281] Making new env: Breakout-v0
[2018-01-29 16:41:34,283] Making new env: Breakout-v0
[2018-01-29 16:41:34,288] Making new env: Breakout-v0
[2018-01-29 16:41:34,290] Making new env: Breakout-v0
[2018-01-29 16:41:34,294] Making new env: Breakout-v0
[2018-01-29 16:41:34,297] Making new env: Breakout-v0
[2018-01-29 16:41:34,301] Making new env: Breakout-v0
[2018-01-29 16:41:34,304] Making new env: Breakout-v0
[2018-01-29 16:41:34,308] Making new env: Breakout-v0
[2018-01-29 16:41:34,311] Making new env: Breakout-v0
[2018-01-29 16:41:34,315] Making new env: Breakout-v0
[2018-01-29 16:41:34,319] Making new env: Breakout-v0
[2018-01-29 16:41:34,322] Making new env: Breakout-v0
[2018-01-29 16:41:34,326] Making new env: Breakout-v0
[2018-01-29 16:41:34,329] Making new env: Breakout-v0
[2018-01-29 16:41:34,333] Making new env: Breakout-v0
[2018-01-29 16:41:34,335] Making new env: Breakout-v0
[2018-01-29 16:41:34,340] Making new env: Breakout-v0
[2018-01-29 16:41:34,342] Making new env: Breakout-v0
[2018-01-29 16:41:34,347] Making new env: Breakout-v0
[2018-01-29 16:41:34,349] Making new env: Breakout-v0
[2018-01-29 16:41:34,354] Making new env: Breakout-v0
[2018-01-29 16:41:34,356] Making new env: Breakout-v0
[2018-01-29 16:41:34,362] Making new env: Breakout-v0
[2018-01-29 16:41:34,363] Making new env: Breakout-v0
[2018-01-29 16:41:34,369] Making new env: Breakout-v0
[2018-01-29 16:41:34,370] Making new env: Breakout-v0
[2018-01-29 16:41:34,376] Making new env: Breakout-v0
[2018-01-29 16:41:34,377] Making new env: Breakout-v0
[2018-01-29 16:41:34,383] Making new env: Breakout-v0
[2018-01-29 16:41:34,384] Making new env: Breakout-v0
[2018-01-29 16:41:34,390] Making new env: Breakout-v0
[2018-01-29 16:41:34,391] Making new env: Breakout-v0
[2018-01-29 16:41:34,398] Making new env: Breakout-v0
[2018-01-29 16:41:34,398] Making new env: Breakout-v0
[2018-01-29 16:41:34,405] Making new env: Breakout-v0
[2018-01-29 16:41:34,405] Making new env: Breakout-v0
[2018-01-29 16:41:34,412] Making new env: Breakout-v0
[2018-01-29 16:41:34,412] Making new env: Breakout-v0
[2018-01-29 16:41:34,419] Making new env: Breakout-v0
[2018-01-29 16:41:34,420] Making new env: Breakout-v0
[2018-01-29 16:41:34,427] Making new env: Breakout-v0
[2018-01-29 16:41:34,427] Making new env: Breakout-v0
[2018-01-29 16:41:34,433] Making new env: Breakout-v0
[2018-01-29 16:41:34,434] Making new env: Breakout-v0
[2018-01-29 16:41:34,440] Making new env: Breakout-v0
[2018-01-29 16:41:34,442] Making new env: Breakout-v0
[2018-01-29 16:41:34,447] Making new env: Breakout-v0
[2018-01-29 16:41:34,449] Making new env: Breakout-v0
[2018-01-29 16:41:34,454] Making new env: Breakout-v0
[2018-01-29 16:41:34,456] Making new env: Breakout-v0
[2018-01-29 16:41:34,462] Making new env: Breakout-v0
[2018-01-29 16:41:34,463] Making new env: Breakout-v0
[2018-01-29 16:41:34,469] Making new env: Breakout-v0
[2018-01-29 16:41:34,470] Making new env: Breakout-v0
[2018-01-29 16:41:34,476] Making new env: Breakout-v0
[2018-01-29 16:41:34,477] Making new env: Breakout-v0
[2018-01-29 16:41:34,483] Making new env: Breakout-v0
[2018-01-29 16:41:34,485] Making new env: Breakout-v0
[2018-01-29 16:41:34,490] Making new env: Breakout-v0
[2018-01-29 16:41:34,492] Making new env: Breakout-v0
[2018-01-29 16:41:34,498] Making new env: Breakout-v0
[2018-01-29 16:41:34,500] Making new env: Breakout-v0
[2018-01-29 16:41:34,505] Making new env: Breakout-v0
[2018-01-29 16:41:34,507] Making new env: Breakout-v0
[2018-01-29 16:41:34,512] Making new env: Breakout-v0
[2018-01-29 16:41:34,514] Making new env: Breakout-v0
[2018-01-29 16:41:34,519] Making new env: Breakout-v0
[2018-01-29 16:41:34,522] Making new env: Breakout-v0
[2018-01-29 16:41:34,526] Making new env: Breakout-v0
[2018-01-29 16:41:34,530] Making new env: Breakout-v0
[2018-01-29 16:41:34,534] Making new env: Breakout-v0
[2018-01-29 16:41:34,537] Making new env: Breakout-v0
[2018-01-29 16:41:34,541] Making new env: Breakout-v0
[2018-01-29 16:41:34,545] Making new env: Breakout-v0
[2018-01-29 16:41:34,548] Making new env: Breakout-v0
[2018-01-29 16:41:34,552] Making new env: Breakout-v0
[2018-01-29 16:41:34,556] Making new env: Breakout-v0
[2018-01-29 16:41:34,560] Making new env: Breakout-v0
[2018-01-29 16:41:34,563] Making new env: Breakout-v0
[2018-01-29 16:41:34,567] Making new env: Breakout-v0
[2018-01-29 16:41:34,570] Making new env: Breakout-v0
[2018-01-29 16:41:34,574] Making new env: Breakout-v0
[2018-01-29 16:41:34,577] Making new env: Breakout-v0
[2018-01-29 16:41:34,582] Making new env: Breakout-v0
[2018-01-29 16:41:34,589] Making new env: Breakout-v0
[2018-01-29 16:41:34,597] Making new env: Breakout-v0
[2018-01-29 16:41:34,604] Making new env: Breakout-v0
None <type 'NoneType'>
Traceback (most recent call last):
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 712, in <module>
config = get_config(args, is_chief, my_task_index, chief_worker_hostname, len(cluster['worker']))
File "/net/archive/groups/plggluna/adam/Distributed-BA3C/src/OpenAIGym//train.py", line 571, in get_config
'worker_host' : server.target,
NameError: global name 'server' is not defined
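This `NameError` is a follow-on failure, not an independent bug: in train.py, the global `server` is only bound when `tf.train.Server(...)` at line 76 succeeds, so once that constructor raises the `InvalidArgumentError` above, `get_config`'s later reference to `server.target` at line 571 reads an unbound global. A minimal sketch of the pattern (hypothetical names, not the train.py source):

```python
# Sketch of the failure chain: if the server constructor raises, the global
# is never assigned, and later code that reads it gets a NameError.
def start_server(task_defined):
    """Stand-in for the tf.train.Server construction at train.py:76."""
    global server
    if not task_defined:
        raise RuntimeError('Task 3 was not defined in job "worker"')
    server = 'grpc://localhost:2222'  # placeholder for a started server

def get_config():
    """Stand-in for get_config at train.py:571."""
    return {'worker_host': server}  # NameError if start_server never succeeded

try:
    start_server(task_defined=False)
except RuntimeError:
    pass                            # the process continues anyway...

try:
    get_config()
except NameError as exc:
    print(type(exc).__name__)       # NameError, as in the traceback above
```

Guarding the later code on the server actually existing, or letting the process exit when the server cannot start, would leave only the root-cause error in the log.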
[2018-01-29 16:41:34,612] Making new env: Breakout-v0
[2018-01-29 16:41:34,620] Making new env: Breakout-v0
[0129 16:41:34 @multigpu.py:323] ERR [p1577] step: count(3), step_time 1160.77, mean_step_time 3283.11, it/s 0.3
DONE
DONE
[0129 16:41:36 @multigpu.py:323] ERR [p1577] step: count(4), step_time 1182.74, mean_step_time 2758.02, it/s 0.36
[0129 16:41:37 @multigpu.py:323] ERR [p1577] step: count(5), step_time 1161.0, mean_step_time 2438.61, it/s 0.41
[0129 16:41:39 @multigpu.py:323] ERR [p1577] step: count(6), step_time 2177.08, mean_step_time 2395.02, it/s 0.42