
@byronyi
Created January 17, 2019 04:40
(The gist file could not be displayed.)
@threeleafzerg

byronyi,
I tried your scripts and followed exactly the same steps.
The worker script (#3) always gets stuck with the following log (any ideas?).
(tf-estimator-nightly 1.13.0.dev2019010910, tf-nightly 1.13.0.dev20190116)
Log:
[zhouhaiy@mlt-skx052 temp]$ python worker.py
2019-01-17 15:28:47.684458: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-01-17 15:28:47.699155: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-01-17 15:28:47.709979: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x7e17fa0 executing computations on platform Host. Devices:
2019-01-17 15:28:47.710022: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): ,
WARNING:tensorflow:Not all devices in tf.distribute.Strategy are visible to TensorFlow.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpQBujoJ
WARNING:tensorflow:Not all devices in tf.distribute.Strategy are visible to TensorFlow.
WARNING:tensorflow:Not all devices in tf.distribute.Strategy are visible to TensorFlow.
WARNING:tensorflow:From /home/zhouhaiy/.local/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1763: make_initializable_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use for ... in dataset: to iterate over a dataset. If using tf.estimator, return the Dataset object directly from your input function. As a last resort, you can use tf.compat.v1.data.make_initializable_iterator(dataset).
WARNING:tensorflow:From /home/zhouhaiy/.local/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1458: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/zhouhaiy/.local/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1253: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
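
For context, here is a minimal sketch of the kind of two-worker collective all-reduce Estimator setup these logs suggest. The gist's own worker.py is not shown above, so the model_fn, input_fn, ports, and the use of tf.contrib.distribute.CollectiveAllReduceStrategy below are assumptions drawn from the log output, not the original script.

```python
# Hedged sketch only: a two-worker collective all-reduce setup with
# tf.estimator on TF 1.13 nightlies. model_fn/input_fn are placeholders.
import json
import os

import tensorflow as tf

# Each worker exports TF_CONFIG describing the cluster and its own task
# (use index 1 on the second worker).
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["localhost:6000", "localhost:6001"]},
    "task": {"type": "worker", "index": 0},
})


def input_fn():
    # Toy dataset; repeat() keeps the iterator from hitting
    # "Out of range: End of sequence" during training.
    x = tf.data.Dataset.from_tensor_slices([[1.0], [2.0], [3.0], [4.0]])
    y = tf.data.Dataset.from_tensor_slices([[2.0], [4.0], [6.0], [8.0]])
    return tf.data.Dataset.zip((x, y)).repeat().batch(2)


def model_fn(features, labels, mode):
    preds = tf.layers.dense(features, 1)
    loss = tf.losses.mean_squared_error(labels, preds)
    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


# In these logs the strategy runs on CPU-only hosts, hence num_gpus_per_worker=0.
strategy = tf.contrib.distribute.CollectiveAllReduceStrategy(num_gpus_per_worker=0)
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)

tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=input_fn, max_steps=100),
    tf.estimator.EvalSpec(input_fn=input_fn, steps=10))
```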

@threeleafzerg

I also tried these three scripts on my development machine (a rather clean environment).
The #3 script still fails with an error. Does this feature need any prerequisite packages?
Log from #3:
WARNING:tensorflow:Not all devices in tf.distribute.Strategy are visible to TensorFlow.
WARNING:tensorflow:Not all devices in tf.distribute.Strategy are visible to TensorFlow.
WARNING:tensorflow:From /home/sunbear/miniconda2/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1763: make_initializable_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use for ... in dataset: to iterate over a dataset. If using tf.estimator, return the Dataset object directly from your input function. As a last resort, you can use tf.compat.v1.data.make_initializable_iterator(dataset).
WARNING:tensorflow:From /home/sunbear/miniconda2/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1458: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/sunbear/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1253: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
E0117 19:12:52.497343637 4517 http_proxy.cc:62] 'https' scheme not supported in proxy URI
E0117 19:12:53.510391454 4515 http_proxy.cc:62] 'https' scheme not supported in proxy URI
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.

Log from #1 or #2:
2019-01-17 19:12:29.215574: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-17 19:12:29.237936: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-01-17 19:12:29.238280: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55a4bafcda60 executing computations on platform Host. Devices:
2019-01-17 19:12:29.238303: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): ,
2019-01-17 19:12:29.239407: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job worker -> {0 -> localhost:6000, 1 -> localhost:6001}
2019-01-17 19:12:29.240302: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:387] Started server with target: grpc://localhost:6000
2019-01-17 19:12:29.240321: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:391] Server already started (target: grpc://localhost:6000)
2019-01-17 19:12:29.240340: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:391] Server already started (target: grpc://localhost:6000)
2019-01-17 19:12:53.513087: I tensorflow/core/distributed_runtime/master_session.cc:1192] Start master session 5a3ad4f61cec82ac with config: device_filters: "/job:worker/task:0" device_filters: "/job:worker/task:0" allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE scoped_allocator_optimization: ON scoped_allocator_opts { enable_op: "CollectiveReduce" enable_op: "CollectiveReduce" } } } experimental { collective_group_leader: "/job:worker/replica:0/task:0" }
E0117 19:12:53.525456321 4578 http_proxy.cc:62] 'https' scheme not supported in proxy URI
E0117 19:12:53.525458685 4577 http_proxy.cc:62] 'https' scheme not supported in proxy URI
E0117 19:12:53.525475494 4579 http_proxy.cc:62] 'https' scheme not supported in proxy URI
2019-01-17 19:12:53.701830: W tensorflow/core/common_runtime/base_collective_executor.cc:203] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
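
The repeated "'https' scheme not supported in proxy URI" lines come from gRPC's proxy mapper, which only accepts http:// proxy URIs; they usually indicate that an https_proxy/HTTPS_PROXY environment variable in this environment is set to an https:// URL. The "End of sequence" abort at the bottom is the input pipeline reporting that the dataset was exhausted. A possible workaround for the proxy errors (my assumption, not something from the gist) is to clear or scope the proxy variables before the TensorFlow gRPC servers are created:

```python
import os

# Assumed workaround, not part of the original gist: gRPC reads the standard
# proxy environment variables, and an https:// proxy URI triggers
# "'https' scheme not supported in proxy URI".

# Option 1: drop the proxy settings entirely for this process.
for var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
    os.environ.pop(var, None)

# Option 2: keep the proxy but exempt the local cluster addresses so the
# worker-to-worker gRPC channels bypass it.
os.environ["no_proxy"] = "localhost,127.0.0.1"
```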
