@taylanbil
Created September 13, 2019 00:39
Resnet50 run logs w/ multiprocess, fake data
(pytorch-nightly) pytorcx-xla-img :: pytorch/xla/test ‹master*› » XLA_USE_BF16=1 XRT_TPU_CONFIG="tpu_worker;0;10.1.3.2:8470" python test_train_mp_imagenet.py --fake_data --model resnet50
2019-09-13 00:35:30.673264: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:30.673318: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:30.673326: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:30.673332: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:30.673338: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:30.673344: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:30.673350: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:30.673355: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:30.673372: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:30.673404: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:30.673414: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:0
2019-09-13 00:35:30.673430: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1086] Configuring TPU for master worker tpu_worker:0 at grpc://10.1.3.2:8470
2019-09-13 00:35:30.673537: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.682521: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.682535: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.682770: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.682854: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.694450: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:30.754342: I tensorflow/compiler/xla/xla_client/mesh_service.cc:168] Waiting to connect to client mesh master (300 seconds) localhost:53857
2019-09-13 00:35:36.300990: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1097] TPU topology: mesh_shape: 2
mesh_shape: 2
mesh_shape: 2
num_tasks: 1
num_tpu_devices_per_task: 8
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
device_coordinates: 0
device_coordinates: 0
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
device_coordinates: 0
device_coordinates: 1
device_coordinates: 1
device_coordinates: 1
2019-09-13 00:35:36.301093: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1166] Creating mesh service bound to localhost:53857
==> Preparing data..
2019-09-13 00:35:36.390049: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:36.390731: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:36.390762: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:36.390769: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:36.390774: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:36.390780: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:36.390790: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:36.390796: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:36.390803: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:36.390808: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:36.390817: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:36.390852: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:5
==> Preparing data..
2019-09-13 00:35:36.402814: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:36.403352: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:36.403382: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:36.403390: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:36.403405: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:36.403419: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:36.403520: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:36.403530: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:36.403543: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:36.403553: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:36.403569: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:36.403580: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:6
==> Preparing data..
2019-09-13 00:35:36.432060: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:36.432667: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:36.432707: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:36.432725: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:36.432743: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:36.432759: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:36.432775: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:36.432792: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:36.432807: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:36.432829: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:36.432847: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:36.432864: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:1
==> Preparing data..
2019-09-13 00:35:37.071081: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:37.173304: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:37.403675: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:37.449382: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:39.123780: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:39.124340: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:39.124378: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:39.124385: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:39.124393: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:39.124399: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:39.124404: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:39.124417: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:39.124428: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:39.124440: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:39.124451: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:39.124459: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:3
==> Preparing data..
2019-09-13 00:35:39.213332: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:39.213744: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:39.213775: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:39.213783: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:39.213796: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:39.213807: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:39.213813: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:39.213820: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:39.213830: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:39.213842: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:39.213848: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:39.213855: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:7
==> Preparing data..
2019-09-13 00:35:39.601004: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:39.601503: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:39.601530: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:39.601537: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:39.601550: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:39.601570: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:39.601576: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:39.601582: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:39.601587: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:39.601594: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:39.601604: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:39.601613: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:4
==> Preparing data..
2019-09-13 00:35:39.667707: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:39.742836: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:40.136970: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
2019-09-13 00:35:40.757767: I tensorflow/compiler/xla/xla_client/computation_client.cc:168] Fetching mesh configuration for worker tpu_worker:0 from mesh service at localhost:53857
2019-09-13 00:35:40.758199: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:40.758224: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:40.758231: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:40.758245: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:40.758281: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0/task:0/device:TPU:3
2019-09-13 00:35:40.758287: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:4 -> /job:tpu_worker/replica:0/task:0/device:TPU:4
2019-09-13 00:35:40.758294: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:5 -> /job:tpu_worker/replica:0/task:0/device:TPU:5
2019-09-13 00:35:40.758302: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2019-09-13 00:35:40.758311: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2019-09-13 00:35:40.758321: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:201] Worker grpc://10.1.3.2:8470 for /job:tpu_worker/replica:0/task:0
2019-09-13 00:35:40.758329: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:205] XRT default device: TPU:2
==> Preparing data..
2019-09-13 00:35:41.312188: I torch_xla/csrc/tensor_util.cpp:27] Using BF16 data type for floating point values
[xla:0](0) Loss=6.87500 Rate=14.19 GlobalRate=14.19 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=18.78 GlobalRate=18.78 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=20.21 GlobalRate=20.21 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=14.31 GlobalRate=14.31 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=24.33 GlobalRate=24.33 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=18.87 GlobalRate=18.87 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=13.50 GlobalRate=13.50 Time=Fri Sep 13 00:35:46 2019
[xla:0](0) Loss=6.87500 Rate=13.59 GlobalRate=13.59 Time=Fri Sep 13 00:35:46 2019
[xla:0](20) Loss=0.03076 Rate=32.48 GlobalRate=39.33 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=30.64 GlobalRate=38.10 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=34.72 GlobalRate=40.28 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=30.46 GlobalRate=37.97 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=32.53 GlobalRate=39.38 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=30.69 GlobalRate=38.15 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=33.04 GlobalRate=39.60 Time=Fri Sep 13 00:36:48 2019
[xla:0](20) Loss=0.03076 Rate=30.38 GlobalRate=37.88 Time=Fri Sep 13 00:36:48 2019
[xla:0](40) Loss=0.00000 Rate=372.71 GlobalRate=72.28 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=374.62 GlobalRate=69.79 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=373.53 GlobalRate=70.24 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=373.87 GlobalRate=72.37 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=374.64 GlobalRate=73.92 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=372.89 GlobalRate=69.92 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=373.97 GlobalRate=72.75 Time=Fri Sep 13 00:36:52 2019
[xla:0](40) Loss=0.00000 Rate=370.46 GlobalRate=70.13 Time=Fri Sep 13 00:36:52 2019
[xla:0](60) Loss=0.00000 Rate=507.22 GlobalRate=98.70 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=507.39 GlobalRate=101.66 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=506.14 GlobalRate=98.80 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=505.82 GlobalRate=98.20 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=506.11 GlobalRate=102.14 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=505.45 GlobalRate=98.38 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=504.15 GlobalRate=101.49 Time=Fri Sep 13 00:36:56 2019
[xla:0](60) Loss=0.00000 Rate=503.76 GlobalRate=103.65 Time=Fri Sep 13 00:36:56 2019
[xla:0](80) Loss=0.00000 Rate=547.34 GlobalRate=124.20 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=547.60 GlobalRate=128.17 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=546.80 GlobalRate=123.69 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=546.35 GlobalRate=123.46 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=548.32 GlobalRate=129.99 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=545.58 GlobalRate=124.03 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=546.27 GlobalRate=127.39 Time=Fri Sep 13 00:37:01 2019
[xla:0](80) Loss=0.00000 Rate=545.77 GlobalRate=127.55 Time=Fri Sep 13 00:37:01 2019
[xla:0](100) Loss=0.00000 Rate=566.80 GlobalRate=150.67 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=566.58 GlobalRate=150.86 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=566.42 GlobalRate=146.90 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=565.96 GlobalRate=146.25 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=566.67 GlobalRate=153.57 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=564.80 GlobalRate=146.47 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=564.43 GlobalRate=151.49 Time=Fri Sep 13 00:37:05 2019
[xla:0](100) Loss=0.00000 Rate=563.91 GlobalRate=147.02 Time=Fri Sep 13 00:37:05 2019
[xla:0](120) Loss=0.00000 Rate=565.64 GlobalRate=167.38 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=564.90 GlobalRate=174.56 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=564.82 GlobalRate=167.51 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=564.09 GlobalRate=171.42 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=564.35 GlobalRate=166.64 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=565.15 GlobalRate=166.92 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=563.95 GlobalRate=171.61 Time=Fri Sep 13 00:37:10 2019
[xla:0](120) Loss=0.00000 Rate=564.93 GlobalRate=172.34 Time=Fri Sep 13 00:37:10 2019
[xla:0](140) Loss=0.00000 Rate=577.35 GlobalRate=190.54 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=577.39 GlobalRate=185.47 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=576.71 GlobalRate=191.49 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=576.56 GlobalRate=186.37 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=576.62 GlobalRate=185.73 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=576.23 GlobalRate=193.84 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=574.94 GlobalRate=186.18 Time=Fri Sep 13 00:37:14 2019
[xla:0](140) Loss=0.00000 Rate=575.20 GlobalRate=190.69 Time=Fri Sep 13 00:37:14 2019
[xla:0](160) Loss=0.00000 Rate=576.71 GlobalRate=208.00 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=575.27 GlobalRate=207.78 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=575.91 GlobalRate=208.80 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=575.03 GlobalRate=202.77 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=574.84 GlobalRate=203.43 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=574.04 GlobalRate=203.23 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=573.99 GlobalRate=211.19 Time=Fri Sep 13 00:37:18 2019
[xla:0](160) Loss=0.00000 Rate=572.93 GlobalRate=202.43 Time=Fri Sep 13 00:37:18 2019
[xla:0](180) Loss=0.00000 Rate=586.04 GlobalRate=219.36 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=586.54 GlobalRate=218.36 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=586.49 GlobalRate=227.39 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=585.20 GlobalRate=224.05 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=585.84 GlobalRate=218.67 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=585.80 GlobalRate=219.16 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=583.81 GlobalRate=223.79 Time=Fri Sep 13 00:37:23 2019
[xla:0](180) Loss=0.00000 Rate=583.87 GlobalRate=224.84 Time=Fri Sep 13 00:37:23 2019
[xla:0](200) Loss=0.00000 Rate=587.17 GlobalRate=238.51 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.87 GlobalRate=233.95 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=587.38 GlobalRate=239.59 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.78 GlobalRate=233.24 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.90 GlobalRate=233.74 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.49 GlobalRate=238.74 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.99 GlobalRate=232.92 Time=Fri Sep 13 00:37:27 2019
[xla:0](200) Loss=0.00000 Rate=586.48 GlobalRate=242.15 Time=Fri Sep 13 00:37:27 2019
[xla:0](220) Loss=0.00000 Rate=580.78 GlobalRate=252.11 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=581.27 GlobalRate=255.58 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=580.69 GlobalRate=247.24 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=580.66 GlobalRate=252.96 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=580.57 GlobalRate=246.19 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=580.10 GlobalRate=247.02 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=579.45 GlobalRate=246.49 Time=Fri Sep 13 00:37:32 2019
[xla:0](220) Loss=0.00000 Rate=579.40 GlobalRate=251.84 Time=Fri Sep 13 00:37:32 2019
[xla:0](240) Loss=0.00000 Rate=574.55 GlobalRate=267.85 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.32 GlobalRate=259.44 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=575.17 GlobalRate=258.72 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.25 GlobalRate=264.35 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.60 GlobalRate=265.21 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.50 GlobalRate=258.38 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.22 GlobalRate=259.21 Time=Fri Sep 13 00:37:36 2019
[xla:0](240) Loss=0.00000 Rate=574.22 GlobalRate=264.08 Time=Fri Sep 13 00:37:36 2019
[xla:0](260) Loss=0.00000 Rate=574.30 GlobalRate=275.49 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=573.20 GlobalRate=279.23 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=573.66 GlobalRate=270.57 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=572.91 GlobalRate=270.77 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=573.00 GlobalRate=269.71 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=573.12 GlobalRate=270.05 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=572.03 GlobalRate=275.68 Time=Fri Sep 13 00:37:40 2019
[xla:0](260) Loss=0.00000 Rate=572.15 GlobalRate=276.55 Time=Fri Sep 13 00:37:40 2019
[xla:0](280) Loss=0.00000 Rate=574.07 GlobalRate=281.37 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=574.14 GlobalRate=281.16 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=574.23 GlobalRate=280.64 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=574.07 GlobalRate=286.08 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=574.62 GlobalRate=287.18 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=574.53 GlobalRate=286.31 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=573.76 GlobalRate=280.29 Time=Fri Sep 13 00:37:45 2019
[xla:0](280) Loss=0.00000 Rate=571.07 GlobalRate=289.74 Time=Fri Sep 13 00:37:45 2019
[xla:0](300) Loss=0.00000 Rate=577.26 GlobalRate=297.13 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=577.21 GlobalRate=296.26 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=576.97 GlobalRate=291.10 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=576.98 GlobalRate=290.59 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=576.75 GlobalRate=291.31 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=577.89 GlobalRate=299.75 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=576.45 GlobalRate=290.23 Time=Fri Sep 13 00:37:49 2019
[xla:0](300) Loss=0.00000 Rate=575.41 GlobalRate=295.98 Time=Fri Sep 13 00:37:49 2019
[xla:0](320) Loss=0.00000 Rate=566.59 GlobalRate=305.21 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=566.54 GlobalRate=299.56 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=566.96 GlobalRate=299.22 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=566.46 GlobalRate=300.28 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=567.26 GlobalRate=304.97 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=567.47 GlobalRate=308.70 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=566.20 GlobalRate=306.07 Time=Fri Sep 13 00:37:54 2019
[xla:0](320) Loss=0.00000 Rate=565.91 GlobalRate=300.05 Time=Fri Sep 13 00:37:54 2019
[xla:0](340) Loss=0.00000 Rate=576.85 GlobalRate=308.86 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=576.64 GlobalRate=314.85 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=576.44 GlobalRate=309.07 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=576.73 GlobalRate=313.75 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=576.38 GlobalRate=308.01 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=576.12 GlobalRate=313.98 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=575.85 GlobalRate=317.43 Time=Fri Sep 13 00:37:58 2019
[xla:0](340) Loss=0.00000 Rate=575.17 GlobalRate=308.32 Time=Fri Sep 13 00:37:58 2019
[xla:0](360) Loss=0.00000 Rate=599.52 GlobalRate=323.59 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=600.30 GlobalRate=317.11 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=600.23 GlobalRate=326.20 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=599.51 GlobalRate=316.77 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=599.28 GlobalRate=322.72 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=598.72 GlobalRate=317.60 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=598.61 GlobalRate=317.81 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=598.13 GlobalRate=322.46 Time=Fri Sep 13 00:38:02 2019
^CTraceback (most recent call last):
File "test_train_mp_imagenet.py", line 197, in <module>
xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=FLAGS.num_cores)
File "/usr/share/torch-xla-nightly/pytorch/xla/torch_xla_py/xla_multiprocessing.py", line 129, in spawn
_start_fn, args=(fn, args), nprocs=nprocs, join=join, daemon=daemon)
File "/anaconda3/envs/pytorch-nightly/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/anaconda3/envs/pytorch-nightly/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 77, in join
timeout=timeout,
File "/anaconda3/envs/pytorch-nightly/lib/python3.6/multiprocessing/connection.py", line 911, in wait
ready = selector.select(timeout)
File "/anaconda3/envs/pytorch-nightly/lib/python3.6/selectors.py", line 376, in select
fd_event_list = self._poll.poll(timeout)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/anaconda3/envs/pytorch-nightly/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
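
To compare throughput across steps without reading the progress lines by eye, they can be grouped by step number. The following is a minimal sketch, not part of the original run: the `summarize` helper and the regular expression are hypothetical, and it assumes that each progress line at a given step comes from a different spawned process and that `Rate` is that process's own throughput (the log labels every line `[xla:0]`, so the processes cannot be told apart from the text alone).

```python
import re
from collections import defaultdict

# Matches progress lines of the form seen in the log above, e.g.
# [xla:0](20) Loss=0.03076 Rate=32.48 GlobalRate=39.33 Time=Fri Sep 13 00:36:48 2019
LINE_RE = re.compile(
    r"\[(?P<device>[^\]]+)\]\((?P<step>\d+)\)\s+"
    r"Loss=(?P<loss>[\d.]+)\s+"
    r"Rate=(?P<rate>[\d.]+)\s+"
    r"GlobalRate=(?P<global_rate>[\d.]+)"
)


def summarize(log_text):
    """Return {step: (line_count, mean_rate, summed_rate)} for progress lines.

    Lines that do not match LINE_RE (device setup, tracebacks) are ignored.
    The summed rate is only meaningful under the assumption that every line
    at a step belongs to a distinct process.
    """
    rates = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rates[int(match.group("step"))].append(float(match.group("rate")))
    return {
        step: (len(values), sum(values) / len(values), sum(values))
        for step, values in sorted(rates.items())
    }


# Two lines copied from step 360 of the log above.
sample = """\
[xla:0](360) Loss=0.00000 Rate=599.52 GlobalRate=323.59 Time=Fri Sep 13 00:38:02 2019
[xla:0](360) Loss=0.00000 Rate=600.30 GlobalRate=317.11 Time=Fri Sep 13 00:38:02 2019
"""
summary = summarize(sample)
for step, (count, mean_rate, total_rate) in summary.items():
    print(f"step={step} lines={count} mean_rate={mean_rate:.2f} sum_rate={total_rate:.2f}")
```

Passing the full log text to `summarize` should yield one entry per logged step; `GlobalRate` is captured by the pattern but left unused here and could be aggregated the same way.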