AmosChenYQ/graph_execution.log

## graph_execution.log
2022-06-13 13:40:38.064084: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-13 13:40:38.214452: I tensorflow/core/platform/cloud/gcs_file_system.cc:806] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2022-06-13 13:40:38.214523: I ./tensorflow/core/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2022-06-13 13:40:38.214537: I tensorflow/core/platform/cloud/gcs_file_system.cc:846] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2022-06-13 13:40:38.214542: I tensorflow/core/platform/cloud/gcs_file_system.cc:876] GCS additional header DISABLED. No environment variable set.
2022-06-13 13:40:38.215427: I tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-06-13 13:40:38.219971: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-13 13:40:38.256355: I tensorflow/core/platform/cloud/gcs_file_system.cc:806] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2022-06-13 13:40:38.256399: I ./tensorflow/core/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2022-06-13 13:40:38.256422: I tensorflow/core/platform/cloud/gcs_file_system.cc:846] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2022-06-13 13:40:38.256427: I tensorflow/core/platform/cloud/gcs_file_system.cc:876] GCS additional header DISABLED. No environment variable set.
2022-06-13 13:40:38.937792: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.7
2022-06-13 13:40:39.763279: I tensorflow/compiler/xla/parse_flags_from_env.cc:197] For env var TF_XLA_FLAGS found arguments:
2022-06-13 13:40:39.763374: I tensorflow/compiler/xla/parse_flags_from_env.cc:199]   argv[0] = <argv[0]>
2022-06-13 13:40:39.763414: I tensorflow/compiler/xla/parse_flags_from_env.cc:197] For env var TF_JITRT_FLAGS found arguments:
2022-06-13 13:40:39.763443: I tensorflow/compiler/xla/parse_flags_from_env.cc:199]   argv[0] = <argv[0]>
2022-06-13 13:40:39.763480: I tensorflow/compiler/jit/xla_cpu_device.cc:44] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA device creation not requested
2022-06-13 13:40:39.763572: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-06-13 13:40:39.832088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-13 13:40:39.832352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 1 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-13 13:40:39.832381: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-13 13:40:39.832434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-13 13:40:39.832463: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-13 13:40:39.835850: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-06-13 13:40:39.836145: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-06-13 13:40:39.837072: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-06-13 13:40:39.837841: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-06-13 13:40:39.837885: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-06-13 13:40:39.838704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1975] Adding visible gpu devices: 0, 1
2022-06-13 13:40:39.838731: I tensorflow/compiler/jit/xla_gpu_device.cc:48] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA devices creation not required
2022-06-13 13:40:39.839797: I ./tensorflow/core/common_runtime/mkl_cpu_allocator.h:178] MklCPUAllocator: Setting max_mem_bytes: 134837268480
2022-06-13 13:40:39.839826: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: mklcpu
2022-06-13 13:40:39.839835: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-13 13:40:39.839842: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-13 13:40:39.839854: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-13 13:40:39.839866: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-13 13:40:39.839873: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-13 13:40:39.839881: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-13 13:40:39.839889: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-13 13:40:39.839898: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-13 13:40:39.839906: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-13 13:40:39.839915: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-13 13:40:39.839922: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-13 13:40:39.839931: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-13 13:40:39.839938: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-13 13:40:39.839946: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-13 13:40:39.839953: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-13 13:40:39.839961: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-13 13:40:39.839970: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-13 13:40:39.839977: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-13 13:40:39.839985: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-13 13:40:39.839994: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-13 13:40:39.840002: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-13 13:40:39.840060: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-13 13:40:39.843389: I tensorflow/compiler/jit/xla_cpu_device.cc:58] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-13 13:40:40.109274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-13 13:40:40.109518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 1 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-13 13:40:40.110131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1975] Adding visible gpu devices: 0, 1
2022-06-13 13:40:40.110166: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-13 13:40:40.570942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(0): -5,0
2022-06-13 13:40:40.942157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(0): -5,0
2022-06-13 13:40:40.942217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1384] TensorFlow compiled with CUDA 11.2 and cuDNN 8.1.0
2022-06-13 13:40:40.942256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1396] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-13 13:40:40.942264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402]      0 1
2022-06-13 13:40:40.942269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 0:   N N
2022-06-13 13:40:40.942273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 1:   N N
2022-06-13 13:40:40.943241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 0 TfDeviceId 0 on bus 1 numa: 0 pci: 0000:18:00.0 DeviceLocality: bus_id: 1
links {
}

2022-06-13 13:40:40.943455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 1 TfDeviceId 1 on bus 2 numa: 1 pci: 0000:86:00.0 DeviceLocality: bus_id: 2
numa_node: 1
links {
}

2022-06-13 13:40:40.943642: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: GPU_0_bfc
2022-06-13 13:40:40.943653: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-13 13:40:40.943657: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-13 13:40:40.943666: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-13 13:40:40.943671: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-13 13:40:40.943675: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-13 13:40:40.943680: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-13 13:40:40.943685: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-13 13:40:40.943690: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-13 13:40:40.943694: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-13 13:40:40.943699: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-13 13:40:40.943703: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-13 13:40:40.943708: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-13 13:40:40.943712: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-13 13:40:40.943717: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-13 13:40:40.943721: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-13 13:40:40.943725: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-13 13:40:40.943730: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-13 13:40:40.943734: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-13 13:40:40.943739: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-13 13:40:40.943743: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-13 13:40:40.943748: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-13 13:40:40.943781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9657 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5
2022-06-13 13:40:40.943797: I tensorflow/stream_executor/stream.cc:261] [stream=0x21641e50,impl=0x212aa010] Called Stream::Stream(parent=0x34a8420)
2022-06-13 13:40:40.943803: I tensorflow/stream_executor/stream.cc:308] [stream=0x21641e50,impl=0x212aa010] Called Stream::Init()
2022-06-13 13:40:40.943855: I tensorflow/stream_executor/stream.cc:261] [stream=0x216502f0,impl=0x71a55a0] Called Stream::Stream(parent=0x34a8420)
2022-06-13 13:40:40.943863: I tensorflow/stream_executor/stream.cc:308] [stream=0x216502f0,impl=0x71a55a0] Called Stream::Init()
2022-06-13 13:40:40.943872: I tensorflow/stream_executor/stream.cc:261] [stream=0x2111ed90,impl=0x212a9910] Called Stream::Stream(parent=0x34a8420)
2022-06-13 13:40:40.943877: I tensorflow/stream_executor/stream.cc:308] [stream=0x2111ed90,impl=0x212a9910] Called Stream::Init()
2022-06-13 13:40:40.943885: I tensorflow/stream_executor/stream.cc:261] [stream=0x2111ece0,impl=0x212a9b00] Called Stream::Stream(parent=0x34a8420)
2022-06-13 13:40:40.943891: I tensorflow/stream_executor/stream.cc:308] [stream=0x2111ece0,impl=0x212a9b00] Called Stream::Init()
2022-06-13 13:40:40.943902: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: gpu_host_bfc
2022-06-13 13:40:40.943907: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-13 13:40:40.943911: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-13 13:40:40.943916: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-13 13:40:40.943920: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-13 13:40:40.943924: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-13 13:40:40.943928: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-13 13:40:40.943933: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-13 13:40:40.943937: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-13 13:40:40.943942: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-13 13:40:40.943946: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-13 13:40:40.943950: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-13 13:40:40.943955: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-13 13:40:40.943959: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-13 13:40:40.943963: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-13 13:40:40.943968: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-13 13:40:40.943972: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-13 13:40:40.943976: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-13 13:40:40.943981: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-13 13:40:40.943985: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-13 13:40:40.943990: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-13 13:40:40.943994: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-13 13:40:40.944642: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: GPU_1_bfc
2022-06-13 13:40:40.944656: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-13 13:40:40.944662: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-13 13:40:40.944667: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-13 13:40:40.944672: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-13 13:40:40.944676: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-13 13:40:40.944681: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-13 13:40:40.944686: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-13 13:40:40.944691: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-13 13:40:40.944696: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-13 13:40:40.944701: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-13 13:40:40.944705: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-13 13:40:40.944710: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-13 13:40:40.944715: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-13 13:40:40.944719: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-13 13:40:40.944724: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-13 13:40:40.944729: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-13 13:40:40.944734: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-13 13:40:40.944738: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-13 13:40:40.944743: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-13 13:40:40.944748: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-13 13:40:40.944753: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-13 13:40:40.944770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9657 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5
2022-06-13 13:40:40.944781: I tensorflow/stream_executor/stream.cc:261] [stream=0x1b49bd30,impl=0x1b49b320] Called Stream::Stream(parent=0x359dce0)
2022-06-13 13:40:40.944788: I tensorflow/stream_executor/stream.cc:308] [stream=0x1b49bd30,impl=0x1b49b320] Called Stream::Init()
2022-06-13 13:40:40.944809: I tensorflow/stream_executor/stream.cc:261] [stream=0x1b49f2b0,impl=0x1b49b540] Called Stream::Stream(parent=0x359dce0)
2022-06-13 13:40:40.944816: I tensorflow/stream_executor/stream.cc:308] [stream=0x1b49f2b0,impl=0x1b49b540] Called Stream::Init()
2022-06-13 13:40:40.944825: I tensorflow/stream_executor/stream.cc:261] [stream=0x214b73d0,impl=0x1b49b2f0] Called Stream::Stream(parent=0x359dce0)
2022-06-13 13:40:40.944831: I tensorflow/stream_executor/stream.cc:308] [stream=0x214b73d0,impl=0x1b49b2f0] Called Stream::Init()
2022-06-13 13:40:40.944840: I tensorflow/stream_executor/stream.cc:261] [stream=0x214b76c0,impl=0x212a9cb0] Called Stream::Stream(parent=0x359dce0)
2022-06-13 13:40:40.944845: I tensorflow/stream_executor/stream.cc:308] [stream=0x214b76c0,impl=0x212a9cb0] Called Stream::Init()
2022-06-13 13:40:40.945193: I tensorflow/compiler/jit/xla_gpu_device.cc:79] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-13 13:40:40.945250: I tensorflow/core/common_runtime/process_util.cc:159] Session inter op parallelism threads: 32
2022-06-13 13:40:40.949036: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op _EagerConst in device
2022-06-13 13:40:40.949093: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:40.949112: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute _EagerConst in device
2022-06-13 13:40:40.971113: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 21899.5us
2022-06-13 13:40:40.971168: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:CPU::_EagerConst takes 8.856us
2022-06-13 13:40:40.971200: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice _EagerConst: /job:localhost/replica:0/task:0
2022-06-13 13:40:40.971212: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [_EagerConst] on device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.971243: I tensorflow/core/common_runtime/eager/execute.cc:982] _EagerConst:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:40.971266: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [_EagerConst] already set to: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.972475: I tensorflow/core/common_runtime/eager/execute.cc:823] signature {
  name: "__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0"
  input_arg {
    name: "input"
    type_attr: "T"
  }
  output_arg {
    name: "output"
    type_attr: "T"
  }
  attr {
    name: "T"
    type: "type"
  }
}
node_def {
  name: "_EagerConst"
  op: "_EagerConst"
  input: "input:0"
  device: "/job:localhost/replica:0/task:0/device:GPU:0"
  attr {
    key: "T"
    value {
      placeholder: "T"
    }
  }
}
ret {
  key: "output"
  value: "_EagerConst:output:0"
}

2022-06-13 13:40:40.981225: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:40.981303: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:40.981336: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:40.981380: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0" on default device "/job:localhost/replica:0/task:0/device:GPU:0"
2022-06-13 13:40:40.982891: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:191] None of the MLIR Optimization Passes are enabled (registered 3)
2022-06-13 13:40:40.982917: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:40.982930: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:40.982936: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:40.982947: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:40.982952: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:40.982959: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:40.982975: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.982995: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983009: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:40.983016: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:40.983022: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:40.983034: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:40.983041: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:40.983046: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:40.983052: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:40.983060: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:40.983076: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:40.983081: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:40.983092: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983101: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:40.983145: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983156: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:40.983166: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983174: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:40.983186: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:40.983191: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:40.983227: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:40.983234: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:40.983241: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:40.983251: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983275: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 5 of 5 nodes in 5 visits
2022-06-13 13:40:40.983287: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.983298: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:40.983327: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input}}'Will fall back to a default kernel.

2022-06-13 13:40:40.983348: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::input takes 17.743us
2022-06-13 13:40:40.983359: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::input takes 1.013us
2022-06-13 13:40:40.983375: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 3.999us
2022-06-13 13:40:40.983382: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:CPU::_EagerConst takes 0.623us
2022-06-13 13:40:40.983405: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.983414: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 23.784us
2022-06-13 13:40:40.983421: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.658us
2022-06-13 13:40:40.983437: I tensorflow/core/common_runtime/placer.cc:124] input(_Arg) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:40.983446: I tensorflow/core/common_runtime/placer.cc:124] _EagerConst(_EagerConst) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.983452: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.983458: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:40.983464: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:40.983469: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:40.983481: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:40.983487: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:40.983492: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:40.983497: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:40.983505: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:40.983515: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:40.983521: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:40.983526: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:40.987619: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: XlaLaunch:CPU::_XlaLaunch-op takes 1.715us
2022-06-13 13:40:40.987640: I tensorflow/compiler/tf2xla/xla_op_registry.cc:51] LaunchOpHasKernelForDevice kernel_class_name: XlaLocalLaunchOp
2022-06-13 13:40:40.987650: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: XlaLaunch:GPU::_XlaLaunch-op takes 0.553us
2022-06-13 13:40:40.987655: I tensorflow/compiler/tf2xla/xla_op_registry.cc:51] LaunchOpHasKernelForDevice kernel_class_name: XlaLocalLaunchOp
2022-06-13 13:40:40.987684: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:XLA_GPU_JIT::_EagerConst takes 1.192us
2022-06-13 13:40:40.987769: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:650] DeadnessAnalysis time: 12 us (cumulative: 12 us, max: 12 us, #called: 1)
2022-06-13 13:40:40.987819: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 442 us (cumulative: 442 us, max: 442 us, #called: 1)
2022-06-13 13:40:40.987833: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:40.987838: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:40.987850: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:40.987854: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:40.987861: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:40.987866: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:40.987886: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:40.987891: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:40.987973: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:40.987980: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:40.987986: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:40.988003: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.988081: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.988105: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:40.988126: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:40.988131: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:40.988139: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:40.988143: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:40.988147: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:40.988157: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.988169: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:40.988193: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::input takes 1.889us
2022-06-13 13:40:40.988211: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 4.199us
2022-06-13 13:40:40.988225: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.988234: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 12.735us
2022-06-13 13:40:40.988283: I tensorflow/core/graph/graph_partition.cc:281] Receiving data from input (_Arg) on /job:localhost/replica:0/task:0/device:CPU:0 in device memory for _EagerConst (_EagerConst) on /job:localhost/replica:0/task:0/device:GPU:0 in host memory
2022-06-13 13:40:40.988308: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=1
2022-06-13 13:40:40.988374: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:40.988386: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:40.988392: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:40.991060: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991079: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _EagerConst, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991084: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991089: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991094: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _EagerConst, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991098: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991103: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991108: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _EagerConst, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991112: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:40.991121: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:40.991140: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_118900896_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.991158: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_565427744_/job:localhost/replica:0/task:0/device:GPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:40.991237: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:40.991259: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:40.991361: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_0 with handle 0 status: OK
2022-06-13 13:40:40.991400: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_1' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:40.991416: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_1 on device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.991482: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0_18117741797234826063_1 with handle 1 status: OK
2022-06-13 13:40:40.991543: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:40.991577: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 0
2022-06-13 13:40:40.991618: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:40.991682: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node input/_1}} = _Send[T=DT_INT32, _dst="_EagerConst", _src="input", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_input", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input)
2022-06-13 13:40:40.991703: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::input/_1 takes 1.332us
2022-06-13 13:40:40.991711: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::input/_1 takes 0.372us
2022-06-13 13:40:40.991746: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node input/_1}} = _Send[T=DT_INT32, _dst="_EagerConst", _src="input", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_input", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input) takes 69.497us

2022-06-13 13:40:40.991779: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:40.991792: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 1
2022-06-13 13:40:40.991814: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:CPU::_EagerConst takes 0.866us
2022-06-13 13:40:40.991823: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:40.991849: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991858: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 10.091us
2022-06-13 13:40:40.991865: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991870: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.329us
2022-06-13 13:40:40.991877: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991882: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::input/_2 takes 5.778us
2022-06-13 13:40:40.991896: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 3.694us
2022-06-13 13:40:40.991906: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991911: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 7.897us
2022-06-13 13:40:40.991919: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-13 13:40:40.991924: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 4:0: 1 -> 1
2022-06-13 13:40:40.991930: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991935: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.643us
2022-06-13 13:40:40.991942: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991947: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.234us
2022-06-13 13:40:40.991953: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991958: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::input/_2 takes 5.652us
2022-06-13 13:40:40.991966: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 1.355us
2022-06-13 13:40:40.991975: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.991980: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 6.289us
2022-06-13 13:40:40.991986: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-13 13:40:40.991991: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 4:0: 1 -> 1
2022-06-13 13:40:40.992006: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node input/_2}} = _HostRecv[_dst="_EagerConst", _src="input", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_input", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()
2022-06-13 13:40:40.992015: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:40.992021: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::input/_2 takes 5.797us
2022-06-13 13:40:40.992026: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:40.992031: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::input/_2 takes 5.33us
2022-06-13 13:40:40.992050: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node input/_2}} = _HostRecv[_dst="_EagerConst", _src="input", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_input", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() takes 45.084us

2022-06-13 13:40:40.992061: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _EagerConst}} = _EagerConst[T=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](input/_2)
2022-06-13 13:40:40.992069: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 1.424us
2022-06-13 13:40:40.992076: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _EagerConst:GPU::_EagerConst takes 1.224us
2022-06-13 13:40:40.992091: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _EagerConst}} = _EagerConst[T=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](input/_2) takes 29.632us

2022-06-13 13:40:40.992099: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_INT32, index=0](_EagerConst)
2022-06-13 13:40:40.992107: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.992113: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 6.668us
2022-06-13 13:40:40.992119: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:40.992124: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 6.333us
2022-06-13 13:40:40.992134: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_INT32, index=0](_EagerConst) takes 33.958us

2022-06-13 13:40:41.002415: I tensorflow/stream_executor/stream_executor_pimpl.cc:534] Called StreamExecutor::Allocate(size=10126688256, memory_space=0) returns 0x7f990c000000
2022-06-13 13:40:41.002440: I tensorflow/core/common_runtime/bfc_allocator.cc:157] Extending allocation by 9.43GiB bytes for GPU_0_bfc.
2022-06-13 13:40:41.002446: I tensorflow/core/common_runtime/bfc_allocator.cc:162] Total allocated bytes: 9.43GiB
2022-06-13 13:40:41.002451: I tensorflow/core/common_runtime/bfc_allocator.cc:165] Allocated memory at 0x7f990c000000 to 0x7f9b67990000
2022-06-13 13:40:41.146670: I tensorflow/stream_executor/stream_executor_pimpl.cc:623] Called StreamExecutor::SynchronousMemZero(location=0x7ffd501c04c0, size=1028)
2022-06-13 13:40:41.147132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.147148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] input/_2:_HostRecv#from=input,to=_EagerConst#
2022-06-13 13:40:41.147164: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.147170: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.147188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.147197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.147209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] _EagerConst:_EagerConst#shape=(int32[2])#
2022-06-13 13:40:41.147223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.147229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.147234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(int32[2])#
2022-06-13 13:40:41.147240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.147462: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op RandomUniform in device
2022-06-13 13:40:41.147475: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.147483: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute RandomUniform in device
2022-06-13 13:40:41.147538: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 7.387us
2022-06-13 13:40:41.147550: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:CPU::RandomUniform takes 2.496us
2022-06-13 13:40:41.147564: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice RandomUniform: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.147569: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [RandomUniform] on device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.147581: I tensorflow/core/common_runtime/eager/execute.cc:982] RandomUniform:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.147597: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [RandomUniform] already set to: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.147763: I tensorflow/core/common_runtime/eager/execute.cc:823] signature {
  name: "__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0"
  input_arg {
    name: "shape"
    type_attr: "T"
  }
  output_arg {
    name: "output"
    type_attr: "dtype"
  }
  attr {
    name: "seed"
    type: "int"
    default_value {
      i: 0
    }
  }
  attr {
    name: "seed2"
    type: "int"
    default_value {
      i: 0
    }
  }
  attr {
    name: "dtype"
    type: "type"
    allowed_values {
      list {
        type: DT_HALF
        type: DT_BFLOAT16
        type: DT_FLOAT
        type: DT_DOUBLE
      }
    }
  }
  attr {
    name: "T"
    type: "type"
    allowed_values {
      list {
        type: DT_INT32
        type: DT_INT64
      }
    }
  }
  is_stateful: true
}
node_def {
  name: "RandomUniform"
  op: "RandomUniform"
  input: "shape:0"
  device: "/job:localhost/replica:0/task:0/device:GPU:0"
  attr {
    key: "T"
    value {
      placeholder: "T"
    }
  }
  attr {
    key: "dtype"
    value {
      placeholder: "dtype"
    }
  }
  attr {
    key: "seed"
    value {
      placeholder: "seed"
    }
  }
  attr {
    key: "seed2"
    value {
      placeholder: "seed2"
    }
  }
}
ret {
  key: "output"
  value: "RandomUniform:output:0"
}

2022-06-13 13:40:41.147791: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.147832: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.147851: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.147890: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0" on default device "/job:localhost/replica:0/task:0/device:GPU:0"
2022-06-13 13:40:41.148064: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.148076: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.148080: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.148086: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.148091: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.148096: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.148109: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148124: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148132: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.148137: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.148143: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.148154: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.148161: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.148166: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.148171: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.148179: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.148185: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.148190: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.148199: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148207: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.148246: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148266: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.148276: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148283: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.148289: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.148293: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.148312: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.148317: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.148323: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.148331: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148349: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 5 of 5 nodes in 5 visits
2022-06-13 13:40:41.148358: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.148367: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.148393: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node shape}}'Will fall back to a default kernel.

2022-06-13 13:40:41.148405: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::shape takes 17.953us
2022-06-13 13:40:41.148412: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::shape takes 1.052us
2022-06-13 13:40:41.148424: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 1.937us
2022-06-13 13:40:41.148431: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:CPU::RandomUniform takes 1.878us
2022-06-13 13:40:41.148444: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.148450: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 11.325us
2022-06-13 13:40:41.148456: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.608us
2022-06-13 13:40:41.148470: I tensorflow/core/common_runtime/placer.cc:124] shape(_Arg) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.148478: I tensorflow/core/common_runtime/placer.cc:124] RandomUniform(RandomUniform) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.148484: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.148489: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.148495: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.148500: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.148506: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.148511: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.148515: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.148519: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.148526: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.148531: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.148537: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.148542: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.148795: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:XLA_GPU_JIT::RandomUniform takes 1.786us
2022-06-13 13:40:41.148837: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 279 us (cumulative: 721 us, max: 442 us, #called: 2)
2022-06-13 13:40:41.148847: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.148852: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.148862: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.148867: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.148873: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.148878: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.148898: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.148906: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.149069: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.149078: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.149084: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.149098: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.149167: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.149191: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.149213: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.149220: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.149227: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.149232: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.149236: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.149246: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.149257: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.149278: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::shape takes 1.331us
2022-06-13 13:40:41.149293: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 1.931us
2022-06-13 13:40:41.149307: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.149316: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 14.39us
2022-06-13 13:40:41.149356: I tensorflow/core/graph/graph_partition.cc:281] Receiving data from shape (_Arg) on /job:localhost/replica:0/task:0/device:CPU:0 in device memory for RandomUniform (RandomUniform) on /job:localhost/replica:0/task:0/device:GPU:0 in host memory
2022-06-13 13:40:41.149384: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=1
2022-06-13 13:40:41.149447: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.149456: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.149461: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.149485: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149493: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149498: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149503: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149508: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149512: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149517: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _HostRecv, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149521: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149526: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.149532: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.149544: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_565427744_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.149560: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_564047568_/job:localhost/replica:0/task:0/device:GPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.149617: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.149638: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.149715: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_0 with handle 3 status: OK
2022-06-13 13:40:41.149754: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_1' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.149769: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_1 on device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.149838: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0_10249392314444985097_1 with handle 4 status: OK
2022-06-13 13:40:41.149887: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.149910: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 3
2022-06-13 13:40:41.149945: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.149996: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node shape/_1}} = _Send[T=DT_INT32, _dst="RandomUniform", _src="shape", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_shape", _device="/job:localhost/replica:0/task:0/device:CPU:0"](shape)
2022-06-13 13:40:41.150015: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::shape/_1 takes 2.469us
2022-06-13 13:40:41.150022: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::shape/_1 takes 0.327us
2022-06-13 13:40:41.150047: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node shape/_1}} = _Send[T=DT_INT32, _dst="RandomUniform", _src="shape", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_shape", _device="/job:localhost/replica:0/task:0/device:CPU:0"](shape) takes 54.851us

2022-06-13 13:40:41.150064: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.150073: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 4
2022-06-13 13:40:41.150091: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.150116: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150125: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 10.856us
2022-06-13 13:40:41.150133: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150138: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.369us
2022-06-13 13:40:41.150145: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node shape/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150150: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::shape/_2 takes 6.163us
2022-06-13 13:40:41.150160: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 1.331us
2022-06-13 13:40:41.150171: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150176: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 8.151us
2022-06-13 13:40:41.150183: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-13 13:40:41.150189: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 4:0: 0 -> 0
2022-06-13 13:40:41.150195: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150200: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.211us
2022-06-13 13:40:41.150206: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150211: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.284us
2022-06-13 13:40:41.150217: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node shape/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150222: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::shape/_2 takes 5.17us
2022-06-13 13:40:41.150230: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 0.76us
2022-06-13 13:40:41.150239: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150244: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 6.632us
2022-06-13 13:40:41.150250: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-13 13:40:41.150255: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 4:0: 0 -> 0
2022-06-13 13:40:41.150269: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node shape/_2}} = _HostRecv[_dst="RandomUniform", _src="shape", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()
2022-06-13 13:40:41.150278: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node shape/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150284: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::shape/_2 takes 5.573us
2022-06-13 13:40:41.150289: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node shape/_2}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150294: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _HostRecv:GPU::shape/_2 takes 5.068us
2022-06-13 13:40:41.150312: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node shape/_2}} = _HostRecv[_dst="RandomUniform", _src="shape", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=-7643437611148729878, tensor_name="edge_2_shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() takes 44.064us

2022-06-13 13:40:41.150321: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](shape/_2)
2022-06-13 13:40:41.150330: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 0.865us
2022-06-13 13:40:41.150337: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::RandomUniform takes 0.656us
2022-06-13 13:40:41.150355: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](shape/_2) takes 33.465us

2022-06-13 13:40:41.150363: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](RandomUniform)
2022-06-13 13:40:41.150371: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150377: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 6.667us
2022-06-13 13:40:41.150383: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.150388: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_retval_RetVal takes 5.841us
2022-06-13 13:40:41.150398: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](RandomUniform) takes 34.377us

2022-06-13 13:40:41.150412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.150417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] shape/_2:_HostRecv#from=shape,to=RandomUniform#
2022-06-13 13:40:41.150429: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.150434: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.150443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.150449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.150456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-13 13:40:41.150518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.150528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.150534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(float[1024,128])#
2022-06-13 13:40:41.150540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.150799: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op _EagerConst in device
2022-06-13 13:40:41.150812: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.150817: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute _EagerConst in device
2022-06-13 13:40:41.150829: I tensorflow/core/common_runtime/eager/execute.cc:982] _EagerConst:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.150842: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.150855: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 0
2022-06-13 13:40:41.150866: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.150874: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 1
2022-06-13 13:40:41.150883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.150889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] input/_2:_HostRecv#from=input,to=_EagerConst#
2022-06-13 13:40:41.150895: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.150900: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.150908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.150914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.150920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] _EagerConst:_EagerConst#shape=(int32[2])#
2022-06-13 13:40:41.150926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.150932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.150936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(int32[2])#
2022-06-13 13:40:41.150941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.150994: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op RandomUniform in device
2022-06-13 13:40:41.151003: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.151008: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute RandomUniform in device
2022-06-13 13:40:41.151015: I tensorflow/core/common_runtime/eager/execute.cc:982] RandomUniform:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.151025: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.151035: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 3
2022-06-13 13:40:41.151042: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151048: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 4
2022-06-13 13:40:41.151055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] shape/_2:_HostRecv#from=shape,to=RandomUniform#
2022-06-13 13:40:41.151066: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151071: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-13 13:40:41.151111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(float[1024,128])#
2022-06-13 13:40:41.151131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151221: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op _EagerConst in device
2022-06-13 13:40:41.151231: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.151236: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute _EagerConst in device
2022-06-13 13:40:41.151244: I tensorflow/core/common_runtime/eager/execute.cc:982] _EagerConst:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.151255: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.151265: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 0
2022-06-13 13:40:41.151273: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151280: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 1
2022-06-13 13:40:41.151287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] input/_2:_HostRecv#from=input,to=_EagerConst#
2022-06-13 13:40:41.151298: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151303: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.151322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] _EagerConst:_EagerConst#shape=(int32[3])#
2022-06-13 13:40:41.151327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.151332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(int32[3])#
2022-06-13 13:40:41.151342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151377: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op RandomUniform in device
2022-06-13 13:40:41.151385: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.151390: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute RandomUniform in device
2022-06-13 13:40:41.151397: I tensorflow/core/common_runtime/eager/execute.cc:982] RandomUniform:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.151407: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.151417: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 3
2022-06-13 13:40:41.151424: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151430: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 4
2022-06-13 13:40:41.151437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] shape/_2:_HostRecv#from=shape,to=RandomUniform#
2022-06-13 13:40:41.151447: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151452: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] RandomUniform:RandomUniform#shape=(int32[3])#
2022-06-13 13:40:41.151488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(float[4,128,128])#
2022-06-13 13:40:41.151505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151577: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op _EagerConst in device
2022-06-13 13:40:41.151584: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.151589: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute _EagerConst in device
2022-06-13 13:40:41.151596: I tensorflow/core/common_runtime/eager/execute.cc:982] _EagerConst:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.151606: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.151616: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 0
2022-06-13 13:40:41.151624: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151630: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 1
2022-06-13 13:40:41.151637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] input/_2:_HostRecv#from=input,to=_EagerConst#
2022-06-13 13:40:41.151648: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151653: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.151660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.151670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] _EagerConst:_EagerConst#shape=(int32[3])#
2022-06-13 13:40:41.151675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.151681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(int32[3])#
2022-06-13 13:40:41.151690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151723: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op RandomUniform in device
2022-06-13 13:40:41.151729: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.151734: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute RandomUniform in device
2022-06-13 13:40:41.151740: I tensorflow/core/common_runtime/eager/execute.cc:982] RandomUniform:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.151750: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.151758: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 3
2022-06-13 13:40:41.151765: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151771: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 4
2022-06-13 13:40:41.151777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] shape/_2:_HostRecv#from=shape,to=RandomUniform#
2022-06-13 13:40:41.151788: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151792: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.151798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.151804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] RandomUniform:RandomUniform#shape=(int32[3])#
2022-06-13 13:40:41.151829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.151835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.151840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(float[16,128,128])#
2022-06-13 13:40:41.151845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.153338: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op StringFormat in device
2022-06-13 13:40:41.153362: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.153367: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute StringFormat in device
2022-06-13 13:40:41.153396: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 2.685us
2022-06-13 13:40:41.153406: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.161us
2022-06-13 13:40:41.153415: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice StringFormat: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.153419: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [StringFormat] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.153429: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [StringFormat] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.153539: I tensorflow/core/common_runtime/eager/execute.cc:823] signature {
  name: "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0"
  output_arg {
    name: "output"
    type: DT_STRING
  }
  attr {
    name: "T"
    type: "list(type)"
    has_minimum: true
  }
  attr {
    name: "template"
    type: "string"
    default_value {
      s: "%s"
    }
  }
  attr {
    name: "placeholder"
    type: "string"
    default_value {
      s: "%s"
    }
  }
  attr {
    name: "summarize"
    type: "int"
    default_value {
      i: 3
    }
  }
}
node_def {
  name: "StringFormat"
  op: "StringFormat"
  device: "/job:localhost/replica:0/task:0/device:CPU:0"
  attr {
    key: "T"
    value {
      placeholder: "T"
    }
  }
  attr {
    key: "placeholder"
    value {
      placeholder: "placeholder"
    }
  }
  attr {
    key: "summarize"
    value {
      placeholder: "summarize"
    }
  }
  attr {
    key: "template"
    value {
      placeholder: "template"
    }
  }
}
ret {
  key: "output"
  value: "StringFormat:output:0"
}

2022-06-13 13:40:41.153561: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.153586: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.153604: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.153630: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.153719: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.153728: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.153732: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.153737: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.153742: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.153746: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.153756: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153769: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153778: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.153783: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.153788: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.153797: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.153802: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.153806: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.153811: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.153817: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.153823: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.153827: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.153836: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153843: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.153871: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153882: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.153891: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153898: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.153904: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.153908: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.153920: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.153925: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.153930: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.153938: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153952: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.153961: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.153970: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.153986: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 1.612us
2022-06-13 13:40:41.153998: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.514us
2022-06-13 13:40:41.154008: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.154014: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 8.229us
2022-06-13 13:40:41.154020: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.458us
2022-06-13 13:40:41.154032: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.154039: I tensorflow/core/common_runtime/placer.cc:124] StringFormat(StringFormat) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.154045: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.154050: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.154054: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.154061: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.154066: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.154071: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.154075: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.154080: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.154085: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.154090: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.154094: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.154301: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:XLA_CPU_JIT::StringFormat takes 1.309us
2022-06-13 13:40:41.154348: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 243 us (cumulative: 964 us, max: 442 us, #called: 3)
2022-06-13 13:40:41.154358: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.154362: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.154371: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.154376: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.154381: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.154386: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.154402: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.154407: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.154424: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.154429: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.154434: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.154445: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.154493: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.154512: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.154531: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.154540: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.154546: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.154550: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.154554: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.154563: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.154574: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.154587: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.997us
2022-06-13 13:40:41.154597: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.468us
2022-06-13 13:40:41.154614: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.154653: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.154661: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.154665: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.156972: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.157000: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_564176736_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.157047: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_16642374198413653398_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.157064: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_16642374198413653398_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.157131: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_16642374198413653398_0 with handle 6 status: OK
2022-06-13 13:40:41.157167: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.157181: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 6
2022-06-13 13:40:41.157212: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.265us
2022-06-13 13:40:41.157231: I tensorflow/core/common_runtime/constant_folding.cc:631] Constant foldable 3 : 4
2022-06-13 13:40:41.157324: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.157335: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.157342: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 7.162us
2022-06-13 13:40:41.157348: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.157353: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 5.475us
2022-06-13 13:40:41.157364: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 39.915us

2022-06-13 13:40:41.157376: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 0 costs 0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.157386: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.415us
2022-06-13 13:40:41.157393: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.362us
2022-06-13 13:40:41.157417: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 0 costs 0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 41.146us

2022-06-13 13:40:41.157430: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1268297528044505333, tensor_name="StringFormat:0"](StringFormat)
2022-06-13 13:40:41.157439: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.468us
2022-06-13 13:40:41.157445: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.299us
2022-06-13 13:40:41.157462: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1268297528044505333, tensor_name="StringFormat:0"](StringFormat) takes 32.714us

2022-06-13 13:40:41.157493: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2022-06-13 13:40:41.157505: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -1 {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 0 costs 0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /device:CPU:0
2022-06-13 13:40:41.157526: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -1 {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1268297528044505333, tensor_name="StringFormat:0"](StringFormat) device: /device:CPU:0
2022-06-13 13:40:41.157572: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.748us
2022-06-13 13:40:41.157581: I tensorflow/core/common_runtime/constant_folding.cc:562] Replacing StringFormat :: 0 with a constant
2022-06-13 13:40:41.157625: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.157663: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 0 costs 0>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.157675: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.493us
2022-06-13 13:40:41.157681: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.276us
2022-06-13 13:40:41.157700: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 0 costs 0>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 40.282us

2022-06-13 13:40:41.157710: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat)
2022-06-13 13:40:41.157718: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.422us
2022-06-13 13:40:41.157723: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.255us
2022-06-13 13:40:41.157733: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat) takes 22.593us

2022-06-13 13:40:41.157791: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op PrintV2 in device
2022-06-13 13:40:41.157799: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 0
2022-06-13 13:40:41.157804: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute PrintV2 in device
2022-06-13 13:40:41.157823: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:GPU::PrintV2 takes 1.877us
2022-06-13 13:40:41.157833: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:CPU::PrintV2 takes 0.538us
2022-06-13 13:40:41.157841: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice PrintV2: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.157846: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [PrintV2] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.157853: I tensorflow/core/common_runtime/eager/execute.cc:982] PrintV2:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.157862: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [PrintV2] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.157935: I tensorflow/core/common_runtime/eager/execute.cc:823] signature {
  name: "__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0"
  input_arg {
    name: "input"
    type: DT_STRING
  }
  attr {
    name: "output_stream"
    type: "string"
    default_value {
      s: "stderr"
    }
  }
  attr {
    name: "end"
    type: "string"
    default_value {
      s: "\n"
    }
  }
  is_stateful: true
}
node_def {
  name: "PrintV2"
  op: "PrintV2"
  input: "input:0"
  device: "/job:localhost/replica:0/task:0/device:CPU:0"
  attr {
    key: "end"
    value {
      placeholder: "end"
    }
  }
  attr {
    key: "output_stream"
    value {
      placeholder: "output_stream"
    }
  }
}

2022-06-13 13:40:41.157953: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.157976: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.157993: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.158013: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.158088: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.158098: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.158103: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.158109: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.158114: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.158119: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.158127: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158139: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158147: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.158152: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.158157: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.158164: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.158169: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.158173: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.158178: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.158184: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.158189: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.158193: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.158201: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158207: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.158234: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158242: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.158250: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158257: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.158262: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.158267: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.158279: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.158283: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.158288: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.158295: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158309: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.158316: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158325: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.158342: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node input}}'Will fall back to a default kernel.

2022-06-13 13:40:41.158353: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::input takes 14.032us
2022-06-13 13:40:41.158359: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::input takes 0.517us
2022-06-13 13:40:41.158368: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:GPU::PrintV2 takes 1.028us
2022-06-13 13:40:41.158374: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:CPU::PrintV2 takes 0.547us
2022-06-13 13:40:41.158384: I tensorflow/core/common_runtime/placer.cc:124] input(_Arg) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.158391: I tensorflow/core/common_runtime/placer.cc:124] PrintV2(PrintV2) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.158397: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.158402: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.158406: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.158412: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.158417: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.158421: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.158426: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.158432: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.158437: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.158442: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.158446: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.158646: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:XLA_CPU_JIT::PrintV2 takes 0.921us
2022-06-13 13:40:41.158686: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 229 us (cumulative: 1.19 ms, max: 442 us, #called: 4)
2022-06-13 13:40:41.158694: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.158699: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.158707: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.158712: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.158718: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.158723: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.158739: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.158748: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.158764: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.158768: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.158773: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.158783: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158829: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158847: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.158860: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.158867: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.158873: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.158877: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.158882: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.158890: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.158900: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.158914: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::input takes 0.923us
2022-06-13 13:40:41.158923: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:CPU::PrintV2 takes 0.432us
2022-06-13 13:40:41.158939: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.158974: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.158982: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.158987: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.159858: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.159883: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_564230080_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.159929: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0_15747355229267941188_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.159946: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0_15747355229267941188_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.160009: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0_15747355229267941188_0 with handle 8 status: OK
2022-06-13 13:40:41.160041: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.160056: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 8
2022-06-13 13:40:41.160080: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.160117: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node PrintV2}} = PrintV2[_XlaHasReferenceVars=false, end="\n", output_stream="stderr", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input)
2022-06-13 13:40:41.160132: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:CPU::PrintV2 takes 1.325us
2022-06-13 13:40:41.160138: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: PrintV2:CPU::PrintV2 takes 0.316us
2022-06-13 13:40:41.160158: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node PrintV2}} = PrintV2[_XlaHasReferenceVars=false, end="\n", output_stream="stderr", _device="/job:localhost/replica:0/task:0/device:CPU:0"](input) takes 41.182us

run 0 costs 0
# run 1 schedule start
2022-06-13 13:40:41.200081: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200694: I tensorflow/python/eager/pywrap_tfe_src.cc:885] Eager executes cancelable __inference_nn_18 on  the number of inputs is 3 the number of output is 1
2022-06-13 13:40:41.200729: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200753: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200771: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200779: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op __inference_nn_18 in device
2022-06-13 13:40:41.200785: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.200790: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute __inference_nn_18 in device
2022-06-13 13:40:41.200806: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:0 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.200814: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:1 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.200820: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:2 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.200855: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200870: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.200881: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice __inference_nn_18: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.200886: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [__inference_nn_18] on device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.200960: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__inference_nn_18" on default device "/job:localhost/replica:0/task:0/device:GPU:0"
2022-06-13 13:40:41.201156: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.201167: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.201172: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.201178: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.201183: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.201189: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.201206: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201228: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201242: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.201247: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.201252: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.201262: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.201267: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.201272: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.201277: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.201284: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.201290: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.201295: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.201309: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201319: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.201366: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201378: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.201393: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201403: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.201409: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.201413: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.201433: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.201438: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.201444: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.201456: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201481: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 9 of 9 nodes in 9 visits
2022-06-13 13:40:41.201496: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.201508: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.201535: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.201549: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 17.377us
2022-06-13 13:40:41.201556: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::w takes 0.867us
2022-06-13 13:40:41.201566: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.201572: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.809us
2022-06-13 13:40:41.201578: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::b takes 0.326us
2022-06-13 13:40:41.201585: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.201590: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 6.148us
2022-06-13 13:40:41.201596: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::x takes 0.299us
2022-06-13 13:40:41.201606: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 3.231us
2022-06-13 13:40:41.201616: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 5.182us
2022-06-13 13:40:41.201625: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.112us
2022-06-13 13:40:41.201634: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 3.364us
2022-06-13 13:40:41.201645: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 5.064us
2022-06-13 13:40:41.201651: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:CPU::Identity takes 0.77us
2022-06-13 13:40:41.201661: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.201667: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_RetVal takes 9.881us
2022-06-13 13:40:41.201673: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::identity_RetVal takes 0.54us
2022-06-13 13:40:41.201687: I tensorflow/core/common_runtime/placer.cc:124] w(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201696: I tensorflow/core/common_runtime/placer.cc:124] b(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201701: I tensorflow/core/common_runtime/placer.cc:124] x(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201710: I tensorflow/core/common_runtime/placer.cc:124] MatMul(BatchMatMulV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201716: I tensorflow/core/common_runtime/placer.cc:124] Add(AddV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201722: I tensorflow/core/common_runtime/placer.cc:124] Identity(Identity) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201728: I tensorflow/core/common_runtime/placer.cc:124] identity_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.201734: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.201739: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.201744: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.201751: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.201766: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.201773: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.201778: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.201785: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.201790: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.201795: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.201799: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.202047: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:XLA_GPU_JIT::MatMul takes 1.526us
2022-06-13 13:40:41.202066: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:XLA_GPU_JIT::Add takes 1.043us
2022-06-13 13:40:41.202082: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:XLA_GPU_JIT::Identity takes 0.905us
2022-06-13 13:40:41.202144: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:650] DeadnessAnalysis time: 14 us (cumulative: 26 us, max: 14 us, #called: 2)
2022-06-13 13:40:41.202192: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 380 us (cumulative: 1.57 ms, max: 442 us, #called: 5)
2022-06-13 13:40:41.202204: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.202209: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.202219: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.202224: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.202230: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.202234: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.202258: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.202265: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.202294: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.202302: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.202307: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.202328: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.202408: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.202436: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.202462: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.202471: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.202480: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.202487: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.202493: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.202510: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.202523: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.202545: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.202555: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 13.423us
2022-06-13 13:40:41.202567: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.202572: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.645us
2022-06-13 13:40:41.202580: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.202586: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 6.696us
2022-06-13 13:40:41.202596: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 2.676us
2022-06-13 13:40:41.202608: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.833us
2022-06-13 13:40:41.202621: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 5.903us
2022-06-13 13:40:41.202633: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.202643: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_RetVal takes 14.103us
2022-06-13 13:40:41.202674: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.202732: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.202740: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.202745: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.202754: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202758: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202763: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202767: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202772: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202776: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202780: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202786: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202790: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202794: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202799: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202803: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202807: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202811: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202817: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202821: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202825: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202829: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202834: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202838: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202842: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.202848: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.202865: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562417792_/job:localhost/replica:0/task:0/device:GPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.202931: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18_18275566768249955521_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.202955: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __inference_nn_18_18275566768249955521_0 on device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.203070: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __inference_nn_18_18275566768249955521_0 with handle 10 status: OK
2022-06-13 13:40:41.203125: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op __inference_nn_18 in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.203153: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1437] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __inference_nn_18 with handle 10
2022-06-13 13:40:41.203201: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 4.042us
2022-06-13 13:40:41.203214: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 3.091us
2022-06-13 13:40:41.203221: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:CPU::Identity takes 0.628us
2022-06-13 13:40:41.203227: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.203267: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203276: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 9.894us
2022-06-13 13:40:41.203283: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203288: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.317us
2022-06-13 13:40:41.203297: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203302: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 7.803us
2022-06-13 13:40:41.203311: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203316: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.303us
2022-06-13 13:40:41.203324: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203329: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 6.888us
2022-06-13 13:40:41.203337: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 1.336us
2022-06-13 13:40:41.203347: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.303us
2022-06-13 13:40:41.203356: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 2.688us
2022-06-13 13:40:41.203366: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203372: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 7.858us
2022-06-13 13:40:41.203380: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 5:0: 0 -> 0
2022-06-13 13:40:41.203385: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 5:1: 0 -> 0
2022-06-13 13:40:41.203390: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:0: 0 -> 0
2022-06-13 13:40:41.203395: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:1: 0 -> 0
2022-06-13 13:40:41.203399: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-13 13:40:41.203404: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-13 13:40:41.203410: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203416: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.629us
2022-06-13 13:40:41.203422: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203427: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.187us
2022-06-13 13:40:41.203434: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203440: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 6.715us
2022-06-13 13:40:41.203447: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203452: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.077us
2022-06-13 13:40:41.203459: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203465: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 6.509us
2022-06-13 13:40:41.203472: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.649us
2022-06-13 13:40:41.203480: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 0.896us
2022-06-13 13:40:41.203488: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.143us
2022-06-13 13:40:41.203496: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203502: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 6.389us
2022-06-13 13:40:41.203508: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 5:0: 0 -> 0
2022-06-13 13:40:41.203513: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 5:1: 0 -> 0
2022-06-13 13:40:41.203517: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:0: 0 -> 0
2022-06-13 13:40:41.203522: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:1: 0 -> 0
2022-06-13 13:40:41.203526: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-13 13:40:41.203530: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-13 13:40:41.203582: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.203591: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203596: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.417us
2022-06-13 13:40:41.203602: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203607: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.224us
2022-06-13 13:40:41.203618: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 34.862us

2022-06-13 13:40:41.203632: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]()
2022-06-13 13:40:41.203643: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203648: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 6.909us
2022-06-13 13:40:41.203655: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203660: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 6.162us
2022-06-13 13:40:41.203671: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() takes 41.881us

2022-06-13 13:40:41.203680: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]()
2022-06-13 13:40:41.203688: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203693: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.309us
2022-06-13 13:40:41.203700: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203705: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 6.059us
2022-06-13 13:40:41.203714: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() takes 33.348us

2022-06-13 13:40:41.203723: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[4,128,128]], _user_specified_name="x", index=2]()
2022-06-13 13:40:41.203731: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203736: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 6.638us
2022-06-13 13:40:41.203742: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203747: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 5.848us
2022-06-13 13:40:41.203757: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[4,128,128]], _user_specified_name="x", index=2]() takes 33.291us

2022-06-13 13:40:41.203765: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x)
2022-06-13 13:40:41.203773: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.825us
2022-06-13 13:40:41.203780: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.629us
2022-06-13 13:40:41.203799: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) takes 33.324us

2022-06-13 13:40:41.203807: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b)
2022-06-13 13:40:41.203815: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 0.96us
2022-06-13 13:40:41.203822: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 0.825us
2022-06-13 13:40:41.203835: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) takes 26.139us

2022-06-13 13:40:41.203842: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add)
2022-06-13 13:40:41.203850: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.321us
2022-06-13 13:40:41.203857: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.14us
2022-06-13 13:40:41.203865: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) takes 22.379us

2022-06-13 13:40:41.203873: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity)
2022-06-13 13:40:41.203880: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203886: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 6.908us
2022-06-13 13:40:41.203892: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.203897: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 6.018us
2022-06-13 13:40:41.203906: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) takes 32.932us
# run 1 schedule end
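The schedule phase above ends each node with an `Instantiating kernel for node: ... takes Xus` line from op_kernel.cc:1665 (MatMul 33.324us, Add 26.139us, Identity 22.379us, identity_retval_RetVal 32.932us). Below is a minimal sketch of a helper that pulls those timings out of a log like this one. It assumes the exact line format seen here, and the name `kernel_instantiation_times` is hypothetical, not part of TensorFlow.

```python
import re

# Matches only the completion lines from op_kernel.cc, e.g.
# "... Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[...](w, x) takes 33.324us"
# The start lines (no "takes ...us") and "Find Kernel Registration" lines are skipped.
INSTANTIATE_RE = re.compile(
    r"Instantiating kernel for node: \{\{node (?P<node>[^}]+)\}\}.* takes (?P<us>[\d.]+)us$"
)


def kernel_instantiation_times(lines):
    """Return [(node_name, microseconds), ...] in log order.

    A list (not a dict) is used because node names such as _SOURCE
    repeat across the different functions instantiated in one log.
    """
    times = []
    for line in lines:
        match = INSTANTIATE_RE.search(line.rstrip())
        if match:
            times.append((match.group("node"), float(match.group("us"))))
    return times
```

It accepts any iterable of lines, so an open file works directly, e.g. `with open("graph_execution.log") as f: kernel_instantiation_times(f)`.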
# run 1 compute start
2022-06-13 13:40:41.203989: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -4266793230068582322 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.204086: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -4266793230068582322 {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.204123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] w:_Arg
2022-06-13 13:40:41.204180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204214: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -4266793230068582322 {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.204234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] b:_Arg
2022-06-13 13:40:41.204289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204317: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step -4266793230068582322 {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[4,128,128]], _user_specified_name="x", index=2]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.204336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] x:_Arg
2022-06-13 13:40:41.204363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.204389: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step -4266793230068582322 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.204418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.204438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[4,128,128])#
2022-06-13 13:40:41.204625: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-13 13:40:41.702486: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-13 13:40:41.703054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.703097: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step -4266793230068582322 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.703115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.703127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[4,1024,128];float[1024,128])#
2022-06-13 13:40:41.703340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.703358: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step -4266793230068582322 {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.703367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.703374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Identity:Identity#shape=(float[4,1024,128])#
2022-06-13 13:40:41.703381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.703390: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step -4266793230068582322 {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.703395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper identity_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.703401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] identity_retval_RetVal:_Retval#shape=(float[4,1024,128])#
2022-06-13 13:40:41.703407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled identity_retval_RetVal op _Retval on GPU 0 stream[0]
# run 1 compute end
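In the compute phase, gpu_device.cc logs the input shapes of each op: BatchMatMulV2 receives float[1024,128] (`w`) and float[4,128,128] (`x`), AddV2 then receives float[4,1024,128] and float[1024,128] (`b`), and Identity/_Retval carry float[4,1024,128]. The rank-2 `w` is broadcast against the batch of 4 in the matmul, and `b` is broadcast the same way in the add. The sketch below reproduces that shape arithmetic in plain Python without calling TensorFlow, assuming NumPy-style broadcasting of the leading (batch) dimensions; the function names are illustrative only.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """NumPy-style broadcast of two shapes, aligned from the rightmost dim."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"incompatible dims {x} and {y}")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def batch_matmul_shape(x_shape, y_shape, adj_x=False, adj_y=False):
    """Output shape of a batched matmul: batch dims broadcast, last two multiply."""
    if len(x_shape) < 2 or len(y_shape) < 2:
        raise ValueError("batched matmul needs inputs of rank >= 2")
    m, k = x_shape[-2:]
    if adj_x:  # adjoint swaps the two matrix dims
        m, k = k, m
    k2, n = y_shape[-2:]
    if adj_y:
        k2, n = n, k2
    if k != k2:
        raise ValueError(f"inner dims differ: {k} vs {k2}")
    batch = broadcast_shapes(tuple(x_shape[:-2]), tuple(y_shape[:-2]))
    return batch + (m, n)


# Shapes taken from the log above (adj_x=false, adj_y=false).
matmul_out = batch_matmul_shape((1024, 128), (4, 128, 128))
add_out = broadcast_shapes(matmul_out, (1024, 128))
print(matmul_out, add_out)
```

Both results come out as (4, 1024, 128), which is the shape the log reports for the AddV2 input and for Identity/_Retval.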

# below is the logging for printing run 1's output
2022-06-13 13:40:41.704761: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op StringFormat in device
2022-06-13 13:40:41.704824: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.704849: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute StringFormat in device
2022-06-13 13:40:41.704957: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 10.906us
2022-06-13 13:40:41.704984: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 2.563us
2022-06-13 13:40:41.705014: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice StringFormat: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.705032: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [StringFormat] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.705064: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [StringFormat] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.705146: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.705205: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.705291: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.705682: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.705716: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.705736: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.705754: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.705769: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.705783: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.705818: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.705863: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.705893: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.705910: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.705941: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.705966: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.705995: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.706019: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.706047: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.706073: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.706092: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.706105: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.706134: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.706161: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.706260: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.706288: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.706316: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.706342: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.706359: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.706372: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.706412: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.706429: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.706444: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.706467: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.706514: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.706542: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.706572: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.706625: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 6.128us
2022-06-13 13:40:41.706649: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 2.128us
2022-06-13 13:40:41.706686: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.706707: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 30.352us
2022-06-13 13:40:41.706728: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 1.323us
2022-06-13 13:40:41.706769: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.706796: I tensorflow/core/common_runtime/placer.cc:124] StringFormat(StringFormat) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.706816: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.706835: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.706853: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.706877: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.706899: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.706912: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.706931: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.706949: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.706962: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.706981: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.706999: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.707578: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:XLA_CPU_JIT::StringFormat takes 3.548us
2022-06-13 13:40:41.707710: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 676 us (cumulative: 2.25 ms, max: 676 us, #called: 6)
2022-06-13 13:40:41.707733: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.707748: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.707774: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.707790: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.707806: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.707823: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.707872: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.707894: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.707947: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.707966: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.707981: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.708015: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.708162: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.708218: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.708282: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.708301: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.708320: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.708334: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.708346: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.708373: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.708407: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.708455: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 2.891us
2022-06-13 13:40:41.708490: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 1.428us
2022-06-13 13:40:41.708547: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.708657: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.708680: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.708693: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.708750: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.708783: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562384224_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.708894: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_9847480399019665821_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.708935: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_9847480399019665821_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.709112: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_9847480399019665821_0 with handle 12 status: OK
2022-06-13 13:40:41.709218: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.709263: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 12
2022-06-13 13:40:41.709346: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 2.816us
2022-06-13 13:40:41.709387: I tensorflow/core/common_runtime/constant_folding.cc:631] Constant foldable 3 : 4
2022-06-13 13:40:41.709568: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.709598: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.709623: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 24.807us
2022-06-13 13:40:41.709644: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.709665: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 19.599us
2022-06-13 13:40:41.709695: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 124.611us

2022-06-13 13:40:41.709732: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 1 costs 543.6217784881592", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.709761: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.432us
2022-06-13 13:40:41.709783: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.894us
2022-06-13 13:40:41.709821: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 1 costs 543.6217784881592", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 92.588us

2022-06-13 13:40:41.709853: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-8764232170173109502, tensor_name="StringFormat:0"](StringFormat)
2022-06-13 13:40:41.709881: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 1.588us
2022-06-13 13:40:41.709902: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.901us
2022-06-13 13:40:41.709954: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-8764232170173109502, tensor_name="StringFormat:0"](StringFormat) takes 100.759us

2022-06-13 13:40:41.709997: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2022-06-13 13:40:41.710024: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -1 {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 1 costs 543.6217784881592", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /device:CPU:0
2022-06-13 13:40:41.710070: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -1 {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-8764232170173109502, tensor_name="StringFormat:0"](StringFormat) device: /device:CPU:0
2022-06-13 13:40:41.710179: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 2.086us
2022-06-13 13:40:41.710200: I tensorflow/core/common_runtime/constant_folding.cc:562] Replacing StringFormat :: 0 with a constant
2022-06-13 13:40:41.710308: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.710413: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 1 costs 543.6217784881592>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.710445: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 1.563us
2022-06-13 13:40:41.710466: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.879us
2022-06-13 13:40:41.710516: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 1 costs 543.6217784881592>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 108.155us

2022-06-13 13:40:41.710543: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat)
2022-06-13 13:40:41.710568: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 1.131us
2022-06-13 13:40:41.710588: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.889us
2022-06-13 13:40:41.710616: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat) takes 70.259us

2022-06-13 13:40:41.710751: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op PrintV2 in device
2022-06-13 13:40:41.710774: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 0
2022-06-13 13:40:41.710789: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute PrintV2 in device
2022-06-13 13:40:41.710817: I tensorflow/core/common_runtime/eager/execute.cc:982] PrintV2:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.710864: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.710899: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 8
# above is the logging for printing run 1's output; the printed line follows
run 1 costs 543.6217784881592
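The printed line does not state a unit for the cost. If it is milliseconds (an assumption), the bulk of it overlaps the first BatchMatMulV2 call in the run 1 compute phase: libcublas.so.11 is opened at 13:40:41.204625 and libcublasLt.so.11 only at 13:40:41.702486, about 498 ms later. A minimal sketch for measuring such gaps from the timestamp prefix of these log lines follows; `log_time` and `gap_ms` are hypothetical helpers, not TensorFlow APIs.

```python
from datetime import datetime


def log_time(line):
    """Parse the 'YYYY-MM-DD HH:MM:SS.ffffff' prefix of a TensorFlow log line.

    Raises ValueError for lines without that prefix (e.g. '# ...' annotations).
    """
    stamp = line.split(": ", 1)[0]  # first ": " follows the timestamp
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def gap_ms(earlier, later):
    """Milliseconds elapsed between two timestamped log lines."""
    return (log_time(later) - log_time(earlier)).total_seconds() * 1000.0
```

Applied to the first `Process node` line of run 1 (13:40:41.203989) and the last `scheduled` line of run 1 (13:40:41.703407), the same helper gives roughly 499.4 ms for the whole run 1 compute phase.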

# run 2 schedule start
2022-06-13 13:40:41.711880: I tensorflow/python/eager/pywrap_tfe_src.cc:885] Eager executes cancelable __inference_nn_18 on  the number of inputs is 3 the number of output is 1
2022-06-13 13:40:41.711948: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.711994: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.712033: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_18' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.712057: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op __inference_nn_18 in device
2022-06-13 13:40:41.712075: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.712089: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute __inference_nn_18 in device
2022-06-13 13:40:41.712113: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:0 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712139: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:1 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712160: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_18:input:2 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712202: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op __inference_nn_18 in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712248: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1437] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __inference_nn_18 with handle 10
# run 2 schedule end
# run 2 compute start
2022-06-13 13:40:41.712374: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -6484413189335459115 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712493: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -6484413189335459115 {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] w:_Arg
2022-06-13 13:40:41.712616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712665: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -6484413189335459115 {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] b:_Arg
2022-06-13 13:40:41.712751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712793: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step -6484413189335459115 {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[4,128,128]], _user_specified_name="x", index=2]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] x:_Arg
2022-06-13 13:40:41.712873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.712913: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step -6484413189335459115 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.712953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.712988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[4,128,128])#
2022-06-13 13:40:41.713191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.713242: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step -6484413189335459115 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.713276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.713309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[4,1024,128];float[1024,128])#
2022-06-13 13:40:41.713394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.713436: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step -6484413189335459115 {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.713469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.713499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Identity:Identity#shape=(float[4,1024,128])#
2022-06-13 13:40:41.713529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.713565: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step -6484413189335459115 {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.713594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper identity_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.713625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] identity_retval_RetVal:_Retval#shape=(float[4,1024,128])#
2022-06-13 13:40:41.713652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled identity_retval_RetVal op _Retval on GPU 0 stream[0]
# run 2 compute end

# below is output of run 2
2022-06-13 13:40:41.714335: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op StringFormat in device
2022-06-13 13:40:41.714380: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.714397: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute StringFormat in device
2022-06-13 13:40:41.714470: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 6.227us
2022-06-13 13:40:41.714493: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.748us
2022-06-13 13:40:41.714520: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice StringFormat: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.714542: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [StringFormat] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.714584: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [StringFormat] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.714678: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.714766: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.714836: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.715107: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.715137: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.715151: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.715170: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.715183: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.715196: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.715230: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.715271: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.715302: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.715319: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.715335: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.715367: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.715398: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.715424: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.715450: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.715474: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.715494: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.715517: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.715557: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.715586: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.715699: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.715735: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.715779: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.715815: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.715836: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.715849: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.715892: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.715913: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.715927: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.715952: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.716005: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.716049: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.716091: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.716146: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 4.792us
2022-06-13 13:40:41.716179: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 2.345us
2022-06-13 13:40:41.716225: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.716279: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 40.945us
2022-06-13 13:40:41.716314: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 2.908us
2022-06-13 13:40:41.716365: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.716396: I tensorflow/core/common_runtime/placer.cc:124] StringFormat(StringFormat) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.716422: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.716450: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.716473: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.716501: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.716530: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.716555: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.716580: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.716610: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.716635: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.716661: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.716678: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.717268: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:XLA_CPU_JIT::StringFormat takes 3.93us
2022-06-13 13:40:41.717401: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 691 us (cumulative: 2.94 ms, max: 691 us, #called: 7)
2022-06-13 13:40:41.717426: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.717452: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.717492: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.717508: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.717524: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.717547: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.717625: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.717648: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.717709: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.717738: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.717760: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.717812: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.717990: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.718060: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.718108: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.718125: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.718142: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.718156: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.718169: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.718197: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.718230: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.718276: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 3.356us
2022-06-13 13:40:41.718311: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 1.865us
2022-06-13 13:40:41.718371: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.718491: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.718523: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.718542: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.718595: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.718631: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562491216_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.718753: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_3352841610890993264_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.718809: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_3352841610890993264_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.719009: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_3352841610890993264_0 with handle 14 status: OK
2022-06-13 13:40:41.719108: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.719144: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 14
2022-06-13 13:40:41.719232: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 3.641us
2022-06-13 13:40:41.719284: I tensorflow/core/common_runtime/constant_folding.cc:631] Constant foldable 3 : 4
2022-06-13 13:40:41.719498: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.719534: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.719558: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 24.345us
2022-06-13 13:40:41.719578: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.719598: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 18.973us
2022-06-13 13:40:41.719630: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 129.879us

2022-06-13 13:40:41.719666: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 2 costs 2.9511451721191406", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.719696: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.331us
2022-06-13 13:40:41.719725: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.452us
2022-06-13 13:40:41.719784: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 2 costs 2.9511451721191406", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 113.674us

2022-06-13 13:40:41.719820: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1704941691781989391, tensor_name="StringFormat:0"](StringFormat)
2022-06-13 13:40:41.719852: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 1.619us
2022-06-13 13:40:41.719881: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 1.853us
2022-06-13 13:40:41.719961: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1704941691781989391, tensor_name="StringFormat:0"](StringFormat) takes 138.823us

2022-06-13 13:40:41.720001: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2022-06-13 13:40:41.720024: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -1 {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 2 costs 2.9511451721191406", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /device:CPU:0
2022-06-13 13:40:41.720067: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -1 {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-1704941691781989391, tensor_name="StringFormat:0"](StringFormat) device: /device:CPU:0
2022-06-13 13:40:41.720161: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 1.843us
2022-06-13 13:40:41.720179: I tensorflow/core/common_runtime/constant_folding.cc:562] Replacing StringFormat :: 0 with a constant
2022-06-13 13:40:41.720280: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.720389: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 2 costs 2.9511451721191406>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.720421: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 1.853us
2022-06-13 13:40:41.720445: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 1.146us
2022-06-13 13:40:41.720496: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 2 costs 2.9511451721191406>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 112.88us

2022-06-13 13:40:41.720519: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat)
2022-06-13 13:40:41.720544: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.96us
2022-06-13 13:40:41.720560: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.673us
2022-06-13 13:40:41.720589: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat) takes 64.337us

2022-06-13 13:40:41.720691: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op PrintV2 in device
2022-06-13 13:40:41.720716: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 0
2022-06-13 13:40:41.720733: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute PrintV2 in device
2022-06-13 13:40:41.720763: I tensorflow/core/common_runtime/eager/execute.cc:982] PrintV2:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.720804: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.720831: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 8
# above is output of run 2

run 2 costs 2.9511451721191406
2022-06-13 13:40:41.721107: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op _EagerConst in device
2022-06-13 13:40:41.721133: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.721144: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute _EagerConst in device
2022-06-13 13:40:41.721162: I tensorflow/core/common_runtime/eager/execute.cc:982] _EagerConst:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.721189: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.721217: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 0
2022-06-13 13:40:41.721242: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.721265: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped____EagerConst_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 1
2022-06-13 13:40:41.721296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.721316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] input/_2:_HostRecv#from=input,to=_EagerConst#
2022-06-13 13:40:41.721335: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.721354: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_input;0:0
2022-06-13 13:40:41.721385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled input/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.721410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.721427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] _EagerConst:_EagerConst#shape=(int32[3])#
2022-06-13 13:40:41.721449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled _EagerConst op _EagerConst on GPU 0 stream[0]
2022-06-13 13:40:41.721462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.721473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(int32[3])#
2022-06-13 13:40:41.721486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.721618: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op RandomUniform in device
2022-06-13 13:40:41.721638: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.721649: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute RandomUniform in device
2022-06-13 13:40:41.721668: I tensorflow/core/common_runtime/eager/execute.cc:982] RandomUniform:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.721702: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.721730: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 3
2022-06-13 13:40:41.721751: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.721767: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __wrapped__RandomUniform_device_/job:localhost/replica:0/task:0/device:GPU:0 with handle 4
2022-06-13 13:40:41.721790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.721811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] shape/_2:_HostRecv#from=shape,to=RandomUniform#
2022-06-13 13:40:41.721826: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x214df9f0 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.721837: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x214dfa10 /job:localhost/replica:0/task:0/device:CPU:0;95ed0dfd449ea1ea;/job:localhost/replica:0/task:0/device:GPU:0;edge_2_shape;0:0
2022-06-13 13:40:41.721853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled shape/_2 op _HostRecv on GPU 0 stream[0]
2022-06-13 13:40:41.721872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.721893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] RandomUniform:RandomUniform#shape=(int32[3])#
2022-06-13 13:40:41.721955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-13 13:40:41.721975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper output_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.721998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] output_retval_RetVal:_Retval#shape=(float[4,128,128])#
2022-06-13 13:40:41.722015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled output_retval_RetVal op _Retval on GPU 0 stream[0]


# run 3 schedule start
2022-06-13 13:40:41.731652: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.731981: I tensorflow/python/eager/pywrap_tfe_src.cc:885] Eager executes cancelable __inference_nn_32 on  the number of inputs is 3 the number of output is 1
2022-06-13 13:40:41.732019: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.732044: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.732066: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.732077: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op __inference_nn_32 in device
2022-06-13 13:40:41.732084: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.732092: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute __inference_nn_32 in device
2022-06-13 13:40:41.732104: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:0 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.732115: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:1 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.732123: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:2 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.732166: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.732187: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.732198: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice __inference_nn_32: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.732205: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [__inference_nn_32] on device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.732284: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__inference_nn_32" on default device "/job:localhost/replica:0/task:0/device:GPU:0"
2022-06-13 13:40:41.732516: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.732533: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.732541: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.732548: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.732555: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.732562: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.732586: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.732627: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.732655: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.732671: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.732685: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.732708: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.732727: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.732737: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.732748: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.732758: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.732767: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.732773: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.732799: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.732825: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.732909: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.732930: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.732951: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.732969: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.732978: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.732984: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.733006: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.733015: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.733023: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.733048: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.733095: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 9 of 9 nodes in 9 visits
2022-06-13 13:40:41.733121: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.733142: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.733170: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.733185: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 19.439us
2022-06-13 13:40:41.733196: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::w takes 1.335us
2022-06-13 13:40:41.733214: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.733231: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 18.093us
2022-06-13 13:40:41.733245: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::b takes 0.787us
2022-06-13 13:40:41.733263: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.733272: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 10.215us
2022-06-13 13:40:41.733280: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::x takes 0.401us
2022-06-13 13:40:41.733293: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 3.308us
2022-06-13 13:40:41.733305: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 3.302us
2022-06-13 13:40:41.733317: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.428us
2022-06-13 13:40:41.733333: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 6.716us
2022-06-13 13:40:41.733361: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 10.569us
2022-06-13 13:40:41.733379: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:CPU::Identity takes 2.335us
2022-06-13 13:40:41.733392: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.733400: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_RetVal takes 11.22us
2022-06-13 13:40:41.733408: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::identity_RetVal takes 0.521us
2022-06-13 13:40:41.733427: I tensorflow/core/common_runtime/placer.cc:124] w(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733437: I tensorflow/core/common_runtime/placer.cc:124] b(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733445: I tensorflow/core/common_runtime/placer.cc:124] x(_Arg) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733458: I tensorflow/core/common_runtime/placer.cc:124] MatMul(BatchMatMulV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733468: I tensorflow/core/common_runtime/placer.cc:124] Add(AddV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733479: I tensorflow/core/common_runtime/placer.cc:124] Identity(Identity) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733492: I tensorflow/core/common_runtime/placer.cc:124] identity_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.733505: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.733517: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.733527: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.733536: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.733546: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.733553: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.733559: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.733568: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.733576: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.733583: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.733590: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.733912: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:XLA_GPU_JIT::MatMul takes 2.173us
2022-06-13 13:40:41.733940: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:XLA_GPU_JIT::Add takes 1.524us
2022-06-13 13:40:41.733966: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:XLA_GPU_JIT::Identity takes 1.596us
2022-06-13 13:40:41.734058: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:650] DeadnessAnalysis time: 16 us (cumulative: 42 us, max: 16 us, #called: 3)
2022-06-13 13:40:41.734142: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 536 us (cumulative: 3.48 ms, max: 691 us, #called: 8)
2022-06-13 13:40:41.734162: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.734169: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.734183: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.734190: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.734198: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.734205: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.734248: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.734258: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.734285: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.734295: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.734303: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.734341: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.734448: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.734491: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.734521: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.734530: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.734540: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.734547: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.734554: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.734582: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.734612: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.734648: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.734666: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 23.788us
2022-06-13 13:40:41.734683: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.734696: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 14.336us
2022-06-13 13:40:41.734708: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.734717: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 9.883us
2022-06-13 13:40:41.734730: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 2.828us
2022-06-13 13:40:41.734746: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.585us
2022-06-13 13:40:41.734766: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 6.2us
2022-06-13 13:40:41.734783: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.734796: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_RetVal takes 15.778us
2022-06-13 13:40:41.734835: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.734911: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.734923: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.734930: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.734941: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734948: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734955: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734961: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734968: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734974: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734981: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734989: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.734995: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735002: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735008: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735014: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735021: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735027: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735035: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735042: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735048: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Arg, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735055: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735061: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735068: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Identity, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735074: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Retval, reason: User has assigned a device that is not CPU.
2022-06-13 13:40:41.735083: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.735109: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562627152_/job:localhost/replica:0/task:0/device:GPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.735200: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32_17288556091578612755_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.735235: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __inference_nn_32_17288556091578612755_0 on device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.735397: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __inference_nn_32_17288556091578612755_0 with handle 16 status: OK
2022-06-13 13:40:41.735460: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op __inference_nn_32 in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.735486: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1437] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __inference_nn_32 with handle 16
2022-06-13 13:40:41.735548: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 4.991us
2022-06-13 13:40:41.735564: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 3.424us
2022-06-13 13:40:41.735574: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:CPU::Identity takes 0.827us
2022-06-13 13:40:41.735581: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.735632: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735645: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 13.716us
2022-06-13 13:40:41.735656: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735664: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 7.984us
2022-06-13 13:40:41.735693: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735701: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 26.554us
2022-06-13 13:40:41.735714: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735722: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 9.173us
2022-06-13 13:40:41.735733: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735741: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 9.408us
2022-06-13 13:40:41.735754: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 2.721us
2022-06-13 13:40:41.735768: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 2.979us
2022-06-13 13:40:41.735783: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 3.805us
2022-06-13 13:40:41.735799: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735808: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 11.18us
2022-06-13 13:40:41.735819: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 5:0: 0 -> 0
2022-06-13 13:40:41.735827: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 5:1: 0 -> 0
2022-06-13 13:40:41.735834: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:0: 0 -> 0
2022-06-13 13:40:41.735841: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:1: 0 -> 0
2022-06-13 13:40:41.735848: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-13 13:40:41.735855: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-13 13:40:41.735864: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735872: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 7.945us
2022-06-13 13:40:41.735881: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735889: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 8.154us
2022-06-13 13:40:41.735900: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735907: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 9.065us
2022-06-13 13:40:41.735918: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735926: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 9.695us
2022-06-13 13:40:41.735937: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.735945: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 9.508us
2022-06-13 13:40:41.735957: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.951us
2022-06-13 13:40:41.735969: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.434us
2022-06-13 13:40:41.735987: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.598us
2022-06-13 13:40:41.735999: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736007: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 9.818us
2022-06-13 13:40:41.736016: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 5:0: 0 -> 0
2022-06-13 13:40:41.736024: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 5:1: 0 -> 0
2022-06-13 13:40:41.736030: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:0: 0 -> 0
2022-06-13 13:40:41.736037: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:1: 0 -> 0
2022-06-13 13:40:41.736044: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-13 13:40:41.736051: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-13 13:40:41.736120: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.736131: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736140: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 8.559us
2022-06-13 13:40:41.736148: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736155: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 7.198us
2022-06-13 13:40:41.736167: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 47.122us

2022-06-13 13:40:41.736185: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]()
2022-06-13 13:40:41.736199: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736207: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 9.724us
2022-06-13 13:40:41.736216: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node w}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736224: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::w takes 9.432us
2022-06-13 13:40:41.736240: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() takes 57.002us

2022-06-13 13:40:41.736263: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]()
2022-06-13 13:40:41.736276: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736285: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 10.04us
2022-06-13 13:40:41.736294: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node b}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736302: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::b takes 8.877us
2022-06-13 13:40:41.736315: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() takes 61.009us

2022-06-13 13:40:41.736327: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[16,128,128]], _user_specified_name="x", index=2]()
2022-06-13 13:40:41.736339: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736347: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 9.361us
2022-06-13 13:40:41.736356: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node x}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736364: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:GPU::x takes 9.342us
2022-06-13 13:40:41.736377: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[16,128,128]], _user_specified_name="x", index=2]() takes 48.513us

2022-06-13 13:40:41.736389: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x)
2022-06-13 13:40:41.736402: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 1.281us
2022-06-13 13:40:41.736412: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.908us
2022-06-13 13:40:41.736427: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) takes 36.878us

2022-06-13 13:40:41.736444: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b)
2022-06-13 13:40:41.736456: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.602us
2022-06-13 13:40:41.736466: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.325us
2022-06-13 13:40:41.736478: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) takes 33.284us

2022-06-13 13:40:41.736489: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add)
2022-06-13 13:40:41.736501: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.841us
2022-06-13 13:40:41.736511: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Identity:GPU::Identity takes 1.566us
2022-06-13 13:40:41.736524: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) takes 33.463us

2022-06-13 13:40:41.736534: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity)
2022-06-13 13:40:41.736545: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736554: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 10.032us
2022-06-13 13:40:41.736564: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node identity_retval_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.736572: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::identity_retval_RetVal takes 8.917us
2022-06-13 13:40:41.736584: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) takes 49.302us
# run 3 schedule end

# run 3 compute start
2022-06-13 13:40:41.736648: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -8749612558746935679 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.736721: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -8749612558746935679 {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.736759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.736787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] w:_Arg
2022-06-13 13:40:41.736819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.736860: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -8749612558746935679 {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.736888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.736913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] b:_Arg
2022-06-13 13:40:41.736941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.736978: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step -8749612558746935679 {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[16,128,128]], _user_specified_name="x", index=2]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.737006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.737031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] x:_Arg
2022-06-13 13:40:41.737058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.737094: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step -8749612558746935679 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.737124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.737157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[16,128,128])#
2022-06-13 13:40:41.737309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.737354: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step -8749612558746935679 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.737385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.737416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[16,1024,128];float[1024,128])#
2022-06-13 13:40:41.737480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.737517: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step -8749612558746935679 {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.737546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.737576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Identity:Identity#shape=(float[16,1024,128])#
2022-06-13 13:40:41.737603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.737636: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step -8749612558746935679 {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.737664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper identity_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.737693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] identity_retval_RetVal:_Retval#shape=(float[16,1024,128])#
2022-06-13 13:40:41.737720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled identity_retval_RetVal op _Retval on GPU 0 stream[0]
# run 3 compute end

# below is the output of run 3
2022-06-13 13:40:41.738117: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op StringFormat in device
2022-06-13 13:40:41.738140: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.738148: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute StringFormat in device
2022-06-13 13:40:41.738182: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 2.83us
2022-06-13 13:40:41.738194: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.066us
2022-06-13 13:40:41.738206: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice StringFormat: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.738213: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [StringFormat] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.738226: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [StringFormat] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.738265: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.738294: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.738325: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.738456: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.738470: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.738477: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.738485: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.738491: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.738498: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.738512: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738533: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738548: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.738555: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.738563: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.738574: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.738581: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.738588: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.738595: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.738604: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.738612: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.738619: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.738631: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738642: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.738682: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738697: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.738710: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738721: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.738730: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.738736: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.738753: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.738760: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.738768: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.738779: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738799: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.738814: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.738827: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.738849: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 2.105us
2022-06-13 13:40:41.738860: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.797us
2022-06-13 13:40:41.738875: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.738884: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 11.916us
2022-06-13 13:40:41.738893: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.656us
2022-06-13 13:40:41.738908: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.738922: I tensorflow/core/common_runtime/placer.cc:124] StringFormat(StringFormat) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.738931: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.738938: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.738945: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.738953: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.738960: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.738968: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.738974: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.738983: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.738990: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.738997: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.739004: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.739284: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:XLA_CPU_JIT::StringFormat takes 1.601us
2022-06-13 13:40:41.739348: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 328 us (cumulative: 3.8 ms, max: 691 us, #called: 9)
2022-06-13 13:40:41.739360: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.739367: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.739380: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.739387: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.739395: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.739401: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.739426: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.739435: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.739457: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.739466: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.739473: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.739491: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.739556: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.739584: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.739609: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.739620: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.739629: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.739636: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.739642: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.739667: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.739681: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.739696: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.102us
2022-06-13 13:40:41.739711: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.559us
2022-06-13 13:40:41.739728: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.739768: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.739776: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.739781: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.739802: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.739815: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562649184_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.739851: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_7245493243845350481_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.739868: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_7245493243845350481_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.739930: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_7245493243845350481_0 with handle 18 status: OK
2022-06-13 13:40:41.739964: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.739979: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 18
2022-06-13 13:40:41.740007: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.171us
2022-06-13 13:40:41.740023: I tensorflow/core/common_runtime/constant_folding.cc:631] Constant foldable 3 : 4
2022-06-13 13:40:41.740083: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.740094: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.740101: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 7.149us
2022-06-13 13:40:41.740107: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.740113: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 5.558us
2022-06-13 13:40:41.740123: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 38.715us

2022-06-13 13:40:41.740135: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 3 costs 15.825033187866211", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.740144: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.444us
2022-06-13 13:40:41.740150: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.34us
2022-06-13 13:40:41.740164: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 3 costs 15.825033187866211", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 30.203us

2022-06-13 13:40:41.740175: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-7433993840239241574, tensor_name="StringFormat:0"](StringFormat)
2022-06-13 13:40:41.740185: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 1.591us
2022-06-13 13:40:41.740192: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.316us
2022-06-13 13:40:41.740210: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-7433993840239241574, tensor_name="StringFormat:0"](StringFormat) takes 35.212us

2022-06-13 13:40:41.740228: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2022-06-13 13:40:41.740237: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -1 {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 3 costs 15.825033187866211", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /device:CPU:0
2022-06-13 13:40:41.740250: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -1 {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-7433993840239241574, tensor_name="StringFormat:0"](StringFormat) device: /device:CPU:0
2022-06-13 13:40:41.740301: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.88us
2022-06-13 13:40:41.740310: I tensorflow/core/common_runtime/constant_folding.cc:562] Replacing StringFormat :: 0 with a constant
2022-06-13 13:40:41.740349: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.740389: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 3 costs 15.825033187866211>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.740402: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.642us
2022-06-13 13:40:41.740409: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.3us
2022-06-13 13:40:41.740426: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 3 costs 15.825033187866211>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 40.095us

2022-06-13 13:40:41.740437: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat)
2022-06-13 13:40:41.740445: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.466us
2022-06-13 13:40:41.740451: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.289us
2022-06-13 13:40:41.740461: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat) takes 24.051us

2022-06-13 13:40:41.740508: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op PrintV2 in device
2022-06-13 13:40:41.740518: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 0
2022-06-13 13:40:41.740523: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute PrintV2 in device
2022-06-13 13:40:41.740533: I tensorflow/core/common_runtime/eager/execute.cc:982] PrintV2:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.740548: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.740561: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 8
# above is output of run 3

run 3 costs 15.825033187866211

# run 4 schedule start
2022-06-13 13:40:41.740898: I tensorflow/python/eager/pywrap_tfe_src.cc:885] Eager executes cancelable __inference_nn_32 on  the number of inputs is 3 the number of output is 1
2022-06-13 13:40:41.740923: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.740943: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.740955: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__inference_nn_32' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.740963: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op __inference_nn_32 in device
2022-06-13 13:40:41.740968: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.740974: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute __inference_nn_32 in device
2022-06-13 13:40:41.740982: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:0 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.740989: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:1 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.740996: I tensorflow/core/common_runtime/eager/execute.cc:982] __inference_nn_32:input:2 /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741010: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op __inference_nn_32 in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741026: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1437] Running component function on device /job:localhost/replica:0/task:0/device:GPU:0 from __inference_nn_32 with handle 16
# run 4 schedule end
# run 4 compute start
2022-06-13 13:40:41.741080: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -776110010382447230 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741143: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -776110010382447230 {{node w}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="W", index=0]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] w:_Arg
2022-06-13 13:40:41.741216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled w op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741245: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -776110010382447230 {{node b}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[1024,128]], _user_specified_name="b", index=1]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] b:_Arg
2022-06-13 13:40:41.741295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled b op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741320: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step -776110010382447230 {{node x}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, _output_shapes=[[16,128,128]], _user_specified_name="x", index=2]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] x:_Arg
2022-06-13 13:40:41.741371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled x op _Arg on GPU 0 stream[0]
2022-06-13 13:40:41.741398: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step -776110010382447230 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](w, x) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.741435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[16,128,128])#
2022-06-13 13:40:41.741544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-13 13:40:41.741576: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step -776110010382447230 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, b) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.741614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[16,1024,128];float[1024,128])#
2022-06-13 13:40:41.741674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-13 13:40:41.741712: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step -776110010382447230 {{node Identity}} = Identity[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.741748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Identity:Identity#shape=(float[16,1024,128])#
2022-06-13 13:40:41.741768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Identity op Identity on GPU 0 stream[0]
2022-06-13 13:40:41.741792: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step -776110010382447230 {{node identity_retval_RetVal}} = _Retval[T=DT_FLOAT, index=0](Identity) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-13 13:40:41.741811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper identity_retval_RetVal op _Retval on GPU 0 stream[0]
2022-06-13 13:40:41.741826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] identity_retval_RetVal:_Retval#shape=(float[16,1024,128])#
2022-06-13 13:40:41.741844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled identity_retval_RetVal op _Retval on GPU 0 stream[0]
# run 4 compute end
# below is output of run 4
2022-06-13 13:40:41.742103: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op StringFormat in device
2022-06-13 13:40:41.742120: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 1
2022-06-13 13:40:41.742125: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute StringFormat in device
2022-06-13 13:40:41.742152: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 2.428us
2022-06-13 13:40:41.742162: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.659us
2022-06-13 13:40:41.742171: I tensorflow/core/common_runtime/eager/execute.cc:923] PreferredDevice StringFormat: /job:localhost/replica:0/task:0
2022-06-13 13:40:41.742176: I tensorflow/core/common_runtime/eager/execute.cc:924] Placer place op [StringFormat] on device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.742186: I tensorflow/core/common_runtime/eager/execute.cc:1062] Device for [StringFormat] already set to: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.742209: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.742230: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.742252: I tensorflow/core/common_runtime/process_function_library_runtime.cc:772] Instantiating MultiDevice function "__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0" on default device "/job:localhost/replica:0/task:0/device:CPU:0"
2022-06-13 13:40:41.742348: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-13 13:40:41.742359: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.742364: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-13 13:40:41.742369: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.742374: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-13 13:40:41.742379: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-13 13:40:41.742389: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742403: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742412: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.742417: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-13 13:40:41.742423: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-13 13:40:41.742431: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-13 13:40:41.742436: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-13 13:40:41.742441: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-13 13:40:41.742446: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-13 13:40:41.742452: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-13 13:40:41.742458: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-13 13:40:41.742463: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-13 13:40:41.742471: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742479: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.742508: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742519: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.742528: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742536: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-13 13:40:41.742542: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-13 13:40:41.742547: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-13 13:40:41.742559: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-13 13:40:41.742564: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-13 13:40:41.742569: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-13 13:40:41.742577: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742591: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 4 of 4 nodes in 4 visits
2022-06-13 13:40:41.742599: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.742609: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-13 13:40:41.742624: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:GPU::StringFormat takes 1.533us
2022-06-13 13:40:41.742631: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.582us
2022-06-13 13:40:41.742642: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node output_RetVal}}'Will fall back to a default kernel.

2022-06-13 13:40:41.742648: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:GPU::output_RetVal takes 8.704us
2022-06-13 13:40:41.742654: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.433us
2022-06-13 13:40:41.742665: I tensorflow/core/common_runtime/placer.cc:124] output_RetVal(_Retval) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.742673: I tensorflow/core/common_runtime/placer.cc:124] StringFormat(StringFormat) placed on: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.742679: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-13 13:40:41.742684: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-13 13:40:41.742689: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-13 13:40:41.742695: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-13 13:40:41.742701: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-13 13:40:41.742706: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-13 13:40:41.742711: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-13 13:40:41.742716: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-13 13:40:41.742721: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-13 13:40:41.742726: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-13 13:40:41.742730: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-13 13:40:41.742931: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:XLA_CPU_JIT::StringFormat takes 1.089us
2022-06-13 13:40:41.742978: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 237 us (cumulative: 4.04 ms, max: 691 us, #called: 10)
2022-06-13 13:40:41.742988: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-13 13:40:41.742993: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-13 13:40:41.743001: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-13 13:40:41.743006: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-13 13:40:41.743012: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-13 13:40:41.743017: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-13 13:40:41.743035: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-13 13:40:41.743043: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-13 13:40:41.743060: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-13 13:40:41.743064: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-13 13:40:41.743070: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-13 13:40:41.743082: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.743131: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.743151: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-13 13:40:41.743166: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-13 13:40:41.743173: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-13 13:40:41.743180: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-13 13:40:41.743184: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-13 13:40:41.743189: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-13 13:40:41.743199: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.743210: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-13 13:40:41.743224: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.077us
2022-06-13 13:40:41.743235: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_RetVal takes 0.447us
2022-06-13 13:40:41.743253: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=0
2022-06-13 13:40:41.743293: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-13 13:40:41.743302: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-13 13:40:41.743306: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-13 13:40:41.743325: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-13 13:40:41.743339: W tensorflow/core/util/dump_graph.cc:134] Failed to dump pflr_after_all_optimization_passes_562728544_/job:localhost/replica:0/task:0/device:CPU:0 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-13 13:40:41.743375: I tensorflow/core/framework/op.cc:80] NOT_FOUND: Op type not registered '__wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_5078087082401372147_0' in binary running on 90e62df95daa. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-06-13 13:40:41.743392: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1105] Start instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_5078087082401372147_0 on device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.743452: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1114] Finished instantiating component function __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0_5078087082401372147_0 with handle 20 status: OK
2022-06-13 13:40:41.743487: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.743502: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__StringFormat_T_0_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 20
2022-06-13 13:40:41.743529: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 1.186us
2022-06-13 13:40:41.743545: I tensorflow/core/common_runtime/constant_folding.cc:631] Constant foldable 3 : 4
2022-06-13 13:40:41.743604: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-13 13:40:41.743616: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.743623: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 7.611us
2022-06-13 13:40:41.743629: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-13 13:40:41.743634: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 5.455us
2022-06-13 13:40:41.743644: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 38.971us

2022-06-13 13:40:41.743656: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 4 costs 1.3682842254638672", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.743666: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.509us
2022-06-13 13:40:41.743673: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: StringFormat:CPU::StringFormat takes 0.348us
2022-06-13 13:40:41.743686: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 4 costs 1.3682842254638672", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 30.797us

2022-06-13 13:40:41.743697: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=6617487762276893854, tensor_name="StringFormat:0"](StringFormat)
2022-06-13 13:40:41.743707: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.559us
2022-06-13 13:40:41.743713: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_send_StringFormat_0 takes 0.316us
2022-06-13 13:40:41.743731: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=6617487762276893854, tensor_name="StringFormat:0"](StringFormat) takes 33.315us

2022-06-13 13:40:41.743749: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2022-06-13 13:40:41.743758: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step -1 {{node StringFormat}} = StringFormat[T=[], _XlaHasReferenceVars=false, placeholder="{}", summarize=3, template="run 4 costs 1.3682842254638672", _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /device:CPU:0
2022-06-13 13:40:41.743771: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step -1 {{node _send_StringFormat_0}} = _Send[T=DT_STRING, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=6617487762276893854, tensor_name="StringFormat:0"](StringFormat) device: /device:CPU:0
2022-06-13 13:40:41.743808: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.745us
2022-06-13 13:40:41.743817: I tensorflow/core/common_runtime/constant_folding.cc:562] Replacing StringFormat :: 0 with a constant
2022-06-13 13:40:41.743856: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-13 13:40:41.743893: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 4 costs 1.3682842254638672>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-13 13:40:41.743905: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.502us
2022-06-13 13:40:41.743912: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::StringFormat/_0__cf__0 takes 0.31us
2022-06-13 13:40:41.743930: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node StringFormat/_0__cf__0}} = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: run 4 costs 1.3682842254638672>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 38.841us

2022-06-13 13:40:41.743940: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat)
2022-06-13 13:40:41.743948: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.435us
2022-06-13 13:40:41.743955: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::output_retval_RetVal takes 0.31us
2022-06-13 13:40:41.743965: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node output_retval_RetVal}} = _Retval[T=DT_STRING, index=0](StringFormat) takes 24.343us

2022-06-13 13:40:41.744009: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:57] Process op PrintV2 in device
2022-06-13 13:40:41.744018: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:58] Number of return vals is 0
2022-06-13 13:40:41.744023: I tensorflow/core/common_runtime/eager/custom_device_op_handler.cc:95] Execute PrintV2 in device
2022-06-13 13:40:41.744033: I tensorflow/core/common_runtime/eager/execute.cc:982] PrintV2:input:0 /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.744048: I tensorflow/core/common_runtime/eager/execute.cc:1353] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2022-06-13 13:40:41.744062: I tensorflow/core/common_runtime/process_function_library_runtime.cc:1302] Running component function on device /job:localhost/replica:0/task:0/device:CPU:0 from __wrapped__PrintV2_device_/job:localhost/replica:0/task:0/device:CPU:0 with handle 8
# the log lines above are from run 4; its printed output follows
run 4 costs 1.3682842254638672

## lazy_execution.log
2022-06-16 02:29:11.215225: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-16 02:29:11.372311: I tensorflow/core/platform/cloud/gcs_file_system.cc:806] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2022-06-16 02:29:11.372372: I ./tensorflow/core/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2022-06-16 02:29:11.372388: I tensorflow/core/platform/cloud/gcs_file_system.cc:846] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2022-06-16 02:29:11.372393: I tensorflow/core/platform/cloud/gcs_file_system.cc:876] GCS additional header DISABLED. No environment variable set.
2022-06-16 02:29:11.373307: I tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-06-16 02:29:11.378043: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-16 02:29:11.417732: I tensorflow/core/platform/cloud/gcs_file_system.cc:806] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2022-06-16 02:29:11.417772: I ./tensorflow/core/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2022-06-16 02:29:11.417778: I tensorflow/core/platform/cloud/gcs_file_system.cc:846] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2022-06-16 02:29:11.417799: I tensorflow/core/platform/cloud/gcs_file_system.cc:876] GCS additional header DISABLED. No environment variable set.
2022-06-16 02:29:12.083804: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.7
2022-06-16 02:29:12.907647: I ./tensorflow/core/common_runtime/mkl_cpu_allocator.h:178] MklCPUAllocator: Setting max_mem_bytes: 134837268480
2022-06-16 02:29:12.907765: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: mklcpu
2022-06-16 02:29:12.907802: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-16 02:29:12.907834: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-16 02:29:12.907872: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-16 02:29:12.907904: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-16 02:29:12.907935: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-16 02:29:12.907968: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-16 02:29:12.907998: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-16 02:29:12.908025: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-16 02:29:12.908052: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-16 02:29:12.908122: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-16 02:29:12.908156: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-16 02:29:12.908257: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-16 02:29:12.908283: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-16 02:29:12.908310: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-16 02:29:12.908338: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-16 02:29:12.908392: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-16 02:29:12.908434: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-16 02:29:12.908467: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-16 02:29:12.908529: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-16 02:29:12.908582: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-16 02:29:12.908643: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-16 02:29:12.923406: I tensorflow/compiler/xla/parse_flags_from_env.cc:197] For env var TF_XLA_FLAGS found arguments:
2022-06-16 02:29:12.923498: I tensorflow/compiler/xla/parse_flags_from_env.cc:199]   argv[0] = <argv[0]>
2022-06-16 02:29:12.923586: I tensorflow/compiler/xla/parse_flags_from_env.cc:197] For env var TF_JITRT_FLAGS found arguments:
2022-06-16 02:29:12.923638: I tensorflow/compiler/xla/parse_flags_from_env.cc:199]   argv[0] = <argv[0]>
2022-06-16 02:29:12.923675: I tensorflow/compiler/jit/xla_cpu_device.cc:44] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA device creation not requested
2022-06-16 02:29:12.923810: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-06-16 02:29:12.979252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:12.979500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 1 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:12.979518: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-16 02:29:12.979554: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-16 02:29:12.979575: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-16 02:29:12.980734: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-06-16 02:29:12.980967: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-06-16 02:29:12.981848: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-06-16 02:29:12.982606: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-06-16 02:29:12.982648: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-06-16 02:29:12.983379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1975] Adding visible gpu devices: 0, 1
2022-06-16 02:29:12.983405: I tensorflow/compiler/jit/xla_gpu_device.cc:48] Not creating XLA devices, tf_xla_enable_xla_devices not set and XLA devices creation not required
2022-06-16 02:29:12.983775: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-16 02:29:12.986811: I tensorflow/compiler/jit/xla_cpu_device.cc:58] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-16 02:29:13.228591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:13.228896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 1 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:13.229531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1975] Adding visible gpu devices: 0, 1
2022-06-16 02:29:13.229569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-16 02:29:13.794299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(0): -5,0
2022-06-16 02:29:14.211081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(0): -5,0
2022-06-16 02:29:14.211141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1384] TensorFlow compiled with CUDA 11.2 and cuDNN 8.1.0
2022-06-16 02:29:14.211189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1396] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-16 02:29:14.211199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402]      0 1
2022-06-16 02:29:14.211204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 0:   N N
2022-06-16 02:29:14.211208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 1:   N N
2022-06-16 02:29:14.212286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 0 TfDeviceId 0 on bus 1 numa: 0 pci: 0000:18:00.0 DeviceLocality: bus_id: 1
links {
}

2022-06-16 02:29:14.212522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 1 TfDeviceId 1 on bus 2 numa: 1 pci: 0000:86:00.0 DeviceLocality: bus_id: 2
numa_node: 1
links {
}

2022-06-16 02:29:14.212720: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: GPU_0_bfc
2022-06-16 02:29:14.212733: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-16 02:29:14.212738: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-16 02:29:14.212745: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-16 02:29:14.212750: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-16 02:29:14.212754: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-16 02:29:14.212760: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-16 02:29:14.212765: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-16 02:29:14.212770: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-16 02:29:14.212775: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-16 02:29:14.212780: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-16 02:29:14.212785: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-16 02:29:14.212789: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-16 02:29:14.212794: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-16 02:29:14.212798: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-16 02:29:14.212803: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-16 02:29:14.212808: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-16 02:29:14.212813: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-16 02:29:14.212818: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-16 02:29:14.212823: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-16 02:29:14.212827: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-16 02:29:14.212832: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-16 02:29:14.212869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9657 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5
2022-06-16 02:29:14.212895: I tensorflow/stream_executor/stream.cc:261] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::Stream(parent=0x437abe0)
2022-06-16 02:29:14.212906: I tensorflow/stream_executor/stream.cc:308] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::Init()
2022-06-16 02:29:14.212971: I tensorflow/stream_executor/stream.cc:261] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::Stream(parent=0x437abe0)
2022-06-16 02:29:14.212979: I tensorflow/stream_executor/stream.cc:308] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::Init()
2022-06-16 02:29:14.212989: I tensorflow/stream_executor/stream.cc:261] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::Stream(parent=0x437abe0)
2022-06-16 02:29:14.212995: I tensorflow/stream_executor/stream.cc:308] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::Init()
2022-06-16 02:29:14.213003: I tensorflow/stream_executor/stream.cc:261] [stream=0x21bc92f0,impl=0x21d0d570] Called Stream::Stream(parent=0x437abe0)
2022-06-16 02:29:14.213009: I tensorflow/stream_executor/stream.cc:308] [stream=0x21bc92f0,impl=0x21d0d570] Called Stream::Init()
2022-06-16 02:29:14.213022: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: gpu_host_bfc
2022-06-16 02:29:14.213032: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-16 02:29:14.213038: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-16 02:29:14.213042: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-16 02:29:14.213046: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-16 02:29:14.213051: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-16 02:29:14.213056: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-16 02:29:14.213061: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-16 02:29:14.213065: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-16 02:29:14.213070: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-16 02:29:14.213075: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-16 02:29:14.213079: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-16 02:29:14.213084: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-16 02:29:14.213088: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-16 02:29:14.213093: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-16 02:29:14.213098: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-16 02:29:14.213102: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-16 02:29:14.213107: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-16 02:29:14.213111: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-16 02:29:14.213116: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-16 02:29:14.213121: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-16 02:29:14.213126: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-16 02:29:14.213699: I tensorflow/core/common_runtime/bfc_allocator.cc:70] Creating new BFCAllocator named: GPU_1_bfc
2022-06-16 02:29:14.213713: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256B
2022-06-16 02:29:14.213717: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512B
2022-06-16 02:29:14.213723: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.0KiB
2022-06-16 02:29:14.213727: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.0KiB
2022-06-16 02:29:14.213732: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.0KiB
2022-06-16 02:29:14.213737: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.0KiB
2022-06-16 02:29:14.213741: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.0KiB
2022-06-16 02:29:14.213746: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.0KiB
2022-06-16 02:29:14.213751: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.0KiB
2022-06-16 02:29:14.213756: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.0KiB
2022-06-16 02:29:14.213760: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.0KiB
2022-06-16 02:29:14.213765: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 512.0KiB
2022-06-16 02:29:14.213770: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 1.00MiB
2022-06-16 02:29:14.213775: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 2.00MiB
2022-06-16 02:29:14.213779: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 4.00MiB
2022-06-16 02:29:14.213784: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 8.00MiB
2022-06-16 02:29:14.213789: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 16.00MiB
2022-06-16 02:29:14.213793: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 32.00MiB
2022-06-16 02:29:14.213798: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 64.00MiB
2022-06-16 02:29:14.213803: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 128.00MiB
2022-06-16 02:29:14.213808: I tensorflow/core/common_runtime/bfc_allocator.cc:73] Creating bin of max chunk size 256.00MiB
2022-06-16 02:29:14.213827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9657 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5
2022-06-16 02:29:14.213838: I tensorflow/stream_executor/stream.cc:261] [stream=0x21bd4450,impl=0x21bd3ac0] Called Stream::Stream(parent=0x55dace0)
2022-06-16 02:29:14.213844: I tensorflow/stream_executor/stream.cc:308] [stream=0x21bd4450,impl=0x21bd3ac0] Called Stream::Init()
2022-06-16 02:29:14.213867: I tensorflow/stream_executor/stream.cc:261] [stream=0x21b085d0,impl=0x21bd3ce0] Called Stream::Stream(parent=0x55dace0)
2022-06-16 02:29:14.213874: I tensorflow/stream_executor/stream.cc:308] [stream=0x21b085d0,impl=0x21bd3ce0] Called Stream::Init()
2022-06-16 02:29:14.213883: I tensorflow/stream_executor/stream.cc:261] [stream=0x21b088c0,impl=0x21d0d290] Called Stream::Stream(parent=0x55dace0)
2022-06-16 02:29:14.213889: I tensorflow/stream_executor/stream.cc:308] [stream=0x21b088c0,impl=0x21d0d290] Called Stream::Init()
2022-06-16 02:29:14.213898: I tensorflow/stream_executor/stream.cc:261] [stream=0x21b08bb0,impl=0x21d0d260] Called Stream::Stream(parent=0x55dace0)
2022-06-16 02:29:14.213904: I tensorflow/stream_executor/stream.cc:308] [stream=0x21b08bb0,impl=0x21d0d260] Called Stream::Init()
2022-06-16 02:29:14.214228: I tensorflow/compiler/jit/xla_gpu_device.cc:79] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-16 02:29:14.214287: I tensorflow/core/common_runtime/process_util.cc:159] Session inter op parallelism threads: 32
2022-06-16 02:29:14.219599: I tensorflow/compiler/jit/xla_cpu_device.cc:58] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-16 02:29:14.220169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:14.220612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1836] Found device 1 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-06-16 02:29:14.222014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1975] Adding visible gpu devices: 0, 1
2022-06-16 02:29:14.222057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(1): -5,0
2022-06-16 02:29:14.222078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1333] Cuda stream priority range on GPU(1): -5,0
2022-06-16 02:29:14.222097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1384] TensorFlow compiled with CUDA 11.2 and cuDNN 8.1.0
2022-06-16 02:29:14.222139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1396] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-16 02:29:14.222157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402]      0 1
2022-06-16 02:29:14.222171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 0:   N N
2022-06-16 02:29:14.222187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1415] 1:   N N
2022-06-16 02:29:14.222673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 0 TfDeviceId 0 on bus 1 numa: 0 pci: 0000:18:00.0 DeviceLocality: bus_id: 1
links {
}

2022-06-16 02:29:14.223063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1677] GPUDevice PlatformDeviceId 1 TfDeviceId 1 on bus 2 numa: 1 pci: 0000:86:00.0 DeviceLocality: bus_id: 2
numa_node: 1
links {
}

2022-06-16 02:29:14.223470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9657 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5
2022-06-16 02:29:14.223876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1550] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9657 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5
2022-06-16 02:29:14.223915: I tensorflow/compiler/jit/xla_gpu_device.cc:79] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-16 02:29:14.228686: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 0
2022-06-16 02:29:14.228809: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-16 02:29:14.228824: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MlirV1CompatGraphOptimizationPass
2022-06-16 02:29:14.228836: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-06-16 02:29:14.228854: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-16 02:29:14.228873: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ControlFlowDepsToChainsPass
2022-06-16 02:29:14.228884: I tensorflow/core/common_runtime/control_flow_deps_to_chains.cc:37] ControlFlowDepsToChainsPass::Run
2022-06-16 02:29:14.228945: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229000: W tensorflow/core/util/dump_graph.cc:134] Failed to dump control_flow_deps_to_chains_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229024: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-16 02:29:14.229042: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: AccumulateNV2RemovePass
2022-06-16 02:29:14.229064: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: LowerFunctionalOpsPass
2022-06-16 02:29:14.229093: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ParallelConcatRemovePass
2022-06-16 02:29:14.229117: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 35
2022-06-16 02:29:14.229126: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IsolatePlacerInspectionRequiredOpsPass
2022-06-16 02:29:14.229136: I tensorflow/core/common_runtime/isolate_placer_inspection_required_ops_pass.cc:34] IsolatePlacerInspectionRequiredOpsPass::Run
2022-06-16 02:29:14.229151: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IntroduceFloatingPointJitterPass
2022-06-16 02:29:14.229169: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 36
2022-06-16 02:29:14.229184: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateXlaComputationsPass
2022-06-16 02:29:14.229216: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229238: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:353] EncapsulateXlaComputations(): (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-16 02:29:14.229328: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_halfway because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229350: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:364] EncapsulateXlaComputations() half-way: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-16 02:29:14.229387: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_xla_computations_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229407: I tensorflow/compiler/jit/encapsulate_xla_computations_pass.cc:370] EncapsulateXlaComputations() finished: (failed to create writable file: INVALID_ARGUMENT: TF_DUMP_GRAPH_PREFIX not specified)
2022-06-16 02:29:14.229429: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 37
2022-06-16 02:29:14.229440: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: FunctionalizeControlFlowForXlaPass
2022-06-16 02:29:14.229519: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 99999
2022-06-16 02:29:14.229537: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: WeakForwardTypeInferencePass
2022-06-16 02:29:14.229549: I tensorflow/core/common_runtime/forward_type_inference.cc:130] ForwardTypeInferencePass::Run
2022-06-16 02:29:14.229573: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229628: I tensorflow/core/common_runtime/forward_type_inference.cc:311] Finished after 1 iterations; done 9 of 9 nodes in 9 visits
2022-06-16 02:29:14.229654: W tensorflow/core/util/dump_graph.cc:134] Failed to dump forward_type_inference_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.229678: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 0
2022-06-16 02:29:14.245353: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.245392: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 15673.1us
2022-06-16 02:29:14.245404: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::random_uniform/shape takes 2.157us
2022-06-16 02:29:14.245427: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 6.933us
2022-06-16 02:29:14.245439: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:CPU::random_uniform/RandomUniform takes 1.49us
2022-06-16 02:29:14.245451: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform_1/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.245458: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform_1/shape takes 10.165us
2022-06-16 02:29:14.245466: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:CPU::random_uniform_1/shape takes 0.407us
2022-06-16 02:29:14.245475: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 1.045us
2022-06-16 02:29:14.245482: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:CPU::random_uniform_1/RandomUniform takes 0.749us
2022-06-16 02:29:14.245491: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Placeholder}}'Will fall back to a default kernel.

2022-06-16 02:29:14.245498: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Placeholder:GPU::Placeholder takes 7.154us
2022-06-16 02:29:14.245505: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Placeholder:CPU::Placeholder takes 1.031us
2022-06-16 02:29:14.245522: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 8.5us
2022-06-16 02:29:14.245532: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 2.909us
2022-06-16 02:29:14.245543: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.099us
2022-06-16 02:29:14.245557: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 2.481us
2022-06-16 02:29:14.245600: I tensorflow/core/common_runtime/placer.cc:124] random_uniform/RandomUniform(RandomUniform) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245613: I tensorflow/core/common_runtime/placer.cc:124] random_uniform_1/RandomUniform(RandomUniform) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245621: I tensorflow/core/common_runtime/placer.cc:124] MatMul(BatchMatMulV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245629: I tensorflow/core/common_runtime/placer.cc:124] Add(AddV2) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245638: I tensorflow/core/common_runtime/placer.cc:124] random_uniform/shape(Const) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245646: I tensorflow/core/common_runtime/placer.cc:124] random_uniform_1/shape(Const) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245653: I tensorflow/core/common_runtime/placer.cc:124] Placeholder(Placeholder) placed on: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.245664: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 1
2022-06-16 02:29:14.245672: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 0
2022-06-16 02:29:14.245678: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: NcclReplacePass
2022-06-16 02:29:14.245699: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 1
2022-06-16 02:29:14.245905: I tensorflow/core/common_runtime/bfc_allocator.cc:157] Extending allocation by 2.00MiB bytes for mklcpu.
2022-06-16 02:29:14.245918: I tensorflow/core/common_runtime/bfc_allocator.cc:162] Total allocated bytes: 2.00MiB
2022-06-16 02:29:14.245925: I tensorflow/core/common_runtime/bfc_allocator.cc:165] Allocated memory at 0x211a3840 to 0x213a3840
2022-06-16 02:29:14.246040: I tensorflow/core/common_runtime/graph_execution_state.cc:854] BuildGraph
2022-06-16 02:29:14.264466: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2500000000 Hz
2022-06-16 02:29:14.264905: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1013] Starting optimization for grappler item: tf_graph
2022-06-16 02:29:14.264932: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1034] Deleted 0 unreachable functions from the graph (library size = 0)
2022-06-16 02:29:14.265262: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.265277: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.265401: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] model_pruner: Graph size after: 7 nodes (0), 6 edges (0), time = 0.131ms.
2022-06-16 02:29:14.266088: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] tfg_optimizer{tfg-consolidate-attrs,tfg-prepare-attrs-export}: Graph size after: 7 nodes (0), 6 edges (0), time = 0.638ms.
2022-06-16 02:29:14.266184: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] implementation_selector: Graph size after: 7 nodes (0), 6 edges (0), time = 0.057ms.
2022-06-16 02:29:14.266217: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.266227: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.266352: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] common_subgraph_elimination: Graph size after: 6 nodes (-1), 6 edges (0), time = 0.13ms.
2022-06-16 02:29:14.266373: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.266384: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.266448: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.266558: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.266575: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.266733: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] constant_folding: Graph size after: 6 nodes (0), 6 edges (0), time = 0.357ms.
2022-06-16 02:29:14.266765: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.266772: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.266812: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.266873: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.266888: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.266966: I tensorflow/core/grappler/optimizers/arithmetic_optimizer.cc:4372] Run 31 arithmetic optimizer stages: AddOpsRewrite, FoldConjugateIntoTranspose, FoldMultiplyIntoConv, FoldTransposeIntoMatMul, MinimizeBroadcasts, RemoveIdentityTranspose, RemoveInvolution, RemoveRedundantBitcast, RemoveRedundantCast, ReplacePackWithTileReshape, ReplaceMulWithBroadcastByTile, ReduceUpsamplingDims, RemoveRedundantReshapeOrBroadcastTo, RemoveNegation, ReplaceMulWithSquare, RemoveLogicalNot, ReorderCastLikeAndValuePreserving, SimplifyAggregation, , SqrtDivToRsqrtMul, RemoveIdempotent, ConvertPow, ConvertLog1p, LogSoftmaxStage, OptimizeMaxOrMinOfMonotonicStage, ConvertExpm1, UnaryOpsComposition, RemoveStackStridedSliceSameAxis, SimplifyEmbeddingLookupStage, RemoveCastIntoSegmentReductionStage, FuseSquaredDiffStage
2022-06-16 02:29:14.267051: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] arithmetic_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 0.28ms.
2022-06-16 02:29:14.267108: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.267165: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.267181: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.267236: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.267247: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.267343: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.267358: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 22.455us
2022-06-16 02:29:14.267379: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Placeholder}}'Will fall back to a default kernel.

2022-06-16 02:29:14.267390: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Placeholder:GPU::Placeholder takes 10.951us
2022-06-16 02:29:14.267406: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 2.338us
2022-06-16 02:29:14.267421: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 1.171us
2022-06-16 02:29:14.267439: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 2.827us
2022-06-16 02:29:14.267460: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.702us
2022-06-16 02:29:14.267497: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] layout: Graph size after: 6 nodes (0), 6 edges (0), time = 0.421ms.
2022-06-16 02:29:14.267526: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.267535: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.267751: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] remapper: Graph size after: 6 nodes (0), 6 edges (0), time = 0.229ms.
2022-06-16 02:29:14.267781: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.267790: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.267808: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.267814: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.267845: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] loop_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 0.073ms.
2022-06-16 02:29:14.267868: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.267876: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.267921: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:626] Removed 0 out of 0 control dependencies
2022-06-16 02:29:14.267940: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:494] Deleted 0 out of 6 nodes.
2022-06-16 02:29:14.267960: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=0
2022-06-16 02:29:14.267972: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=1
2022-06-16 02:29:14.267998: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:626] Removed 0 out of 0 control dependencies
2022-06-16 02:29:14.268011: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:494] Deleted 0 out of 6 nodes.
2022-06-16 02:29:14.268025: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=0
2022-06-16 02:29:14.268034: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=1
2022-06-16 02:29:14.268050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] dependency_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 0.184ms.
2022-06-16 02:29:14.268233: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.268346: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.268372: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.268519: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1923] Op:Placeholder Minimum cost for Identity
2022-06-16 02:29:14.268540: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1634] Output Size: 65536 Total Output Size:65536
2022-06-16 02:29:14.268553: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation Placeholder takes 1 ns.
2022-06-16 02:29:14.268610: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1942] Op:Const Minimum cost for Variable
2022-06-16 02:29:14.268623: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1634] Output Size: 8 Total Output Size:8
2022-06-16 02:29:14.268630: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation Const takes 1 ns.
2022-06-16 02:29:14.268666: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:714] Missing accurate estimator for op: RandomUniform
2022-06-16 02:29:14.268691: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:778] Device: GPU gflops: 13447.7 gb_per_sec: 616
2022-06-16 02:29:14.268704: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:853] Op:RandomUniform GOps:0 Compute Time (ns):0
2022-06-16 02:29:14.268714: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:858] Op:RandomUniform Size (KB):524.296 Memory Time (ns):852
2022-06-16 02:29:14.268723: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:878] Op:RandomUniform Size (KB):524.296 Intermediate Memory Time (ns):0
2022-06-16 02:29:14.268731: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation RandomUniform takes 852 ns.
2022-06-16 02:29:14.268770: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:714] Missing accurate estimator for op: RandomUniform
2022-06-16 02:29:14.268786: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:778] Device: GPU gflops: 13447.7 gb_per_sec: 616
2022-06-16 02:29:14.268795: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:853] Op:RandomUniform GOps:0 Compute Time (ns):0
2022-06-16 02:29:14.268803: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:858] Op:RandomUniform Size (KB):524.296 Memory Time (ns):852
2022-06-16 02:29:14.268812: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:878] Op:RandomUniform Size (KB):524.296 Intermediate Memory Time (ns):0
2022-06-16 02:29:14.268820: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation RandomUniform takes 852 ns.
2022-06-16 02:29:14.268863: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1072] Key:transpose_a Value:false
2022-06-16 02:29:14.268875: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1072] Key:transpose_b Value:false
2022-06-16 02:29:14.268883: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1079] transpose_a:0
2022-06-16 02:29:14.268890: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1080] transpose_b:0
2022-06-16 02:29:14.268902: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1100] M, N, K: 1024,128,128
2022-06-16 02:29:14.268911: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1112] Operations for Matmul: 3.35544e+07
2022-06-16 02:29:14.268926: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:778] Device: GPU gflops: 13447.7 gb_per_sec: 616
2022-06-16 02:29:14.268940: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:853] Op:BatchMatMulV2 GOps:0.0335544 Compute Time (ns):2496
2022-06-16 02:29:14.268949: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:858] Op:BatchMatMulV2 Size (KB):1114.11 Memory Time (ns):1809
2022-06-16 02:29:14.268957: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:878] Op:BatchMatMulV2 Size (KB):1114.11 Intermediate Memory Time (ns):0
2022-06-16 02:29:14.268965: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation BatchMatMulV2 takes 4305 ns.
2022-06-16 02:29:14.269004: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1606] Input Count: 131072 Largest Input Count:131072
2022-06-16 02:29:14.269015: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:1606] Input Count: 131072 Largest Input Count:131072
2022-06-16 02:29:14.269029: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:778] Device: GPU gflops: 13447.7 gb_per_sec: 616
2022-06-16 02:29:14.269038: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:853] Op:AddV2 GOps:0.000131072 Compute Time (ns):10
2022-06-16 02:29:14.269047: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:858] Op:AddV2 Size (KB):1572.86 Memory Time (ns):2554
2022-06-16 02:29:14.269056: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:878] Op:AddV2 Size (KB):1572.86 Intermediate Memory Time (ns):0
2022-06-16 02:29:14.269063: I tensorflow/core/grappler/costs/op_level_cost_estimator.cc:677] Operation AddV2 takes 2564 ns.
2022-06-16 02:29:14.269091: I tensorflow/core/grappler/costs/analytical_cost_estimator.cc:221] 5 out of 6 nodes have inaccurate time estimation
2022-06-16 02:29:14.269178: I tensorflow/core/grappler/costs/analytical_cost_estimator.cc:239]
Aggregated per device / channel type tensor size histogram:
Device: /localhost/GPU
Count: 6, Average: 352.0KiB, Min: 8B, Max: 512.0KiB
------------------------------------------------------
[           8B,          16B)       1  16.667%  16.667% #######
[      64.0KiB,     128.0KiB)       1  16.667%  33.333% #######
[     512.0KiB,      1.00MiB)       4  66.667% 100.000% ###########################


2022-06-16 02:29:14.269232: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 0 allocated 524288 for tensor random_uniform_1/RandomUniform:0
2022-06-16 02:29:14.269244: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 0 allocated 524288 for tensor random_uniform/RandomUniform:0
2022-06-16 02:29:14.269251: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 0 allocated 8 for tensor random_uniform/shape:0
2022-06-16 02:29:14.269259: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 0 allocated 65536 for tensor Placeholder:0
2022-06-16 02:29:14.269268: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 1000 allocated 524288 for tensor MatMul:0
2022-06-16 02:29:14.269276: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 1001 deallocated 8 for tensor random_uniform/shape:0
2022-06-16 02:29:14.269283: I tensorflow/core/grappler/costs/graph_memory.cc:263] At time 6000 allocated 524288 for tensor Add:0
2022-06-16 02:29:14.269291: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 6001 deallocated 524288 for tensor random_uniform/RandomUniform:0
2022-06-16 02:29:14.269298: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 6001 deallocated 65536 for tensor Placeholder:0
2022-06-16 02:29:14.269306: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 8001 deallocated 524288 for tensor Add:0
2022-06-16 02:29:14.269313: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 8001 deallocated 524288 for tensor MatMul:0
2022-06-16 02:29:14.269320: I tensorflow/core/grappler/costs/graph_memory.cc:269] At time 8001 deallocated 524288 for tensor random_uniform_1/RandomUniform:0
2022-06-16 02:29:14.269415: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] memory_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 1.337ms.
2022-06-16 02:29:14.269440: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.269451: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.269510: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] model_pruner: Graph size after: 6 nodes (0), 6 edges (0), time = 0.066ms.
2022-06-16 02:29:14.269775: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] tfg_optimizer{tfg-consolidate-attrs,tfg-prepare-attrs-export}: Graph size after: 6 nodes (0), 6 edges (0), time = 0.233ms.
2022-06-16 02:29:14.269836: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] implementation_selector: Graph size after: 6 nodes (0), 6 edges (0), time = 0.027ms.
2022-06-16 02:29:14.269869: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.269880: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.269955: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] common_subgraph_elimination: Graph size after: 6 nodes (0), 6 edges (0), time = 0.081ms.
2022-06-16 02:29:14.269979: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.269991: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.270038: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.270126: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.270148: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.270275: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] constant_folding: Graph size after: 6 nodes (0), 6 edges (0), time = 0.292ms.
2022-06-16 02:29:14.270306: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.270317: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.270375: I tensorflow/core/grappler/costs/graph_properties.cc:2377] Propagating 2 new shapes through 0 loops and 0 resources

2022-06-16 02:29:14.270450: I tensorflow/core/grappler/costs/graph_properties.cc:2145] Checking any conflics in shapes and dimensions ...
2022-06-16 02:29:14.270470: I tensorflow/core/grappler/costs/graph_properties.cc:2180] **** No incompatible shape found from SymbolicShapeManager.
2022-06-16 02:29:14.270568: I tensorflow/core/grappler/optimizers/arithmetic_optimizer.cc:4372] Run 31 arithmetic optimizer stages: AddOpsRewrite, FoldConjugateIntoTranspose, FoldMultiplyIntoConv, FoldTransposeIntoMatMul, MinimizeBroadcasts, RemoveIdentityTranspose, RemoveInvolution, RemoveRedundantBitcast, RemoveRedundantCast, ReplacePackWithTileReshape, ReplaceMulWithBroadcastByTile, ReduceUpsamplingDims, RemoveRedundantReshapeOrBroadcastTo, RemoveNegation, ReplaceMulWithSquare, RemoveLogicalNot, ReorderCastLikeAndValuePreserving, SimplifyAggregation, , SqrtDivToRsqrtMul, RemoveIdempotent, ConvertPow, ConvertLog1p, LogSoftmaxStage, OptimizeMaxOrMinOfMonotonicStage, ConvertExpm1, UnaryOpsComposition, RemoveStackStridedSliceSameAxis, SimplifyEmbeddingLookupStage, RemoveCastIntoSegmentReductionStage, FuseSquaredDiffStage
2022-06-16 02:29:14.270636: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] arithmetic_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 0.325ms.
2022-06-16 02:29:14.270670: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.270681: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.270948: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] remapper: Graph size after: 6 nodes (0), 6 edges (0), time = 0.284ms.
2022-06-16 02:29:14.270982: I tensorflow/core/grappler/grappler_item.cc:109] Add fetch Add:0
2022-06-16 02:29:14.270993: I tensorflow/core/grappler/grappler_item.cc:113] Add feed Placeholder
2022-06-16 02:29:14.271026: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:626] Removed 0 out of 0 control dependencies
2022-06-16 02:29:14.271043: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:494] Deleted 0 out of 6 nodes.
2022-06-16 02:29:14.271058: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=0
2022-06-16 02:29:14.271069: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=1
2022-06-16 02:29:14.271094: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:626] Removed 0 out of 0 control dependencies
2022-06-16 02:29:14.271107: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:494] Deleted 0 out of 6 nodes.
2022-06-16 02:29:14.271120: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=0
2022-06-16 02:29:14.271129: I tensorflow/core/grappler/optimizers/dependency_optimizer.cc:648] DependencyOptimizer::GroupCrossDeviceControlEdges host_granularity=1
2022-06-16 02:29:14.271144: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] dependency_optimizer: Graph size after: 6 nodes (0), 6 edges (0), time = 0.17ms.
2022-06-16 02:29:14.271252: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1046] Optimized main graph.
2022-06-16 02:29:14.271824: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] tfg_optimizer{tfg-consolidate-attrs,tfg-functional-to-region,tfg.func(tfg-cf-sink),tfg-region-to-functional{force-control-capture=true},tfg-prepare-attrs-export}: Graph size after: 6 nodes (0), 6 edges (0), time = 0.417ms.
2022-06-16 02:29:14.272067: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] tfg_optimizer{tfg-consolidate-attrs,tfg-functional-to-region,tfg.func(tfg-cf-sink),tfg-region-to-functional{force-control-capture=true},tfg-prepare-attrs-export}: Graph size after: 6 nodes (0), 6 edges (0), time = 0.193ms.
2022-06-16 02:29:14.272175: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1267] Optimized 0 functions:
2022-06-16 02:29:14.272192: W tensorflow/core/util/dump_graph.cc:134] Failed to dump after_MetaOptimizer_140722650481008 because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.272497: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 2
2022-06-16 02:29:14.272517: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 5
2022-06-16 02:29:14.272525: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: CloneConstantsForBetterClusteringPass
2022-06-16 02:29:14.272538: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 9
2022-06-16 02:29:14.272552: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ClusterScopingPass
2022-06-16 02:29:14.272561: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 10
2022-06-16 02:29:14.272568: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MarkForCompilationPass
2022-06-16 02:29:14.280846: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: XlaLaunch:CPU::_XlaLaunch-op takes 2.005us
2022-06-16 02:29:14.280871: I tensorflow/compiler/tf2xla/xla_op_registry.cc:51] LaunchOpHasKernelForDevice kernel_class_name: XlaLocalLaunchOp
2022-06-16 02:29:14.280883: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: XlaLaunch:GPU::_XlaLaunch-op takes 0.986us
2022-06-16 02:29:14.280889: I tensorflow/compiler/tf2xla/xla_op_registry.cc:51] LaunchOpHasKernelForDevice kernel_class_name: XlaLocalLaunchOp
2022-06-16 02:29:14.280914: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:XLA_GPU_JIT::random_uniform/shape takes 1.888us
2022-06-16 02:29:14.280951: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:XLA_GPU_JIT::random_uniform/RandomUniform takes 2.493us
2022-06-16 02:29:14.280966: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:XLA_GPU_JIT::random_uniform_1/RandomUniform takes 0.527us
2022-06-16 02:29:14.280979: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:XLA_GPU_JIT::MatMul takes 0.676us
2022-06-16 02:29:14.280992: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:XLA_GPU_JIT::Add takes 1.28us
2022-06-16 02:29:14.281079: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:650] DeadnessAnalysis time: 18 us (cumulative: 18 us, max: 18 us, #called: 1)
2022-06-16 02:29:14.281151: I tensorflow/compiler/jit/mark_for_compilation_pass.cc:1523] MarkForCompilationPassImpl::Run time: 579 us (cumulative: 579 us, max: 579 us, #called: 1)
2022-06-16 02:29:14.281168: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 12
2022-06-16 02:29:14.281174: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ForceXlaConstantsOnHostPass
2022-06-16 02:29:14.281192: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 20
2022-06-16 02:29:14.281199: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: IncreaseDynamismForAutoJitPass
2022-06-16 02:29:14.281207: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 30
2022-06-16 02:29:14.281212: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: PartiallyDeclusterPass
2022-06-16 02:29:14.281244: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 40
2022-06-16 02:29:14.281251: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: ReportClusteringInfoPass
2022-06-16 02:29:14.281375: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 50
2022-06-16 02:29:14.281384: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: EncapsulateSubgraphsPass
2022-06-16 02:29:14.281390: I tensorflow/compiler/jit/encapsulate_subgraphs_pass.cc:1139] EncapsulateSubgraphsPass::Run
2022-06-16 02:29:14.281420: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_before because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.281541: W tensorflow/core/util/dump_graph.cc:134] Failed to dump encapsulate_subgraphs_after because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.281576: I tensorflow/compiler/jit/xla_cluster_util.cc:590] GetNodesRelatedToRefVariables() found 0 nodes
2022-06-16 02:29:14.281607: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 60
2022-06-16 02:29:14.281614: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: BuildXlaOpsPass
2022-06-16 02:29:14.281628: I tensorflow/compiler/jit/build_xla_ops_pass.cc:603] print_outputs = 0
2022-06-16 02:29:14.281633: I tensorflow/compiler/jit/build_xla_ops_pass.cc:604] check_input_numerics = 0
2022-06-16 02:29:14.281637: I tensorflow/compiler/jit/build_xla_ops_pass.cc:605] check_output_numerics = 0
2022-06-16 02:29:14.281656: W tensorflow/core/util/dump_graph.cc:134] Failed to dump build_xla_ops because dump location is not  specified through either TF_DUMP_GRAPH_PREFIX environment variable or function argument.
2022-06-16 02:29:14.281673: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 2
2022-06-16 02:29:14.281731: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.281742: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 17.308us
2022-06-16 02:29:14.281758: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 1.953us
2022-06-16 02:29:14.281768: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 0.808us
2022-06-16 02:29:14.281777: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 1.621us
2022-06-16 02:29:14.281788: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 2.219us
2022-06-16 02:29:14.281796: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::_arg_Placeholder_0_0 takes 0.58us
2022-06-16 02:29:14.281805: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::_retval_Add_0_0 takes 0.602us
2022-06-16 02:29:14.281886: I tensorflow/core/graph/graph_partition.cc:281] Receiving data from _arg_Placeholder_0_0 (_Arg) on /job:localhost/replica:0/task:0/device:CPU:0 in device memory for MatMul (BatchMatMulV2) on /job:localhost/replica:0/task:0/device:GPU:0 in device memory
2022-06-16 02:29:14.281925: I tensorflow/core/graph/graph_partition.cc:281] Receiving data from Add (AddV2) on /job:localhost/replica:0/task:0/device:GPU:0 in device memory for _retval_Add_0_0 (_Retval) on /job:localhost/replica:0/task:0/device:CPU:0 in device memory
2022-06-16 02:29:14.281948: I tensorflow/core/graph/graph_partition.cc:1251] Added send/recv: controls=0, data=2
2022-06-16 02:29:14.282035: I tensorflow/core/common_runtime/optimization_registry.cc:54] Starting optimization of a group 3
2022-06-16 02:29:14.282050: I tensorflow/core/common_runtime/optimization_registry.cc:66] Running optimization phase 1
2022-06-16 02:29:14.282060: I tensorflow/core/common_runtime/optimization_registry.cc:68] Running optimization pass: MklLayoutRewritePass
2022-06-16 02:29:14.282070: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Const, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282076: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282080: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282085: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Recv, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282089: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282094: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282099: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Send, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282104: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Const, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282109: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282113: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282118: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Recv, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282122: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282127: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282131: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Send, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282137: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node Const, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282141: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282146: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node RandomUniform, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282150: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Recv, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282155: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node BatchMatMulV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282159: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node AddV2, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.282163: I tensorflow/core/common_runtime/mkl_layout_pass.cc:1040] MklLayoutRewritePass: Skipping rewriting of the node _Send, reason: User has assigned a device that is not CPU.
2022-06-16 02:29:14.286237: I tensorflow/core/common_runtime/optimization_registry.cc:87] Finished optimization of a group 3
2022-06-16 02:29:14.286346: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:CPU::MatMul takes 5.777us
2022-06-16 02:29:14.286361: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:CPU::Add takes 2.502us
2022-06-16 02:29:14.286369: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-16 02:29:14.286438: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286449: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 11.318us
2022-06-16 02:29:14.286458: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286465: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 6.905us
2022-06-16 02:29:14.286476: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286482: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 10.405us
2022-06-16 02:29:14.286493: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 1.416us
2022-06-16 02:29:14.286503: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 0.86us
2022-06-16 02:29:14.286511: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _arg_Placeholder_0_0/_1}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286517: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:GPU::_arg_Placeholder_0_0/_1 takes 6.053us
2022-06-16 02:29:14.286525: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 1.442us
2022-06-16 02:29:14.286537: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 3.399us
2022-06-16 02:29:14.286545: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Add/_2}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286550: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:GPU::Add/_2 takes 5.581us
2022-06-16 02:29:14.286558: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-16 02:29:14.286564: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 4:0: 1 -> 1
2022-06-16 02:29:14.286569: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:0: 0 -> 0
2022-06-16 02:29:14.286574: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:1: 0 -> 0
2022-06-16 02:29:14.286578: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-16 02:29:14.286583: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 7:1: 0 -> 0
2022-06-16 02:29:14.286589: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-16 02:29:14.286596: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286601: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.44us
2022-06-16 02:29:14.286607: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286613: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SINK takes 5.229us
2022-06-16 02:29:14.286621: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286627: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 7.897us
2022-06-16 02:29:14.286635: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 0.774us
2022-06-16 02:29:14.286643: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 0.826us
2022-06-16 02:29:14.286650: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _arg_Placeholder_0_0/_1}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286656: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:GPU::_arg_Placeholder_0_0/_1 takes 5.725us
2022-06-16 02:29:14.286664: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.789us
2022-06-16 02:29:14.286672: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.056us
2022-06-16 02:29:14.286679: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Add/_2}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286684: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:GPU::Add/_2 takes 5.257us
2022-06-16 02:29:14.286691: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 3:0: 1 -> 1
2022-06-16 02:29:14.286696: I tensorflow/core/common_runtime/memory_types.cc:87] 2:0 -> 4:0: 1 -> 1
2022-06-16 02:29:14.286701: I tensorflow/core/common_runtime/memory_types.cc:87] 3:0 -> 6:0: 0 -> 0
2022-06-16 02:29:14.286706: I tensorflow/core/common_runtime/memory_types.cc:87] 5:0 -> 6:1: 0 -> 0
2022-06-16 02:29:14.286711: I tensorflow/core/common_runtime/memory_types.cc:87] 6:0 -> 7:0: 0 -> 0
2022-06-16 02:29:14.286716: I tensorflow/core/common_runtime/memory_types.cc:87] 4:0 -> 7:1: 0 -> 0
2022-06-16 02:29:14.286720: I tensorflow/core/common_runtime/memory_types.cc:87] 7:0 -> 8:0: 0 -> 0
2022-06-16 02:29:14.286825: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-16 02:29:14.286837: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286843: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 6.343us
2022-06-16 02:29:14.286849: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286854: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:GPU::_SOURCE takes 5.019us
2022-06-16 02:29:14.286870: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 46.275us

2022-06-16 02:29:14.286905: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()
2022-06-16 02:29:14.286919: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286925: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 8.587us
2022-06-16 02:29:14.286934: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node random_uniform/shape}}'Will fall back to a default kernel.

2022-06-16 02:29:14.286939: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: Const:GPU::random_uniform/shape takes 7.849us
2022-06-16 02:29:14.288588: I tensorflow/stream_executor/stream_executor_pimpl.cc:581] Called StreamExecutor::HostMemoryAllocate(size=2097152) returns 0x7ff8b1400000
2022-06-16 02:29:14.288613: I tensorflow/core/common_runtime/bfc_allocator.cc:157] Extending allocation by 2.00MiB bytes for gpu_host_bfc.
2022-06-16 02:29:14.288619: I tensorflow/core/common_runtime/bfc_allocator.cc:162] Total allocated bytes: 2.00MiB
2022-06-16 02:29:14.288624: I tensorflow/core/common_runtime/bfc_allocator.cc:165] Allocated memory at 0x7ff8b1400000 to 0x7ff8b1600000
2022-06-16 02:29:14.288675: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() takes 1773.54us

2022-06-16 02:29:14.288700: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape)
2022-06-16 02:29:14.288717: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 2.157us
2022-06-16 02:29:14.288724: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform/RandomUniform takes 0.717us
2022-06-16 02:29:14.288758: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) takes 59.679us

2022-06-16 02:29:14.288773: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape)
2022-06-16 02:29:14.288782: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 0.863us
2022-06-16 02:29:14.288788: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: RandomUniform:GPU::random_uniform_1/RandomUniform takes 0.569us
2022-06-16 02:29:14.288797: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) takes 24.786us

2022-06-16 02:29:14.288814: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()
2022-06-16 02:29:14.288830: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _arg_Placeholder_0_0/_1}}'Will fall back to a default kernel.

2022-06-16 02:29:14.288835: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:GPU::_arg_Placeholder_0_0/_1 takes 7.041us
2022-06-16 02:29:14.288841: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _arg_Placeholder_0_0/_1}}'Will fall back to a default kernel.

2022-06-16 02:29:14.288846: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:GPU::_arg_Placeholder_0_0/_1 takes 5.254us
2022-06-16 02:29:14.288879: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() takes 65.945us

2022-06-16 02:29:14.288892: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1)
2022-06-16 02:29:14.288901: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 1.541us
2022-06-16 02:29:14.288908: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: BatchMatMulV2:GPU::MatMul takes 0.633us
2022-06-16 02:29:14.288926: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1) takes 34.362us

2022-06-16 02:29:14.288939: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform)
2022-06-16 02:29:14.288948: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 1.775us
2022-06-16 02:29:14.288955: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: AddV2:GPU::Add takes 0.909us
2022-06-16 02:29:14.288969: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform) takes 29.579us

2022-06-16 02:29:14.288982: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add)
2022-06-16 02:29:14.288996: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Add/_2}}'Will fall back to a default kernel.

2022-06-16 02:29:14.289001: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:GPU::Add/_2 takes 5.9us
2022-06-16 02:29:14.289006: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node Add/_2}}'Will fall back to a default kernel.

2022-06-16 02:29:14.289012: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:GPU::Add/_2 takes 5.055us
2022-06-16 02:29:14.289028: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) takes 47.598us

2022-06-16 02:29:14.289067: I tensorflow/core/common_runtime/constant_folding.cc:613] No constant foldable nodes found
2022-06-16 02:29:14.289151: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
2022-06-16 02:29:14.289161: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.289167: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 6.069us
2022-06-16 02:29:14.289172: I tensorflow/core/framework/op_kernel.cc:1360] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.

2022-06-16 02:29:14.289177: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: NoOp:CPU::_SOURCE takes 4.854us
2022-06-16 02:29:14.289186: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]() takes 34.209us

2022-06-16 02:29:14.289195: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-16 02:29:14.289203: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::_arg_Placeholder_0_0 takes 0.538us
2022-06-16 02:29:14.289209: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Arg:CPU::_arg_Placeholder_0_0 takes 0.302us
2022-06-16 02:29:14.289220: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 24.914us

2022-06-16 02:29:14.289233: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0)
2022-06-16 02:29:14.289242: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_arg_Placeholder_0_0/_0 takes 0.426us
2022-06-16 02:29:14.289248: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Send:CPU::_arg_Placeholder_0_0/_0 takes 0.259us
2022-06-16 02:29:14.289264: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0) takes 32.493us

2022-06-16 02:29:14.289276: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()
2022-06-16 02:29:14.289286: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:CPU::Add/_3 takes 0.541us
2022-06-16 02:29:14.289291: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Recv:CPU::Add/_3 takes 0.274us
2022-06-16 02:29:14.289307: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() takes 30.472us

2022-06-16 02:29:14.289319: I tensorflow/core/framework/op_kernel.cc:1616] Instantiating kernel for node: {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3)
2022-06-16 02:29:14.289328: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::_retval_Add_0_0 takes 0.383us
2022-06-16 02:29:14.289333: I tensorflow/core/framework/op_kernel.cc:1370] Find Kernel Registration for node: _Retval:CPU::_retval_Add_0_0 takes 0.268us
2022-06-16 02:29:14.289343: I tensorflow/core/framework/op_kernel.cc:1665] Instantiating kernel for node: {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3) takes 23.498us

# run 1 compute start
2022-06-16 02:29:14.289511: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 1 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:14.289612: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 1 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.289701: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 1 {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:14.289737: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 1 {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:14.289818: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 1 {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.289852: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 1 {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0) device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:14.289877: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 1 {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.289903: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:14.289945: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:14.289975: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:14.314493: I tensorflow/stream_executor/stream_executor_pimpl.cc:534] Called StreamExecutor::Allocate(size=10126688256, memory_space=0) returns 0x7ff5a4000000
2022-06-16 02:29:14.314531: I tensorflow/core/common_runtime/bfc_allocator.cc:157] Extending allocation by 9.43GiB bytes for GPU_0_bfc.
2022-06-16 02:29:14.314542: I tensorflow/core/common_runtime/bfc_allocator.cc:162] Total allocated bytes: 9.43GiB
2022-06-16 02:29:14.314551: I tensorflow/core/common_runtime/bfc_allocator.cc:165] Allocated memory at 0x7ff5a4000000 to 0x7ff7ff990000
2022-06-16 02:29:14.507828: I tensorflow/stream_executor/stream_executor_pimpl.cc:623] Called StreamExecutor::SynchronousMemZero(location=0x7ff87e7fb060, size=1028)
2022-06-16 02:29:14.508299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:753] GpuDevice::ComputeAsync _arg_Placeholder_0_0/_1 op _Recv on GPU0 stream[0]
2022-06-16 02:29:14.508321: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:14.508327: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:14.508353: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_11__arg_Placeholder_0_0
2022-06-16 02:29:14.508367: I tensorflow/core/common_runtime/gpu/gpu_util.cc:315] CopyCPUTensorToGPU
2022-06-16 02:29:14.508384: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:14.508413: I tensorflow/stream_executor/stream.cc:3887] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenMemcpy(gpu_dst=0x7ff5a4000500, host_src=0x211a3840, size=65536)
2022-06-16 02:29:14.508503: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:14.508573: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 1 {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.508591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:14.508605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:14.508680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:14.508702: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 1 {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.508711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:14.508716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform_1/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:14.508730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:14.508741: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step 1 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:14.508747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:14.508754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[1,128,128])#
2022-06-16 02:29:14.508893: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-16 02:29:15.100994: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-16 02:29:15.101097: I tensorflow/stream_executor/cuda/cuda_blas.cc:1821] doing cuBLAS SGEMM: at=0 bt=0 m=128 n=1024 k=128 alpha=0x7ff87e7fad00 a=0x7ff5a4000500 lda=128 b=0x7ff5a4010500 ldb=128 beta=0x7ff87e7fad10 c=0x7ff5a4110500 ldc=128
2022-06-16 02:29:15.101832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.101902: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step 1 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.101935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.101950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[1,1024,128];float[1024,128])#
2022-06-16 02:29:15.102173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.102204: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step 1 {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.102215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.102223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add/_2:_Send#from=Add,to=_retval_Add_0_0#
2022-06-16 02:29:15.102233: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.102261: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_12_Add
2022-06-16 02:29:15.102272: I tensorflow/core/common_runtime/gpu/gpu_util.cc:270] CopyGPUTensorToCPU
2022-06-16 02:29:15.102285: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.102306: I tensorflow/stream_executor/stream.cc:3879] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenMemcpy(host_dst=0x7ff8b1400100, gpu_src=0x7ff5a4110500, size=524288)
2022-06-16 02:29:15.102339: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.102369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.102387: I tensorflow/stream_executor/stream.cc:4366] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::BlockHostUntilDone()
2022-06-16 02:29:15.102398: I tensorflow/stream_executor/temporary_memory_manager.cc:64] deallocated 0 finalized temporaries
2022-06-16 02:29:15.102596: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 1 {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3) device: /job:localhost/replica:0/task:0/device:CPU:0
# run 1 compute end
# run 2 compute start
2022-06-16 02:29:15.104302: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 2 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.104366: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 2 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.104416: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 2 {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.104426: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 2 {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.104452: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 2 {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0) device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.104463: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.104483: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 2 {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.104528: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 2 {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.104592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:753] GpuDevice::ComputeAsync _arg_Placeholder_0_0/_1 op _Recv on GPU0 stream[0]
2022-06-16 02:29:15.104628: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.104672: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.104709: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.104735: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.104771: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_11__arg_Placeholder_0_0
2022-06-16 02:29:15.104803: I tensorflow/core/common_runtime/gpu/gpu_util.cc:315] CopyCPUTensorToGPU
2022-06-16 02:29:15.104831: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.104885: I tensorflow/stream_executor/stream.cc:3887] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenMemcpy(gpu_dst=0x7ff5a4000500, host_src=0x211a3840, size=65536)
2022-06-16 02:29:15.104984: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.105071: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 2 {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.105106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.105139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.105226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.105274: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 2 {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.105295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.105316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform_1/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.105360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.105398: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step 2 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.105419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.105451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[1,128,128])#
2022-06-16 02:29:15.105508: I tensorflow/stream_executor/cuda/cuda_blas.cc:1821] doing cuBLAS SGEMM: at=0 bt=0 m=128 n=1024 k=128 alpha=0x7ff87effbd00 a=0x7ff5a4000500 lda=128 b=0x7ff5a4010500 ldb=128 beta=0x7ff87effbd10 c=0x7ff5a4110500 ldc=128
2022-06-16 02:29:15.105661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.105709: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step 2 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.105731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.105756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[1,1024,128];float[1024,128])#
2022-06-16 02:29:15.105813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.105878: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step 2 {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.105906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.105933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add/_2:_Send#from=Add,to=_retval_Add_0_0#
2022-06-16 02:29:15.105960: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.105999: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_12_Add
2022-06-16 02:29:15.106021: I tensorflow/core/common_runtime/gpu/gpu_util.cc:270] CopyGPUTensorToCPU
2022-06-16 02:29:15.106047: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.106092: I tensorflow/stream_executor/stream.cc:3879] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenMemcpy(host_dst=0x7ff8b1400100, gpu_src=0x7ff5a4110500, size=524288)
2022-06-16 02:29:15.106140: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.106178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.106218: I tensorflow/stream_executor/stream.cc:4366] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::BlockHostUntilDone()
2022-06-16 02:29:15.106242: I tensorflow/stream_executor/temporary_memory_manager.cc:64] deallocated 0 finalized temporaries
2022-06-16 02:29:15.106303: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 2 {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3) device: /job:localhost/replica:0/task:0/device:CPU:0
# run 2 compute end
# run 3 compute start
2022-06-16 02:29:15.112003: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 3 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.112041: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 3 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.112069: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 3 {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.112093: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 3 {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.112112: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 3 {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.112132: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 3 {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0) device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.112140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:753] GpuDevice::ComputeAsync _arg_Placeholder_0_0/_1 op _Recv on GPU0 stream[0]
2022-06-16 02:29:15.112164: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 3 {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.112204: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.112213: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.112243: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.112293: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.112315: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.112353: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_11__arg_Placeholder_0_0
2022-06-16 02:29:15.112386: I tensorflow/core/common_runtime/gpu/gpu_util.cc:315] CopyCPUTensorToGPU
# run 3 H2D end
2022-06-16 02:29:15.112415: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.112448: I tensorflow/stream_executor/stream.cc:3887] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenMemcpy(gpu_dst=0x7ff5a4000500, host_src=0x21d46040, size=2097152)
2022-06-16 02:29:15.112950: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.113027: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 3 {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.113056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.113084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.113151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.113184: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 3 {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.113204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.113230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform_1/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.113276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.113308: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step 3 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.113328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.113345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[32,128,128])#
2022-06-16 02:29:15.113489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.113530: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step 3 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.113551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.113578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[32,1024,128];float[1024,128])#
2022-06-16 02:29:15.113957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.114015: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step 3 {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.114038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.114059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add/_2:_Send#from=Add,to=_retval_Add_0_0#
2022-06-16 02:29:15.114081: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.129182: I tensorflow/stream_executor/stream_executor_pimpl.cc:581] Called StreamExecutor::HostMemoryAllocate(size=16777216) returns 0x7ff56e000000
2022-06-16 02:29:15.129240: I tensorflow/core/common_runtime/bfc_allocator.cc:157] Extending allocation by 16.00MiB bytes for gpu_host_bfc.
2022-06-16 02:29:15.129252: I tensorflow/core/common_runtime/bfc_allocator.cc:162] Total allocated bytes: 18.00MiB
2022-06-16 02:29:15.129262: I tensorflow/core/common_runtime/bfc_allocator.cc:165] Allocated memory at 0x7ff56e000000 to 0x7ff56f000000
2022-06-16 02:29:15.129780: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_12_Add
2022-06-16 02:29:15.129801: I tensorflow/core/common_runtime/gpu/gpu_util.cc:270] CopyGPUTensorToCPU
# run 3 D2H end
2022-06-16 02:29:15.129819: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.129848: I tensorflow/stream_executor/stream.cc:3879] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenMemcpy(host_dst=0x7ff56e000000, gpu_src=0x7ff5a4300500, size=16777216)
2022-06-16 02:29:15.129888: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.129925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.129956: I tensorflow/stream_executor/stream.cc:4366] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::BlockHostUntilDone()
2022-06-16 02:29:15.129976: I tensorflow/stream_executor/temporary_memory_manager.cc:64] deallocated 0 finalized temporaries
2022-06-16 02:29:15.131372: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 3 {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3) device: /job:localhost/replica:0/task:0/device:CPU:0
# run 3 compute end
# run 4 compute start
2022-06-16 02:29:15.133385: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 4 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.133448: I tensorflow/core/common_runtime/executor.cc:783] Process node: 0 step 4 {{node _SOURCE}} = NoOp[]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.133501: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 4 {{node _arg_Placeholder_0_0}} = _Arg[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.133522: I tensorflow/core/common_runtime/executor.cc:783] Process node: 2 step 4 {{node random_uniform/shape}} = Const[_XlaHasReferenceVars=false, dtype=DT_INT32, value=Tensor<type: int32 shape: [2] values: 1024 128>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.133554: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 4 {{node _arg_Placeholder_0_0/_0}} = _Send[T=DT_FLOAT, _dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_0_0) device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.133571: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 4 {{node Add/_3}} = _Recv[_dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]() device: /job:localhost/replica:0/task:0/device:CPU:0
2022-06-16 02:29:15.133612: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.133630: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 4 {{node _arg_Placeholder_0_0/_1}} = _Recv[_dst="MatMul", _src="_arg_Placeholder_0_0", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11__arg_Placeholder_0_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]() device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.133661: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.133682: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.133698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:753] GpuDevice::ComputeAsync _arg_Placeholder_0_0/_1 op _Recv on GPU0 stream[0]
2022-06-16 02:29:15.133740: I tensorflow/core/common_runtime/rendezvous_mgr.cc:174] IntraProcessRendezvous Recv 0x1dbf8b0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.133759: I tensorflow/core/common_runtime/rendezvous_mgr.cc:125] IntraProcessRendezvous Recv 0x1dbf8d0 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_11__arg_Placeholder_0_0;0:0
2022-06-16 02:29:15.133791: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_11__arg_Placeholder_0_0
2022-06-16 02:29:15.133820: I tensorflow/core/common_runtime/gpu/gpu_util.cc:315] CopyCPUTensorToGPU
# run 4 H2D end
2022-06-16 02:29:15.133846: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.133891: I tensorflow/stream_executor/stream.cc:3887] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenMemcpy(gpu_dst=0x7ff5a4000500, host_src=0x21d46040, size=2097152)
2022-06-16 02:29:15.134408: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc8d30,impl=0x5d08860] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.134490: I tensorflow/core/common_runtime/executor.cc:783] Process node: 3 step 4 {{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.134521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.134545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.134640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.134677: I tensorflow/core/common_runtime/executor.cc:783] Process node: 4 step 4 {{node random_uniform_1/RandomUniform}} = RandomUniform[T=DT_INT32, _XlaHasReferenceVars=false, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/shape) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.134697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.134713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] random_uniform_1/RandomUniform:RandomUniform#shape=(int32[2])#
2022-06-16 02:29:15.134749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled random_uniform_1/RandomUniform op RandomUniform on GPU 0 stream[0]
2022-06-16 02:29:15.134778: I tensorflow/core/common_runtime/executor.cc:783] Process node: 6 step 4 {{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, _XlaHasReferenceVars=false, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](random_uniform/RandomUniform, _arg_Placeholder_0_0/_1) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.134798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.134816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] MatMul:BatchMatMulV2#shape=(float[1024,128];float[32,128,128])#
2022-06-16 02:29:15.134962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled MatMul op BatchMatMulV2 on GPU 0 stream[0]
2022-06-16 02:29:15.135002: I tensorflow/core/common_runtime/executor.cc:783] Process node: 7 step 4 {{node Add}} = AddV2[T=DT_FLOAT, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](MatMul, random_uniform_1/RandomUniform) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.135022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.135041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add:AddV2#shape=(float[32,1024,128];float[1024,128])#
2022-06-16 02:29:15.135097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add op AddV2 on GPU 0 stream[0]
2022-06-16 02:29:15.135142: I tensorflow/core/common_runtime/executor.cc:783] Process node: 8 step 4 {{node Add/_2}} = _Send[T=DT_FLOAT, _dst="_retval_Add_0_0", _src="Add", client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12_Add", _device="/job:localhost/replica:0/task:0/device:GPU:0"](Add) device: /job:localhost/replica:0/task:0/device:GPU:0
2022-06-16 02:29:15.135164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:664] GpuDevice::ComputeHelper Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.135180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:666] Add/_2:_Send#from=Add,to=_retval_Add_0_0#
2022-06-16 02:29:15.135200: I tensorflow/core/common_runtime/rendezvous_mgr.cc:167] IntraProcessRendezvous Send 0x1dbf8b0 /job:localhost/replica:0/task:0/device:GPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:CPU:0;edge_12_Add;0:0
2022-06-16 02:29:15.135228: I tensorflow/core/common_runtime/copy_tensor.cc:211] Copy edge_12_Add
2022-06-16 02:29:15.135249: I tensorflow/core/common_runtime/gpu/gpu_util.cc:270] CopyGPUTensorToCPU
# run 4 D2H end
2022-06-16 02:29:15.135273: I tensorflow/stream_executor/stream.cc:1052] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenWaitFor(other=0x21d0ed00)
2022-06-16 02:29:15.135306: I tensorflow/stream_executor/stream.cc:3879] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenMemcpy(host_dst=0x7ff56e000000, gpu_src=0x7ff5a4300500, size=16777216)
2022-06-16 02:29:15.135348: I tensorflow/stream_executor/stream.cc:340] [stream=0x21bc9000,impl=0x21d0d1d0] Called Stream::ThenRecordEvent(event=0x7ff80c00a770)
2022-06-16 02:29:15.135386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] GpuDevice::ComputeHelper scheduled Add/_2 op _Send on GPU 0 stream[0]
2022-06-16 02:29:15.135416: I tensorflow/stream_executor/stream.cc:4366] [stream=0x21d0ed00,impl=0x21d0d8d0] Called Stream::BlockHostUntilDone()
2022-06-16 02:29:15.135434: I tensorflow/stream_executor/temporary_memory_manager.cc:64] deallocated 0 finalized temporaries
2022-06-16 02:29:15.136709: I tensorflow/core/common_runtime/executor.cc:783] Process node: 5 step 4 {{node _retval_Add_0_0}} = _Retval[T=DT_FLOAT, _XlaHasReferenceVars=false, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add/_3) device: /job:localhost/replica:0/task:0/device:CPU:0
# run 4 compute end
run 1 costs 878.5572052001953
run 2 costs 3.3550262451171875
run 3 costs 25.066852569580078
run 4 costs 5.223751068115234