@briansp2020
Created October 15, 2023 00:12
Latest ai-benchmark run using ROCm 5.7.1 and tensorflow-upstream built from the 10/14/2023 source.
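The log below interleaves per-run timings ("Inference Time: N ms" / "Training Time: N ms") with one summary line per test. A minimal sketch for pulling those out and rechecking the summaries is shown here. It assumes the two line shapes stay as they appear in this log, and it assumes the summary excludes the first (warm-up) run and uses the population standard deviation. That assumption is inferred from the numbers for test 1.1, not taken from the ai-benchmark source. The names `summarize`, `RUN_RE`, and `SUMMARY_RE` are illustrative only.

```python
import re
import statistics

# Line shapes as they appear in this log (assumed stable across runs).
RUN_RE = re.compile(r"^(Inference|Training) Time: (\d+) ms$")
SUMMARY_RE = re.compile(
    r"^(\d+\.\d+) - (inference|training) \| batch=(\d+), size=(\d+)x(\d+): "
    r"([\d.]+) ± ([\d.]+) ms$"
)


def summarize(lines):
    """Yield (test_id, reported_mean, recomputed_mean, recomputed_std) per summary line."""
    runs = []
    for raw in lines:
        line = raw.strip()
        run = RUN_RE.match(line)
        if run:
            runs.append(int(run.group(2)))
            continue
        summary = SUMMARY_RE.match(line)
        if summary:
            # Assumption: the first run is a warm-up and is left out of the summary.
            steady = runs[1:]
            if steady:
                yield (
                    summary.group(1),
                    float(summary.group(6)),
                    statistics.fmean(steady),
                    statistics.pstdev(steady),  # population std, by assumption
                )
            runs = []
        # Any other line (TensorFlow/XLA info messages, headers) is ignored.


# Timings for test 1.1 (MobileNet-V2 inference), copied from the log.
times_1_1 = [2622, 29, 31, 24, 21, 32, 22, 22, 22, 22, 18,
             22, 22, 21, 22, 22, 18, 22, 22, 21, 22, 22]
sample = [f"Inference Time: {t} ms" for t in times_1_1]
sample.append("1.1 - inference | batch=50, size=224x224: 22.8 ± 3.5 ms")

results = list(summarize(sample))
for test_id, reported, mean, std in results:
    print(f"{test_id}: reported {reported} ms, recomputed {mean:.1f} ± {std:.1f} ms")
```

To run it over a saved copy of the full output, pass an open file object to `summarize` in place of `sample`. Whether the warm-up assumption holds for every test in the log has not been checked here.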
(tf) root@rocm:~/tmp# python benchmark.py
2023-10-14 15:02:22.116047: E external/local_xla/xla/stream_executor/plugin_registry.cc:93] Invalid plugin kind specified: DNN
2023-10-14 15:02:22.348480: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 15:02:23.756833: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:23.982269: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:23.982301: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
>> AI-Benchmark - 0.1.3.cm
>> Let the AI Games begin
2023-10-14 15:02:25.095387: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.095474: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.095505: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.096220: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.096264: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.096317: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.096335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:02:25.388550: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388614: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388631: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388655: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388671: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:02:25.388709: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388735: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388749: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388765: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388779: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:25.388786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
* TF Version: 2.15.0
* Platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
* CPU: AMD Ryzen 9 7900X 12-Core Processor
* CPU RAM: 63 GB
* GPU/0: Radeon RX 7900 XTX
* GPU RAM: 23.5 GB
* CUDA Version: N/A
* CUDA Build: N/A
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
2023-10-14 15:02:26.643065: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:26.643180: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:26.643214: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:26.643259: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:26.643288: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:26.643304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:02:26.746956: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2023-10-14 15:02:26.899813: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:02:27.270635: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2622 ms
Inference Time: 29 ms
Inference Time: 31 ms
Inference Time: 24 ms
Inference Time: 21 ms
Inference Time: 32 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 18 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 21 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 18 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 21 ms
Inference Time: 22 ms
Inference Time: 22 ms
1.1 - inference | batch=50, size=224x224: 22.8 ± 3.5 ms
2023-10-14 15:02:33.291002: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:02:33.697929: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 784 ms
Training Time: 367 ms
Training Time: 363 ms
Training Time: 362 ms
Training Time: 375 ms
Training Time: 366 ms
Training Time: 375 ms
Training Time: 364 ms
Training Time: 359 ms
Training Time: 369 ms
Training Time: 375 ms
Training Time: 374 ms
Training Time: 353 ms
Training Time: 360 ms
Training Time: 358 ms
Training Time: 356 ms
Training Time: 358 ms
Training Time: 356 ms
Training Time: 359 ms
Training Time: 357 ms
Training Time: 353 ms
Training Time: 355 ms
1.2 - training | batch=50, size=224x224: 363 ± 7 ms
2/19. Inception-V3
2023-10-14 15:02:44.470971: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:44.471068: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:44.471101: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:44.471158: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:44.471191: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:44.471207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:02:44.741532: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:02:44.954822: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2552 ms
Inference Time: 28 ms
Inference Time: 28 ms
Inference Time: 28 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 28 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 28 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
Inference Time: 27 ms
2.1 - inference | batch=20, size=346x346: 27.2 ± 0.4 ms
2023-10-14 15:02:50.047277: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:02:50.614592: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 900 ms
Training Time: 399 ms
Training Time: 392 ms
Training Time: 393 ms
Training Time: 393 ms
Training Time: 392 ms
Training Time: 398 ms
Training Time: 398 ms
Training Time: 400 ms
Training Time: 395 ms
Training Time: 390 ms
Training Time: 390 ms
Training Time: 389 ms
Training Time: 395 ms
Training Time: 390 ms
Training Time: 389 ms
Training Time: 388 ms
Training Time: 389 ms
Training Time: 393 ms
Training Time: 389 ms
Training Time: 405 ms
Training Time: 389 ms
2.2 - training | batch=20, size=346x346: 393 ± 4 ms
3/19. Inception-V4
2023-10-14 15:03:00.614325: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:00.614392: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:00.614410: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:00.614439: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:00.614457: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:00.614467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:01.133304: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:01.416106: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 280 ms
Inference Time: 36 ms
Inference Time: 40 ms
Inference Time: 34 ms
Inference Time: 35 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 33 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
3.1 - inference | batch=10, size=346x346: 33.4 ± 1.8 ms
2023-10-14 15:03:04.097457: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:04.989052: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1134 ms
Training Time: 302 ms
Training Time: 301 ms
Training Time: 301 ms
Training Time: 298 ms
Training Time: 300 ms
Training Time: 300 ms
Training Time: 299 ms
Training Time: 300 ms
Training Time: 303 ms
Training Time: 300 ms
Training Time: 298 ms
Training Time: 300 ms
Training Time: 300 ms
Training Time: 300 ms
Training Time: 313 ms
Training Time: 303 ms
Training Time: 301 ms
Training Time: 303 ms
Training Time: 299 ms
Training Time: 302 ms
Training Time: 304 ms
3.2 - training | batch=10, size=346x346: 301 ± 3 ms
4/19. Inception-ResNet-V2
2023-10-14 15:03:12.299818: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:12.300077: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:12.300140: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:12.300169: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:12.300186: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:12.300195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:13.155066: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:13.591312: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 742 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 38 ms
4.1 - inference | batch=10, size=346x346: 38.1 ± 0.7 ms
2023-10-14 15:03:17.797480: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:19.411986: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1822 ms
Training Time: 250 ms
Training Time: 267 ms
Training Time: 252 ms
Training Time: 249 ms
Training Time: 251 ms
Training Time: 249 ms
Training Time: 249 ms
Training Time: 251 ms
Training Time: 249 ms
Training Time: 252 ms
Training Time: 249 ms
Training Time: 252 ms
Training Time: 253 ms
Training Time: 252 ms
Training Time: 251 ms
Training Time: 251 ms
Training Time: 251 ms
Training Time: 251 ms
Training Time: 249 ms
Training Time: 251 ms
Training Time: 251 ms
4.2 - training | batch=8, size=346x346: 251 ± 4 ms
5/19. ResNet-V2-50
2023-10-14 15:03:25.775048: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:25.775199: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:25.775259: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:25.775324: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:25.775344: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:25.775356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:25.966544: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:26.098802: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 147 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 21 ms
Inference Time: 20 ms
5.1 - inference | batch=10, size=346x346: 20.5 ± 0.5 ms
2023-10-14 15:03:27.892222: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:28.267980: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 523 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 84 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 84 ms
Training Time: 85 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 84 ms
Training Time: 88 ms
Training Time: 85 ms
Training Time: 84 ms
5.2 - training | batch=10, size=346x346: 84.6 ± 0.9 ms
6/19. ResNet-V2-152
2023-10-14 15:03:30.836981: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:30.837044: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:30.837061: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:30.837089: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:30.837107: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:30.837116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:31.654625: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:31.990248: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 340 ms
Inference Time: 31 ms
Inference Time: 30 ms
Inference Time: 31 ms
Inference Time: 31 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 31 ms
Inference Time: 31 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 31 ms
Inference Time: 31 ms
6.1 - inference | batch=10, size=256x256: 30.3 ± 0.5 ms
2023-10-14 15:03:35.281545: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:36.448473: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1236 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 108 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 108 ms
Training Time: 106 ms
6.2 - training | batch=10, size=256x256: 106.8 ± 0.8 ms
7/19. VGG-16
2023-10-14 15:03:39.442594: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:39.443575: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:39.443892: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:39.443928: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:39.443946: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:39.443956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:39.480703: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:39.556154: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 86 ms
Inference Time: 42 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 47 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 46 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 47 ms
Inference Time: 46 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 43 ms
Inference Time: 42 ms
Inference Time: 44 ms
Inference Time: 47 ms
Inference Time: 43 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 51 ms
7.1 - inference | batch=20, size=224x224: 45.0 ± 2.0 ms
2023-10-14 15:03:41.816492: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:42.116570: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 184 ms
Training Time: 84 ms
Training Time: 80 ms
Training Time: 81 ms
Training Time: 79 ms
Training Time: 82 ms
Training Time: 80 ms
Training Time: 80 ms
Training Time: 82 ms
Training Time: 83 ms
Training Time: 79 ms
Training Time: 76 ms
Training Time: 79 ms
Training Time: 79 ms
Training Time: 80 ms
Training Time: 79 ms
Training Time: 81 ms
Training Time: 79 ms
Training Time: 82 ms
Training Time: 79 ms
Training Time: 80 ms
Training Time: 83 ms
7.2 - training | batch=2, size=224x224: 80.3 ± 1.8 ms
8/19. SRCNN 9-5-5
2023-10-14 15:03:44.102987: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:44.103080: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:44.103113: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:44.103163: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:44.103194: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:44.103210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:44.118538: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:44.230837: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 50 ms
Inference Time: 30 ms
Inference Time: 32 ms
Inference Time: 29 ms
Inference Time: 34 ms
Inference Time: 28 ms
Inference Time: 34 ms
Inference Time: 27 ms
Inference Time: 34 ms
Inference Time: 28 ms
Inference Time: 32 ms
Inference Time: 28 ms
Inference Time: 32 ms
Inference Time: 28 ms
Inference Time: 34 ms
Inference Time: 27 ms
Inference Time: 36 ms
Inference Time: 27 ms
Inference Time: 36 ms
Inference Time: 27 ms
Inference Time: 36 ms
Inference Time: 27 ms
8.1 - inference | batch=10, size=512x512: 30.8 ± 3.3 ms
Inference Time: 24 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 25 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 25 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
8.2 - inference | batch=1, size=1536x1536: 23.0 ± 0.7 ms
2023-10-14 15:03:47.984362: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:48.309565: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 197 ms
Training Time: 181 ms
Training Time: 176 ms
Training Time: 175 ms
Training Time: 170 ms
Training Time: 171 ms
Training Time: 170 ms
Training Time: 171 ms
Training Time: 166 ms
Training Time: 171 ms
Training Time: 169 ms
Training Time: 174 ms
Training Time: 172 ms
Training Time: 172 ms
Training Time: 170 ms
Training Time: 170 ms
Training Time: 171 ms
Training Time: 170 ms
Training Time: 168 ms
Training Time: 174 ms
Training Time: 168 ms
Training Time: 169 ms
8.3 - training | batch=10, size=512x512: 171 ± 3 ms
9/19. VGG-19 Super-Res
2023-10-14 15:03:58.190969: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:58.191029: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:58.191046: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:58.191070: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:58.191086: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:03:58.191095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:03:58.242288: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:03:58.344816: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 72 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 38 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
9.1 - inference | batch=10, size=256x256: 35.8 ± 0.7 ms
Inference Time: 59 ms
Inference Time: 56 ms
Inference Time: 59 ms
Inference Time: 58 ms
Inference Time: 56 ms
Inference Time: 58 ms
Inference Time: 66 ms
Inference Time: 55 ms
Inference Time: 56 ms
Inference Time: 58 ms
Inference Time: 56 ms
Inference Time: 57 ms
Inference Time: 60 ms
Inference Time: 66 ms
Inference Time: 60 ms
Inference Time: 59 ms
Inference Time: 60 ms
Inference Time: 60 ms
Inference Time: 63 ms
Inference Time: 59 ms
Inference Time: 59 ms
Inference Time: 60 ms
9.2 - inference | batch=1, size=1024x1024: 59.1 ± 2.9 ms
2023-10-14 15:04:02.714165: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:03.008050: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 276 ms
Training Time: 200 ms
Training Time: 200 ms
Training Time: 199 ms
Training Time: 199 ms
Training Time: 199 ms
Training Time: 199 ms
Training Time: 201 ms
Training Time: 199 ms
Training Time: 200 ms
Training Time: 202 ms
Training Time: 200 ms
Training Time: 200 ms
Training Time: 199 ms
Training Time: 200 ms
Training Time: 200 ms
Training Time: 201 ms
Training Time: 200 ms
Training Time: 200 ms
Training Time: 200 ms
Training Time: 201 ms
Training Time: 200 ms
9.3 - training | batch=10, size=224x224: 200.0 ± 0.8 ms
10/19. ResNet-SRGAN
2023-10-14 15:04:12.157956: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:12.158018: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:12.158037: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:12.158067: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:12.158087: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:12.158097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:04:12.385685: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:12.583018: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2656 ms
Inference Time: 45 ms
Inference Time: 43 ms
Inference Time: 43 ms
Inference Time: 43 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 44 ms
Inference Time: 43 ms
Inference Time: 43 ms
Inference Time: 42 ms
Inference Time: 43 ms
Inference Time: 44 ms
Inference Time: 43 ms
Inference Time: 43 ms
Inference Time: 43 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 43 ms
Inference Time: 42 ms
Inference Time: 43 ms
Inference Time: 42 ms
10.1 - inference | batch=10, size=512x512: 42.9 ± 0.8 ms
Inference Time: 40 ms
Inference Time: 35 ms
Inference Time: 37 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 38 ms
Inference Time: 34 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 36 ms
10.2 - inference | batch=1, size=1536x1536: 35.6 ± 1.0 ms
2023-10-14 15:04:19.776953: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:20.131030: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 330 ms
Training Time: 121 ms
Training Time: 114 ms
Training Time: 117 ms
Training Time: 120 ms
Training Time: 118 ms
Training Time: 131 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 116 ms
Training Time: 119 ms
Training Time: 115 ms
Training Time: 114 ms
Training Time: 115 ms
Training Time: 118 ms
Training Time: 114 ms
Training Time: 114 ms
Training Time: 114 ms
Training Time: 115 ms
Training Time: 115 ms
Training Time: 115 ms
10.3 - training | batch=5, size=512x512: 117 ± 4 ms
11/19. ResNet-DPED
2023-10-14 15:04:25.763317: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:25.763481: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:25.763506: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:25.763535: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:25.763555: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:25.763565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:04:25.793871: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:25.905343: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 636 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 52 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 49 ms
11.1 - inference | batch=10, size=256x256: 48.2 ± 1.1 ms
Inference Time: 4254 ms
Inference Time: 78 ms
Inference Time: 77 ms
Inference Time: 80 ms
Inference Time: 77 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 78 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 79 ms
11.2 - inference | batch=1, size=1024x1024: 78.7 ± 0.9 ms
2023-10-14 15:04:35.872063: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:36.375474: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1906 ms
Training Time: 106 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 108 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 106 ms
Training Time: 107 ms
Training Time: 107 ms
Training Time: 106 ms
11.3 - training | batch=15, size=128x128: 106.7 ± 0.7 ms
12/19. U-Net
2023-10-14 15:04:47.115840: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:47.115901: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:47.115918: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:47.115942: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:47.115959: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:04:47.115967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:04:47.187508: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:04:47.284195: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 6592 ms
Inference Time: 81 ms
Inference Time: 87 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 80 ms
Inference Time: 78 ms
Inference Time: 83 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 80 ms
12.1 - inference | batch=4, size=512x512: 79.4 ± 2.1 ms
Inference Time: 7581 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 80 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 81 ms
12.2 - inference | batch=1, size=1024x1024: 81.0 ± 0.7 ms
2023-10-14 15:05:06.333452: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:06.760069: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5457 ms
Training Time: 112 ms
Training Time: 112 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 112 ms
Training Time: 112 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 111 ms
Training Time: 112 ms
Training Time: 111 ms
Training Time: 112 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 113 ms
Training Time: 112 ms
Training Time: 112 ms
12.3 - training | batch=4, size=256x256: 112.4 ± 0.7 ms
13/19. Nvidia-SPADE
2023-10-14 15:05:15.045140: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:15.045205: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:15.045222: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:15.045249: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:15.045266: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:15.045274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:05:15.216647: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:15.402677: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1868 ms
Inference Time: 52 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 52 ms
13.1 - inference | batch=5, size=128x128: 50.8 ± 0.6 ms
2023-10-14 15:05:19.865705: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:20.769577: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2589 ms
Training Time: 80 ms
Training Time: 80 ms
Training Time: 79 ms
Training Time: 81 ms
Training Time: 81 ms
Training Time: 83 ms
Training Time: 81 ms
Training Time: 80 ms
Training Time: 79 ms
Training Time: 82 ms
Training Time: 81 ms
Training Time: 80 ms
Training Time: 80 ms
Training Time: 80 ms
Training Time: 80 ms
Training Time: 81 ms
Training Time: 81 ms
Training Time: 82 ms
Training Time: 79 ms
Training Time: 82 ms
Training Time: 82 ms
13.2 - training | batch=1, size=128x128: 80.7 ± 1.1 ms
14/19. ICNet
2023-10-14 15:05:24.692589: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:24.692992: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:24.693033: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:24.693084: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:24.693119: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:24.693137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:05:24.855863: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:25.071520: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1616 ms
Inference Time: 80 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 85 ms
Inference Time: 90 ms
Inference Time: 88 ms
Inference Time: 83 ms
Inference Time: 89 ms
Inference Time: 84 ms
Inference Time: 91 ms
Inference Time: 84 ms
Inference Time: 85 ms
Inference Time: 82 ms
Inference Time: 87 ms
Inference Time: 84 ms
Inference Time: 87 ms
Inference Time: 87 ms
Inference Time: 84 ms
Inference Time: 83 ms
Inference Time: 86 ms
Inference Time: 87 ms
14.1 - inference | batch=5, size=1024x1536: 85.2 ± 2.8 ms
2023-10-14 15:05:30.605581: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:31.308104: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3481 ms
Training Time: 377 ms
Training Time: 395 ms
Training Time: 383 ms
Training Time: 378 ms
Training Time: 375 ms
Training Time: 399 ms
Training Time: 438 ms
Training Time: 379 ms
Training Time: 395 ms
Training Time: 435 ms
Training Time: 412 ms
Training Time: 407 ms
Training Time: 377 ms
Training Time: 364 ms
Training Time: 388 ms
Training Time: 426 ms
Training Time: 380 ms
Training Time: 376 ms
Training Time: 433 ms
Training Time: 368 ms
Training Time: 404 ms
14.2 - training | batch=10, size=1024x1536: 395 ± 22 ms
15/19. PSPNet
2023-10-14 15:05:51.716627: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:51.716713: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:51.716731: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:51.716760: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:51.716779: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:05:51.716789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:05:51.944695: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:05:52.158135: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 10123 ms
Inference Time: 206 ms
Inference Time: 211 ms
Inference Time: 205 ms
Inference Time: 208 ms
Inference Time: 210 ms
Inference Time: 209 ms
Inference Time: 213 ms
Inference Time: 215 ms
Inference Time: 210 ms
Inference Time: 206 ms
Inference Time: 211 ms
Inference Time: 206 ms
Inference Time: 210 ms
Inference Time: 211 ms
Inference Time: 213 ms
Inference Time: 206 ms
Inference Time: 205 ms
Inference Time: 205 ms
Inference Time: 220 ms
Inference Time: 214 ms
Inference Time: 211 ms
15.1 - inference | batch=5, size=720x720: 210 ± 4 ms
2023-10-14 15:06:08.352412: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:06:08.815435: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5221 ms
Training Time: 124 ms
Training Time: 123 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 127 ms
Training Time: 124 ms
Training Time: 125 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 124 ms
Training Time: 129 ms
Training Time: 124 ms
15.2 - training | batch=1, size=512x512: 125 ± 1 ms
16/19. DeepLab
2023-10-14 15:06:16.646700: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:16.646798: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:16.646831: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:16.646877: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:16.646909: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:16.646928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:06:17.089644: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:06:17.458490: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1179 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 47 ms
Inference Time: 49 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 49 ms
Inference Time: 48 ms
16.1 - inference | batch=2, size=512x512: 47.9 ± 0.7 ms
2023-10-14 15:06:21.077653: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:06:21.913419: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2215 ms
Training Time: 77 ms
Training Time: 76 ms
Training Time: 77 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 76 ms
Training Time: 77 ms
Training Time: 88 ms
Training Time: 77 ms
Training Time: 77 ms
Training Time: 77 ms
Training Time: 77 ms
Training Time: 77 ms
Training Time: 76 ms
16.2 - training | batch=1, size=384x384: 77.0 ± 2.5 ms
17/19. Pixel-RNN
2023-10-14 15:06:25.215820: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.215898: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.215929: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.216048: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.216074: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.216085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:06:25.341668: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.341719: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.341738: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.341762: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.341779: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:06:25.341788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:06:32.815332: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:06:32.827924: W tensorflow/c/c_api.cc:305] Operation '{name:'conv2d_out_logits/biases/Adam_1/Assign' id:47115 op device:{requested: '', assigned: ''} def:{{{node conv2d_out_logits/biases/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](conv2d_out_logits/biases/Adam_1, conv2d_out_logits/biases/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-10-14 15:06:33.577408: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:06:35.532577: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2369 ms
Inference Time: 307 ms
Inference Time: 315 ms
Inference Time: 316 ms
Inference Time: 312 ms
Inference Time: 313 ms
Inference Time: 314 ms
Inference Time: 308 ms
Inference Time: 313 ms
Inference Time: 307 ms
Inference Time: 310 ms
Inference Time: 314 ms
Inference Time: 307 ms
Inference Time: 323 ms
Inference Time: 307 ms
Inference Time: 293 ms
Inference Time: 300 ms
Inference Time: 311 ms
Inference Time: 303 ms
Inference Time: 305 ms
Inference Time: 308 ms
Inference Time: 309 ms
17.1 - inference | batch=50, size=64x64: 309 ± 6 ms
2023-10-14 15:06:58.646138: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 12407 ms
Training Time: 1462 ms
Training Time: 1558 ms
Training Time: 1524 ms
Training Time: 1587 ms
Training Time: 1639 ms
Training Time: 1530 ms
Training Time: 1557 ms
Training Time: 1536 ms
Training Time: 1559 ms
Training Time: 1551 ms
17.2 - training | batch=10, size=64x64: 1550 ± 43 ms
18/19. LSTM-Sentiment
2023-10-14 15:07:20.838263: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:20.838338: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:20.838356: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:20.838391: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:20.838409: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:20.838420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:07:20.944745: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:07:20.948530: W tensorflow/c/c_api.cc:305] Operation '{name:'Variable_1/Adam_1/Assign' id:325 op device:{requested: '', assigned: ''} def:{{{node Variable_1/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](Variable_1/Adam_1, Variable_1/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-10-14 15:07:20.972233: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:07:21.198197: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 548 ms
Inference Time: 352 ms
Inference Time: 386 ms
Inference Time: 366 ms
Inference Time: 359 ms
Inference Time: 366 ms
Inference Time: 365 ms
Inference Time: 360 ms
Inference Time: 378 ms
Inference Time: 364 ms
Inference Time: 359 ms
Inference Time: 362 ms
Inference Time: 365 ms
Inference Time: 382 ms
Inference Time: 364 ms
Inference Time: 381 ms
Inference Time: 379 ms
Inference Time: 361 ms
Inference Time: 363 ms
Inference Time: 362 ms
Inference Time: 385 ms
Inference Time: 384 ms
18.1 - inference | batch=100, size=1024x300: 369 ± 10 ms
2023-10-14 15:07:31.951713: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 810 ms
Training Time: 707 ms
Training Time: 707 ms
Training Time: 726 ms
Training Time: 716 ms
Training Time: 712 ms
Training Time: 708 ms
Training Time: 705 ms
Training Time: 701 ms
Training Time: 694 ms
Training Time: 692 ms
Training Time: 708 ms
Training Time: 699 ms
Training Time: 705 ms
Training Time: 706 ms
Training Time: 716 ms
Training Time: 725 ms
Training Time: 725 ms
Training Time: 721 ms
Training Time: 718 ms
Training Time: 735 ms
Training Time: 726 ms
18.2 - training | batch=10, size=1024x300: 712 ± 11 ms
19/19. GNMT-Translation
2023-10-14 15:07:48.028090: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:48.028179: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:48.028210: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:48.028250: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:48.028279: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:07:48.028295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-10-14 15:07:48.685811: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:07:48.696186: W tensorflow/c/c_api.cc:305] Operation '{name:'index_to_string/table_init' id:13 op device:{requested: '', assigned: ''} def:{{{node index_to_string/table_init}} = InitializeTableFromTextFileV2[_has_manual_control_dependencies=true, delimiter="\t", key_index=-1, offset=0, value_index=-2, vocab_size=-1](index_to_string, index_to_string/table_init/asset_filepath)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-10-14 15:07:48.714160: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-10-14 15:07:48.835731: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 707 ms
Inference Time: 102 ms
Inference Time: 101 ms
Inference Time: 101 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 101 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 101 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 100 ms
Inference Time: 102 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 101 ms
Inference Time: 101 ms
Inference Time: 100 ms
Inference Time: 100 ms
19.1 - inference | batch=1, size=1x20: 100.7 ± 0.6 ms
Device Inference Score: 25523
Device Training Score: 15473
Device AI Score: 40996
For more information and results, please visit http://ai-benchmark.com/alpha
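
The per-test summary lines in the log above (for example `369 ± 10 ms` for test 18.1) can be reproduced from the individual timings. The following is a minimal sketch, not the benchmark's own code. It assumes the first run of each test is treated as warm-up and dropped, and that the spread is a standard deviation. Both assumptions are inferred from the numbers in this log rather than confirmed by the source. The helper names `parse_times` and `summarize` are hypothetical.

```python
import re
import statistics


def parse_times(log_text, kind="Inference"):
    """Collect the millisecond values from '<kind> Time: N ms' log lines."""
    pattern = re.compile(
        rf"^{re.escape(kind)} Time: (\d+) ms[ \t]*$", re.MULTILINE
    )
    return [int(value) for value in pattern.findall(log_text)]


def summarize(times_ms, warmup=1):
    """Return (mean, population std) after dropping `warmup` leading runs.

    Dropping the first run is an assumption inferred from this log,
    not a documented behaviour of the benchmark.
    """
    steady = times_ms[warmup:] if len(times_ms) > warmup else times_ms
    return statistics.mean(steady), statistics.pstdev(steady)


# Timings copied from test 18.1 (inference) in the log above.
RUNS_18_1 = [
    548, 352, 386, 366, 359, 366, 365, 360, 378, 364, 359,
    362, 365, 382, 364, 381, 379, 361, 363, 362, 385, 384,
]

if __name__ == "__main__":
    mean, std = summarize(RUNS_18_1)
    print(f"{mean:.0f} ± {std:.0f} ms")  # prints "369 ± 10 ms"
```

To process a saved log instead of a hard-coded list, pass the file contents to `parse_times` and split the result per test before calling `summarize`.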
@oscarbg

oscarbg commented Nov 3, 2023

Hi,
thanks for sharing!
Three questions:
1) Where did you obtain version "0.1.3.cm"? Your log shows: AI-Benchmark - 0.1.3.cm
I get 0.1.2 (https://ai-benchmark.com/alpha) from https://pypi.org/project/ai-benchmark/#history
2) By "tensorflow-upstream", do you mean you are building from source from the AMD ROCm TensorFlow repo? On PyPI I only see:
https://pypi.org/project/tensorflow-rocm/#history
pip install tensorflow-rocm==2.13.0.570
and you seem to be using 2.15.
3) Would you be interested in also posting (Windows) tensorflow-directml benchmarks? That plugin goes up to TensorFlow 2.11.

@briansp2020
Author

pip install new-ai-benchmark

https://pypi.org/project/new-ai-benchmark/

I built from https://github.com/ROCmSoftwarePlatform/tensorflow-upstream

I haven't tried tensorflow-directml on Windows.

@oscarbg

oscarbg commented Nov 3, 2023

Thanks for all the info!
https://pypi.org/project/new-ai-benchmark/ is interesting.
I'm curious about the versioning: it's 2.2.0 vs 0.1.3.cm.
I will wait for a 2.15 ROCm build from PyPI.

@johnnynunez

Thanks for all the info! https://pypi.org/project/new-ai-benchmark/ is interesting. I'm curious about the versioning: it's 2.2.0 vs 0.1.3.cm. I will wait for a 2.15 ROCm build from PyPI.

It was built by me; the ai-benchmark package is not compatible with Python 3.10+.
You can see the code here: https://github.com/johnnynunez/ai-benchmark
