Gist: wyphan/ff6f1875ec7b6388989737d89c10549a
% MAGMA 2.5.3 compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10010, driver 10010. OpenMP threads 28.
% device 0: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% Fri Jul 17 00:52:55 2020
% Usage: ./testing/testing_zgemm_batched [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and CUBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to CUBLAS result.
% transA = No transpose, transB = No transpose
% version = 1, regular batch GEMM
% BatchCount M N K MAGMA Gflop/s (ms) CUBLAS Gflop/s (ms) CPU Gflop/s (ms) MAGMA error CUBLAS error
%========================================================================================================================
300 32 32 32 981.71 ( 0.08) 826.70 ( 0.10) 34.74 ( 2.26) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000c0000000
BLAS : Bad memory unallocation! : 1024 0x2000ec000000
300 32 32 32 1121.95 ( 0.07) 958.88 ( 0.08) 42.53 ( 1.85) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000cc000000
300 32 32 32 1074.44 ( 0.07) 913.72 ( 0.09) 39.09 ( 2.01) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000ec000000
300 32 32 32 1077.95 ( 0.07) 777.96 ( 0.10) 34.53 ( 2.28) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000b8000000
300 32 32 32 1092.23 ( 0.07) 958.88 ( 0.08) 37.08 ( 2.12) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000d8000000
300 32 32 32 1064.04 ( 0.07) 903.71 ( 0.09) 38.97 ( 2.02) 2.58e-02 2.58e-02 failed
300 64 64 64 3532.57 ( 0.18) 3453.96 ( 0.18) 91.27 ( 6.89) 1.28e-02 1.28e-02 failed
300 64 64 64 3614.83 ( 0.17) 3654.89 ( 0.17) 92.09 ( 6.83) 1.26e-02 1.26e-02 failed
300 64 64 64 3575.65 ( 0.18) 3383.11 ( 0.19) 91.96 ( 6.84) 1.34e-02 1.34e-02 failed
300 64 64 64 3556.37 ( 0.18) 3418.17 ( 0.18) 92.12 ( 6.83) 1.30e-02 1.30e-02 failed
300 64 64 64 3614.83 ( 0.17) 3327.65 ( 0.19) 92.18 ( 6.82) 1.26e-02 1.26e-02 failed
300 64 64 64 3570.81 ( 0.18) 3327.65 ( 0.19) 92.36 ( 6.81) 1.32e-02 1.32e-02 failed
300 96 96 96 4498.00 ( 0.47) 4772.80 ( 0.44) 99.60 ( 21.32) 8.80e-03 8.80e-03 failed
300 96 96 96 4527.73 ( 0.47) 4904.21 ( 0.43) 99.08 ( 21.43) 8.65e-03 8.65e-03 failed
300 96 96 96 4498.00 ( 0.47) 4687.39 ( 0.45) 98.08 ( 21.65) 8.75e-03 8.75e-03 failed
300 96 96 96 4498.00 ( 0.47) 4655.54 ( 0.46) 101.29 ( 20.96) 8.54e-03 8.54e-03 failed
300 96 96 96 4507.11 ( 0.47) 4687.39 ( 0.45) 100.78 ( 21.07) 8.34e-03 8.34e-03 failed
300 96 96 96 4500.27 ( 0.47) 4687.39 ( 0.45) 101.13 ( 21.00) 8.51e-03 8.51e-03 failed
300 128 128 128 5013.21 ( 1.00) 5459.17 ( 0.92) 103.99 ( 48.40) 6.45e-03 6.45e-03 failed
300 128 128 128 5028.73 ( 1.00) 5542.30 ( 0.91) 99.55 ( 50.56) 6.43e-03 6.43e-03 failed
300 128 128 128 5022.75 ( 1.00) 5605.58 ( 0.90) 98.75 ( 50.97) 6.25e-03 6.25e-03 failed
300 128 128 128 5003.70 ( 1.01) 5193.27 ( 0.97) 104.20 ( 48.30) 6.38e-03 6.38e-03 failed
300 128 128 128 5003.70 ( 1.01) 5464.83 ( 0.92) 98.63 ( 51.03) 6.22e-03 6.22e-03 failed
300 128 128 128 4993.05 ( 1.01) 5476.17 ( 0.92) 102.89 ( 48.92) 6.46e-03 6.46e-03 failed
300 160 160 160 5135.98 ( 1.91) 5779.60 ( 1.70) 103.79 ( 94.71) 4.86e-03 4.86e-03 failed
300 160 160 160 5143.67 ( 1.91) 5810.55 ( 1.69) 102.89 ( 95.54) 4.98e-03 4.98e-03 failed
300 160 160 160 5141.11 ( 1.91) 5755.40 ( 1.71) 102.66 ( 95.76) 5.08e-03 5.08e-03 failed
300 160 160 160 5144.32 ( 1.91) 5775.55 ( 1.70) 102.25 ( 96.14) 5.19e-03 5.19e-03 failed
BLAS : Bad memory unallocation! : 1024 0x200148a40000
BLAS : Bad memory unallocation! : 1024 0x200160a40000
300 160 160 160 5138.55 ( 1.91) 5772.32 ( 1.70) 102.69 ( 95.73) 4.01e-17 4.03e-17 ok
BLAS : Bad memory unallocation! : 1024 0x20012ca40000
BLAS : Bad memory unallocation! : 1024 0x200130a40000
300 160 160 160 5144.32 ( 1.91) 5803.19 ( 1.69) 102.79 ( 95.63) 5.00e-03 5.00e-03 failed
300 192 192 192 5313.47 ( 3.20) 6047.22 ( 2.81) 107.40 ( 158.16) 4.22e-03 4.22e-03 failed
300 192 192 192 5316.64 ( 3.20) 6030.33 ( 2.82) 107.29 ( 158.32) 3.77e-17 3.79e-17 ok
300 192 192 192 5311.49 ( 3.20) 6017.09 ( 2.82) 105.75 ( 160.64) 4.26e-03 4.26e-03 failed
300 192 192 192 5311.89 ( 3.20) 6030.33 ( 2.82) 104.79 ( 162.10) 4.15e-03 4.15e-03 failed
300 192 192 192 5316.64 ( 3.20) 6021.67 ( 2.82) 104.88 ( 161.96) 4.21e-03 4.21e-03 failed
BLAS : Bad memory unallocation! : 1024 0x2001c0000000
300 192 192 192 5308.33 ( 3.20) 6019.12 ( 2.82) 105.49 ( 161.04) 3.77e-17 3.79e-17 ok
300 224 224 224 5364.87 ( 5.03) 6120.96 ( 4.41) 108.84 ( 247.84) 3.47e-17 3.48e-17 ok
300 224 224 224 5365.89 ( 5.03) 6131.90 ( 4.40) 109.14 ( 247.15) 3.65e-03 3.65e-03 failed
300 224 224 224 5356.23 ( 5.04) 6119.63 ( 4.41) 105.82 ( 254.90) 3.56e-03 3.56e-03 failed
300 224 224 224 5359.53 ( 5.03) 6115.00 ( 4.41) 107.24 ( 251.54) 3.59e-03 3.59e-03 failed
300 224 224 224 5361.82 ( 5.03) 6150.24 ( 4.39) 106.17 ( 254.06) 3.57e-03 3.57e-03 failed
300 224 224 224 5355.22 ( 5.04) 6097.21 ( 4.42) 106.72 ( 252.75) 3.47e-17 3.48e-17 ok
300 256 256 256 5523.45 ( 7.29) 6235.83 ( 6.46) 110.07 ( 365.83) 3.09e-03 3.09e-03 failed
300 256 256 256 5528.69 ( 7.28) 6257.09 ( 6.44) 110.92 ( 363.02) 3.29e-17 3.31e-17 ok
300 256 256 256 5518.04 ( 7.30) 6233.07 ( 6.46) 106.58 ( 377.78) 3.29e-17 3.31e-17 ok
300 256 256 256 5521.82 ( 7.29) 6244.59 ( 6.45) 108.61 ( 370.75) 3.04e-03 3.04e-03 failed
300 256 256 256 5520.20 ( 7.29) 6264.98 ( 6.43) 107.01 ( 376.28) 3.13e-03 3.13e-03 failed
300 256 256 256 5512.63 ( 7.30) 6230.08 ( 6.46) 108.47 ( 371.23) 3.19e-03 3.19e-03 failed
300 288 288 288 5473.66 ( 10.47) 6282.18 ( 9.13) 107.79 ( 531.87) 2.83e-03 2.83e-03 failed
300 288 288 288 5476.27 ( 10.47) 6284.16 ( 9.12) 108.76 ( 527.14) 2.63e-03 2.63e-03 failed
300 288 288 288 5464.82 ( 10.49) 6269.90 ( 9.14) 105.60 ( 542.90) 2.79e-03 2.79e-03 failed
300 288 288 288 5472.54 ( 10.48) 6279.40 ( 9.13) 106.60 ( 537.84) 2.85e-03 2.85e-03 failed
300 288 288 288 5471.04 ( 10.48) 6292.38 ( 9.11) 105.10 ( 545.50) 2.72e-03 2.72e-03 failed
300 288 288 288 5462.71 ( 10.49) 6273.83 ( 9.14) 107.25 ( 534.54) 2.76e-03 2.76e-03 failed
300 320 320 320 5544.13 ( 14.18) 6359.11 ( 12.37) 105.25 ( 747.18) 2.47e-03 2.47e-03 failed
300 320 320 320 5548.79 ( 14.17) 6358.13 ( 12.37) 104.50 ( 752.57) 2.48e-03 2.48e-03 failed
300 320 320 320 5540.59 ( 14.19) 6346.88 ( 12.39) 98.42 ( 799.09) 2.49e-03 2.49e-03 failed
300 320 320 320 5545.25 ( 14.18) 6355.07 ( 12.37) 100.61 ( 781.67) 2.40e-03 2.40e-03 failed
300 320 320 320 5545.25 ( 14.18) 6353.97 ( 12.38) 96.49 ( 815.04) 3.71e-17 3.72e-17 ok
300 320 320 320 5537.43 ( 14.20) 6350.91 ( 12.38) 100.77 ( 780.39) 2.48e-03 2.48e-03 failed
300 352 352 352 5539.45 ( 18.90) 6378.26 ( 16.41) 112.68 ( 928.94) 2.24e-03 2.24e-03 failed
300 352 352 352 5541.83 ( 18.89) 6379.10 ( 16.41) 112.56 ( 929.93) 2.38e-03 2.38e-03 failed
300 352 352 352 5535.05 ( 18.91) 6381.79 ( 16.40) 108.43 ( 965.40) 3.68e-17 3.69e-17 ok
300 352 352 352 5540.99 ( 18.89) 6375.58 ( 16.42) 110.05 ( 951.17) 2.38e-03 2.38e-03 failed
300 352 352 352 5540.36 ( 18.89) 6377.15 ( 16.41) 108.74 ( 962.59) 2.16e-03 2.16e-03 failed
300 352 352 352 5536.52 ( 18.91) 6379.47 ( 16.41) 108.96 ( 960.69) 2.39e-03 2.39e-03 failed
300 384 384 384 5602.99 ( 24.25) 6428.69 ( 21.14) 112.32 (1209.92) 3.59e-17 3.60e-17 ok
300 384 384 384 5607.18 ( 24.24) 6429.27 ( 21.14) 112.33 (1209.76) 2.01e-03 2.01e-03 failed
300 384 384 384 5601.40 ( 24.26) 6429.27 ( 21.14) 108.05 (1257.73) 2.10e-03 2.10e-03 failed
300 384 384 384 5604.65 ( 24.25) 6423.47 ( 21.16) 109.83 (1237.36) 2.20e-03 2.20e-03 failed
300 384 384 384 5604.65 ( 24.25) 6424.12 ( 21.15) 108.37 (1254.02) 2.15e-03 2.15e-03 failed
300 384 384 384 5599.58 ( 24.27) 6429.85 ( 21.14) 109.87 (1236.88) 2.19e-03 2.19e-03 failed
300 416 416 416 5578.73 ( 30.97) 7089.29 ( 24.37) 111.67 (1547.21) 2.01e-03 2.01e-03 failed
300 416 416 416 5582.68 ( 30.95) 6797.24 ( 25.42) 110.59 (1562.39) 1.81e-03 1.81e-03 failed
300 416 416 416 5577.10 ( 30.98) 6434.24 ( 26.85) 106.28 (1625.70) 1.98e-03 1.98e-03 failed
300 416 416 416 5579.98 ( 30.96) 6435.89 ( 26.85) 107.43 (1608.36) 2.00e-03 2.00e-03 failed
300 416 416 416 5579.85 ( 30.96) 6434.98 ( 26.85) 106.94 (1615.70) 1.98e-03 1.98e-03 failed
300 416 416 416 5574.10 ( 31.00) 6432.35 ( 26.86) 107.71 (1604.14) 2.00e-03 2.00e-03 failed
300 448 448 448 5627.20 ( 38.35) 7172.15 ( 30.09) 109.33 (1973.78) 1.87e-03 1.87e-03 failed
300 448 448 448 5630.42 ( 38.33) 6684.52 ( 32.28) 106.37 (2028.69) 3.35e-17 3.36e-17 ok
300 448 448 448 5626.60 ( 38.35) 6465.27 ( 33.38) 100.83 (2140.11) 1.34e-03 1.34e-03 failed
300 448 448 448 5629.09 ( 38.34) 6466.24 ( 33.37) 102.59 (2103.56) 1.84e-03 1.84e-03 failed
300 448 448 448 5626.60 ( 38.35) 6462.91 ( 33.39) 101.61 (2123.77) 1.81e-03 1.81e-03 failed
300 448 448 448 5629.54 ( 38.33) 6758.44 ( 31.93) 99.76 (2163.23) 3.35e-17 3.36e-17 ok
300 480 480 480 5599.48 ( 47.40) 7494.55 ( 35.42) 101.01 (2627.56) 1.70e-03 1.70e-03 failed
300 480 480 480 5601.71 ( 47.38) 7494.60 ( 35.41) 104.56 (2538.43) 1.74e-03 1.74e-03 failed
300 480 480 480 5597.82 ( 47.42) 6463.62 ( 41.06) 98.91 (2683.57) 1.70e-03 1.70e-03 failed
300 480 480 480 5604.92 ( 47.35) 6616.83 ( 40.11) 96.11 (2761.58) 1.67e-03 1.67e-03 failed
300 480 480 480 5598.41 ( 47.41) 7348.27 ( 36.12) 95.84 (2769.36) 1.67e-03 1.67e-03 failed
300 480 480 480 5598.67 ( 47.41) 7126.93 ( 37.24) 101.27 (2620.93) 1.69e-03 1.69e-03 failed
300 512 512 512 6326.17 ( 50.92) 7453.45 ( 43.22) 93.24 (3454.75) 1.55e-03 1.55e-03 failed
300 512 512 512 6405.83 ( 50.29) 7456.17 ( 43.20) 95.88 (3359.76) 1.50e-03 1.50e-03 failed
https://jobstepviewer.olcf.ornl.gov/summit/228645-3
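The log does not spell out how the Gflop/s and ok/failed columns are derived. The sketch below re-derives them from a single data row under two stated assumptions that do not come from the log. First, one complex GEMM is counted as 8·M·N·K real flops (6 real multiplications plus 2 real additions per complex multiply-add, which is how MAGMA's `FLOPS_ZGEMM` is understood to count). Second, a row passes when its error is below 30 times the double-precision unit roundoff (an assumed default tolerance factor). The helpers `parse_row`, `zgemm_batched_gflops`, and `verdict` are hypothetical and are not part of MAGMA.

```python
import sys

# Assumption: unit roundoff for IEEE double (2**-53), the value LAPACK's
# dlamch('E') returns, scaled by an assumed tolerance factor of 30.
UNIT_ROUNDOFF = sys.float_info.epsilon / 2
TOLERANCE = 30 * UNIT_ROUNDOFF


def zgemm_batched_gflops(batch, m, n, k, ms):
    """Gflop/s for `batch` complex GEMMs of shape (m x k) * (k x n) taking `ms` ms.

    Assumes 8*m*n*k real flops per complex GEMM.
    """
    flops = batch * 8.0 * m * n * k
    return flops / (ms * 1e-3) / 1e9


def parse_row(line):
    """Parse one data row of the log into a dict (hypothetical helper)."""
    # Drop the parentheses around the timings so every field splits on whitespace.
    fields = line.replace("(", " ").replace(")", " ").split()
    batch, m, n, k = (int(f) for f in fields[:4])
    magma_gf, magma_ms, cublas_gf, cublas_ms, cpu_gf, cpu_ms = (float(f) for f in fields[4:10])
    return {
        "batch": batch, "m": m, "n": n, "k": k,
        "magma_gflops": magma_gf, "magma_ms": magma_ms,
        "cublas_gflops": cublas_gf, "cublas_ms": cublas_ms,
        "cpu_gflops": cpu_gf, "cpu_ms": cpu_ms,
        "magma_error": float(fields[10]), "cublas_error": float(fields[11]),
        "status": fields[12],
    }


def verdict(row):
    """Re-derive ok/failed, assuming both errors must stay below TOLERANCE."""
    worst = max(row["magma_error"], row["cublas_error"])
    return "ok" if worst < TOLERANCE else "failed"


if __name__ == "__main__":
    line = "300 512 512 512 6326.17 ( 50.92) 7453.45 ( 43.22) 93.24 (3454.75) 1.55e-03 1.55e-03 failed"
    row = parse_row(line)
    recomputed = zgemm_batched_gflops(row["batch"], row["m"], row["n"], row["k"], row["magma_ms"])
    print(f"MAGMA: logged {row['magma_gflops']:.2f}, recomputed {recomputed:.2f} Gflop/s")
    print(f"status: logged {row['status']}, re-derived {verdict(row)}")
```

For the 512×512×512 rows the recomputed rate lands within about 0.01% of the logged value. For the smallest sizes the gap is larger (983.04 recomputed versus 981.71 logged for the first 32×32×32 row) because the time column is printed to only 0.01 ms.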