@wyphan
Created July 17, 2020 16:51
% MAGMA 2.5.3 compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10010, driver 10010. OpenMP threads 28.
% device 0: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% Fri Jul 17 00:52:55 2020
% Usage: ./testing/testing_zgemm_batched [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and CUBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to CUBLAS result.
% transA = No transpose, transB = No transpose
% version = 1, regular batch GEMM
% BatchCount M N K MAGMA Gflop/s (ms) CUBLAS Gflop/s (ms) CPU Gflop/s (ms) MAGMA error CUBLAS error
%========================================================================================================================
300 32 32 32 981.71 ( 0.08) 826.70 ( 0.10) 34.74 ( 2.26) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000c0000000
BLAS : Bad memory unallocation! : 1024 0x2000ec000000
300 32 32 32 1121.95 ( 0.07) 958.88 ( 0.08) 42.53 ( 1.85) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000cc000000
300 32 32 32 1074.44 ( 0.07) 913.72 ( 0.09) 39.09 ( 2.01) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000ec000000
300 32 32 32 1077.95 ( 0.07) 777.96 ( 0.10) 34.53 ( 2.28) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000b8000000
300 32 32 32 1092.23 ( 0.07) 958.88 ( 0.08) 37.08 ( 2.12) 4.92e-17 4.91e-17 ok
BLAS : Bad memory unallocation! : 1024 0x2000d8000000
300 32 32 32 1064.04 ( 0.07) 903.71 ( 0.09) 38.97 ( 2.02) 2.58e-02 2.58e-02 failed
300 64 64 64 3532.57 ( 0.18) 3453.96 ( 0.18) 91.27 ( 6.89) 1.28e-02 1.28e-02 failed
300 64 64 64 3614.83 ( 0.17) 3654.89 ( 0.17) 92.09 ( 6.83) 1.26e-02 1.26e-02 failed
300 64 64 64 3575.65 ( 0.18) 3383.11 ( 0.19) 91.96 ( 6.84) 1.34e-02 1.34e-02 failed
300 64 64 64 3556.37 ( 0.18) 3418.17 ( 0.18) 92.12 ( 6.83) 1.30e-02 1.30e-02 failed
300 64 64 64 3614.83 ( 0.17) 3327.65 ( 0.19) 92.18 ( 6.82) 1.26e-02 1.26e-02 failed
300 64 64 64 3570.81 ( 0.18) 3327.65 ( 0.19) 92.36 ( 6.81) 1.32e-02 1.32e-02 failed
300 96 96 96 4498.00 ( 0.47) 4772.80 ( 0.44) 99.60 ( 21.32) 8.80e-03 8.80e-03 failed
300 96 96 96 4527.73 ( 0.47) 4904.21 ( 0.43) 99.08 ( 21.43) 8.65e-03 8.65e-03 failed
300 96 96 96 4498.00 ( 0.47) 4687.39 ( 0.45) 98.08 ( 21.65) 8.75e-03 8.75e-03 failed
300 96 96 96 4498.00 ( 0.47) 4655.54 ( 0.46) 101.29 ( 20.96) 8.54e-03 8.54e-03 failed
300 96 96 96 4507.11 ( 0.47) 4687.39 ( 0.45) 100.78 ( 21.07) 8.34e-03 8.34e-03 failed
300 96 96 96 4500.27 ( 0.47) 4687.39 ( 0.45) 101.13 ( 21.00) 8.51e-03 8.51e-03 failed
300 128 128 128 5013.21 ( 1.00) 5459.17 ( 0.92) 103.99 ( 48.40) 6.45e-03 6.45e-03 failed
300 128 128 128 5028.73 ( 1.00) 5542.30 ( 0.91) 99.55 ( 50.56) 6.43e-03 6.43e-03 failed
300 128 128 128 5022.75 ( 1.00) 5605.58 ( 0.90) 98.75 ( 50.97) 6.25e-03 6.25e-03 failed
300 128 128 128 5003.70 ( 1.01) 5193.27 ( 0.97) 104.20 ( 48.30) 6.38e-03 6.38e-03 failed
300 128 128 128 5003.70 ( 1.01) 5464.83 ( 0.92) 98.63 ( 51.03) 6.22e-03 6.22e-03 failed
300 128 128 128 4993.05 ( 1.01) 5476.17 ( 0.92) 102.89 ( 48.92) 6.46e-03 6.46e-03 failed
300 160 160 160 5135.98 ( 1.91) 5779.60 ( 1.70) 103.79 ( 94.71) 4.86e-03 4.86e-03 failed
300 160 160 160 5143.67 ( 1.91) 5810.55 ( 1.69) 102.89 ( 95.54) 4.98e-03 4.98e-03 failed
300 160 160 160 5141.11 ( 1.91) 5755.40 ( 1.71) 102.66 ( 95.76) 5.08e-03 5.08e-03 failed
300 160 160 160 5144.32 ( 1.91) 5775.55 ( 1.70) 102.25 ( 96.14) 5.19e-03 5.19e-03 failed
BLAS : Bad memory unallocation! : 1024 0x200148a40000
BLAS : Bad memory unallocation! : 1024 0x200160a40000
300 160 160 160 5138.55 ( 1.91) 5772.32 ( 1.70) 102.69 ( 95.73) 4.01e-17 4.03e-17 ok
BLAS : Bad memory unallocation! : 1024 0x20012ca40000
BLAS : Bad memory unallocation! : 1024 0x200130a40000
300 160 160 160 5144.32 ( 1.91) 5803.19 ( 1.69) 102.79 ( 95.63) 5.00e-03 5.00e-03 failed
300 192 192 192 5313.47 ( 3.20) 6047.22 ( 2.81) 107.40 ( 158.16) 4.22e-03 4.22e-03 failed
300 192 192 192 5316.64 ( 3.20) 6030.33 ( 2.82) 107.29 ( 158.32) 3.77e-17 3.79e-17 ok
300 192 192 192 5311.49 ( 3.20) 6017.09 ( 2.82) 105.75 ( 160.64) 4.26e-03 4.26e-03 failed
300 192 192 192 5311.89 ( 3.20) 6030.33 ( 2.82) 104.79 ( 162.10) 4.15e-03 4.15e-03 failed
300 192 192 192 5316.64 ( 3.20) 6021.67 ( 2.82) 104.88 ( 161.96) 4.21e-03 4.21e-03 failed
BLAS : Bad memory unallocation! : 1024 0x2001c0000000
300 192 192 192 5308.33 ( 3.20) 6019.12 ( 2.82) 105.49 ( 161.04) 3.77e-17 3.79e-17 ok
300 224 224 224 5364.87 ( 5.03) 6120.96 ( 4.41) 108.84 ( 247.84) 3.47e-17 3.48e-17 ok
300 224 224 224 5365.89 ( 5.03) 6131.90 ( 4.40) 109.14 ( 247.15) 3.65e-03 3.65e-03 failed
300 224 224 224 5356.23 ( 5.04) 6119.63 ( 4.41) 105.82 ( 254.90) 3.56e-03 3.56e-03 failed
300 224 224 224 5359.53 ( 5.03) 6115.00 ( 4.41) 107.24 ( 251.54) 3.59e-03 3.59e-03 failed
300 224 224 224 5361.82 ( 5.03) 6150.24 ( 4.39) 106.17 ( 254.06) 3.57e-03 3.57e-03 failed
300 224 224 224 5355.22 ( 5.04) 6097.21 ( 4.42) 106.72 ( 252.75) 3.47e-17 3.48e-17 ok
300 256 256 256 5523.45 ( 7.29) 6235.83 ( 6.46) 110.07 ( 365.83) 3.09e-03 3.09e-03 failed
300 256 256 256 5528.69 ( 7.28) 6257.09 ( 6.44) 110.92 ( 363.02) 3.29e-17 3.31e-17 ok
300 256 256 256 5518.04 ( 7.30) 6233.07 ( 6.46) 106.58 ( 377.78) 3.29e-17 3.31e-17 ok
300 256 256 256 5521.82 ( 7.29) 6244.59 ( 6.45) 108.61 ( 370.75) 3.04e-03 3.04e-03 failed
300 256 256 256 5520.20 ( 7.29) 6264.98 ( 6.43) 107.01 ( 376.28) 3.13e-03 3.13e-03 failed
300 256 256 256 5512.63 ( 7.30) 6230.08 ( 6.46) 108.47 ( 371.23) 3.19e-03 3.19e-03 failed
300 288 288 288 5473.66 ( 10.47) 6282.18 ( 9.13) 107.79 ( 531.87) 2.83e-03 2.83e-03 failed
300 288 288 288 5476.27 ( 10.47) 6284.16 ( 9.12) 108.76 ( 527.14) 2.63e-03 2.63e-03 failed
300 288 288 288 5464.82 ( 10.49) 6269.90 ( 9.14) 105.60 ( 542.90) 2.79e-03 2.79e-03 failed
300 288 288 288 5472.54 ( 10.48) 6279.40 ( 9.13) 106.60 ( 537.84) 2.85e-03 2.85e-03 failed
300 288 288 288 5471.04 ( 10.48) 6292.38 ( 9.11) 105.10 ( 545.50) 2.72e-03 2.72e-03 failed
300 288 288 288 5462.71 ( 10.49) 6273.83 ( 9.14) 107.25 ( 534.54) 2.76e-03 2.76e-03 failed
300 320 320 320 5544.13 ( 14.18) 6359.11 ( 12.37) 105.25 ( 747.18) 2.47e-03 2.47e-03 failed
300 320 320 320 5548.79 ( 14.17) 6358.13 ( 12.37) 104.50 ( 752.57) 2.48e-03 2.48e-03 failed
300 320 320 320 5540.59 ( 14.19) 6346.88 ( 12.39) 98.42 ( 799.09) 2.49e-03 2.49e-03 failed
300 320 320 320 5545.25 ( 14.18) 6355.07 ( 12.37) 100.61 ( 781.67) 2.40e-03 2.40e-03 failed
300 320 320 320 5545.25 ( 14.18) 6353.97 ( 12.38) 96.49 ( 815.04) 3.71e-17 3.72e-17 ok
300 320 320 320 5537.43 ( 14.20) 6350.91 ( 12.38) 100.77 ( 780.39) 2.48e-03 2.48e-03 failed
300 352 352 352 5539.45 ( 18.90) 6378.26 ( 16.41) 112.68 ( 928.94) 2.24e-03 2.24e-03 failed
300 352 352 352 5541.83 ( 18.89) 6379.10 ( 16.41) 112.56 ( 929.93) 2.38e-03 2.38e-03 failed
300 352 352 352 5535.05 ( 18.91) 6381.79 ( 16.40) 108.43 ( 965.40) 3.68e-17 3.69e-17 ok
300 352 352 352 5540.99 ( 18.89) 6375.58 ( 16.42) 110.05 ( 951.17) 2.38e-03 2.38e-03 failed
300 352 352 352 5540.36 ( 18.89) 6377.15 ( 16.41) 108.74 ( 962.59) 2.16e-03 2.16e-03 failed
300 352 352 352 5536.52 ( 18.91) 6379.47 ( 16.41) 108.96 ( 960.69) 2.39e-03 2.39e-03 failed
300 384 384 384 5602.99 ( 24.25) 6428.69 ( 21.14) 112.32 (1209.92) 3.59e-17 3.60e-17 ok
300 384 384 384 5607.18 ( 24.24) 6429.27 ( 21.14) 112.33 (1209.76) 2.01e-03 2.01e-03 failed
300 384 384 384 5601.40 ( 24.26) 6429.27 ( 21.14) 108.05 (1257.73) 2.10e-03 2.10e-03 failed
300 384 384 384 5604.65 ( 24.25) 6423.47 ( 21.16) 109.83 (1237.36) 2.20e-03 2.20e-03 failed
300 384 384 384 5604.65 ( 24.25) 6424.12 ( 21.15) 108.37 (1254.02) 2.15e-03 2.15e-03 failed
300 384 384 384 5599.58 ( 24.27) 6429.85 ( 21.14) 109.87 (1236.88) 2.19e-03 2.19e-03 failed
300 416 416 416 5578.73 ( 30.97) 7089.29 ( 24.37) 111.67 (1547.21) 2.01e-03 2.01e-03 failed
300 416 416 416 5582.68 ( 30.95) 6797.24 ( 25.42) 110.59 (1562.39) 1.81e-03 1.81e-03 failed
300 416 416 416 5577.10 ( 30.98) 6434.24 ( 26.85) 106.28 (1625.70) 1.98e-03 1.98e-03 failed
300 416 416 416 5579.98 ( 30.96) 6435.89 ( 26.85) 107.43 (1608.36) 2.00e-03 2.00e-03 failed
300 416 416 416 5579.85 ( 30.96) 6434.98 ( 26.85) 106.94 (1615.70) 1.98e-03 1.98e-03 failed
300 416 416 416 5574.10 ( 31.00) 6432.35 ( 26.86) 107.71 (1604.14) 2.00e-03 2.00e-03 failed
300 448 448 448 5627.20 ( 38.35) 7172.15 ( 30.09) 109.33 (1973.78) 1.87e-03 1.87e-03 failed
300 448 448 448 5630.42 ( 38.33) 6684.52 ( 32.28) 106.37 (2028.69) 3.35e-17 3.36e-17 ok
300 448 448 448 5626.60 ( 38.35) 6465.27 ( 33.38) 100.83 (2140.11) 1.34e-03 1.34e-03 failed
300 448 448 448 5629.09 ( 38.34) 6466.24 ( 33.37) 102.59 (2103.56) 1.84e-03 1.84e-03 failed
300 448 448 448 5626.60 ( 38.35) 6462.91 ( 33.39) 101.61 (2123.77) 1.81e-03 1.81e-03 failed
300 448 448 448 5629.54 ( 38.33) 6758.44 ( 31.93) 99.76 (2163.23) 3.35e-17 3.36e-17 ok
300 480 480 480 5599.48 ( 47.40) 7494.55 ( 35.42) 101.01 (2627.56) 1.70e-03 1.70e-03 failed
300 480 480 480 5601.71 ( 47.38) 7494.60 ( 35.41) 104.56 (2538.43) 1.74e-03 1.74e-03 failed
300 480 480 480 5597.82 ( 47.42) 6463.62 ( 41.06) 98.91 (2683.57) 1.70e-03 1.70e-03 failed
300 480 480 480 5604.92 ( 47.35) 6616.83 ( 40.11) 96.11 (2761.58) 1.67e-03 1.67e-03 failed
300 480 480 480 5598.41 ( 47.41) 7348.27 ( 36.12) 95.84 (2769.36) 1.67e-03 1.67e-03 failed
300 480 480 480 5598.67 ( 47.41) 7126.93 ( 37.24) 101.27 (2620.93) 1.69e-03 1.69e-03 failed
300 512 512 512 6326.17 ( 50.92) 7453.45 ( 43.22) 93.24 (3454.75) 1.55e-03 1.55e-03 failed
300 512 512 512 6405.83 ( 50.29) 7456.17 ( 43.20) 95.88 (3359.76) 1.50e-03 1.50e-03 failed
https://jobstepviewer.olcf.ornl.gov/summit/228645-3
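
A log like the one above is easier to read once the result rows are separated from the interleaved `BLAS : Bad memory unallocation!` messages and tallied per matrix size. The following is a minimal Python sketch, not part of MAGMA: the names `parse_row`, `zgemm_gflops`, and `tally` are hypothetical helpers, and the regular expression assumes rows follow the column order of the header line (BatchCount, M, N, K, three `Gflop/s (ms)` pairs, two error columns, then `ok` or `failed`). The flop count of 8·M·N·K per complex matrix product is the usual convention for ZGEMM and is treated here as an assumption; it reproduces the logged Gflop/s figures to within the rounding of the printed times.

```python
import re
from collections import Counter

# One result row: BatchCount M N K, then three "Gflop/s (ms)" pairs
# (MAGMA, CUBLAS, CPU), two error columns, and the ok/failed verdict.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)\s*\(\s*([\d.]+)\)\s+"
    r"([\d.]+)\s*\(\s*([\d.]+)\)\s+"
    r"([\d.]+)\s*\(\s*([\d.]+)\)\s+"
    r"(\S+)\s+(\S+)\s+(ok|failed)\s*$"
)


def parse_row(line):
    """Return a dict for a result row, or None for headers and BLAS messages."""
    m = ROW.match(line)
    if m is None:
        return None
    batch, rows, cols, inner = (int(x) for x in m.group(1, 2, 3, 4))
    return {
        "batch": batch, "m": rows, "n": cols, "k": inner,
        "magma_gflops": float(m.group(5)), "magma_ms": float(m.group(6)),
        "cublas_gflops": float(m.group(7)), "cublas_ms": float(m.group(8)),
        "cpu_gflops": float(m.group(9)), "cpu_ms": float(m.group(10)),
        "magma_err": float(m.group(11)), "cublas_err": float(m.group(12)),
        "status": m.group(13),
    }


def zgemm_gflops(batch, m, n, k, ms):
    """Gflop/s for a batch of complex GEMMs, assuming 8*m*n*k flops each."""
    flops = 8.0 * batch * m * n * k
    return flops / (ms * 1e-3) / 1e9


def tally(lines):
    """Count ok/failed rows per matrix size M, skipping non-result lines."""
    counts = Counter()
    for line in lines:
        row = parse_row(line)
        if row is not None:
            counts[(row["m"], row["status"])] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = [
        "300 32 32 32 981.71 ( 0.08) 826.70 ( 0.10) 34.74 ( 2.26) 4.92e-17 4.91e-17 ok",
        "BLAS : Bad memory unallocation! : 1024 0x2000c0000000",
        "300 512 512 512 6326.17 ( 50.92) 7453.45 ( 43.22) 93.24 (3454.75) 1.55e-03 1.55e-03 failed",
    ]
    for (size, status), n in sorted(tally(sample).items()):
        print(f"M=N=K={size}: {n} {status}")
    last = parse_row(sample[-1])
    print("recomputed MAGMA Gflop/s:",
          round(zgemm_gflops(last["batch"], last["m"], last["n"], last["k"],
                             last["magma_ms"]), 2))
```

To run it over the whole log, save the log to a file and pass the open file object to `tally`. Lines that begin with `%` or `BLAS :` do not match the row pattern, so they are skipped rather than counted.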