@foxtran
Created January 17, 2023 14:17

Benchmark was compiled using the following compiler:

GCC version 12.2.0

Benchmark was compiled with the following options:

-mabi=lp64d -mcpu=sifive-u74 -misa-spec=20191213 -march=rv64imafdc_zicsr -O3 -ffree-line-length-none -fpre-include=/usr/include/finclude/riscv64-linux-gnu/math-vector-fortran.h

Number of repeats is: 100000

Total size of one array is 4 KB

type test N mean_ms sd_ms min_ms max_ms
int8 add_v1 10 153.623 0.070 153.585 153.831
int8 add_v2 10 222.329 0.070 222.290 222.537
int8 add_v3 10 175.254 0.108 175.043 175.383
int8 mul_v1 10 514.244 0.513 513.644 515.283
int8 mul_v2 10 1027.292 0.669 1026.713 1028.577
int8 mul_v3 10 1097.671 1.514 1095.844 1101.248
int8 fma_v1 10 836.633 0.223 836.530 837.298
int8 fma_v2 10 696.603 0.160 696.508 697.027
int8 fma_v3 10 722.701 0.175 722.623 723.222
int8 fma_v4 10 1141.211 0.613 1140.448 1142.322
int8 div_v1 10 1371.889 0.223 1371.764 1372.546
int8 div_v2 10 2734.567 0.479 2734.318 2735.993
int8 inv_v1 10 1771.318 0.296 1770.982 1771.991
int8 inv_v2.1 10 1368.186 0.242 1368.032 1368.866
int8 inv_v2.2 10 5001.032 11.752 4990.418 5031.006
int8 inv_v2.3 10 1026.454 0.178 1026.363 1026.968
int8 popcnt 10 11336.401 2.454 11333.590 11342.272
int8 poppar 10 10148.028 5.003 10144.480 10160.036
int8 dim 10 1710.394 0.320 1710.235 1711.343
int8 iand 10 129.429 0.031 129.410 129.522
int8 ieor 10 128.573 0.404 128.016 129.032
int8 ior 10 129.424 0.017 129.411 129.470
int8 ishft 10 2735.412 0.490 2735.193 2736.867
int8 ishftc 10 4178.975 0.837 4178.482 4181.288
int8 ibset 10 1028.592 1.197 1026.929 1030.977
int8 ibclr 10 1368.169 0.209 1368.046 1368.784
int8 min 10 1711.603 0.818 1710.279 1712.364
int8 max 10 2184.484 0.357 2184.318 2185.546
int8 shifta 10 1027.106 0.166 1026.979 1027.591
int8 shiftl 10 1368.295 0.231 1368.135 1368.857
int8 shiftr 10 1368.238 0.176 1368.145 1368.753
int16 add_v1 10 153.547 0.019 153.531 153.598
int16 add_v2 10 223.221 2.938 222.207 232.034
int16 add_v3 10 175.425 0.030 175.361 175.475
int16 mul_v1 10 258.956 2.114 257.336 264.286
int16 mul_v2 10 514.198 0.101 514.147 514.498
int16 mul_v3 10 548.103 0.497 546.967 548.394
int16 fma_v1 10 418.828 0.057 418.788 418.987
int16 fma_v2 10 343.649 0.428 342.771 344.269
int16 fma_v3 10 428.214 0.101 428.166 428.514
int16 fma_v4 10 571.374 0.602 570.792 572.537
int16 div_v1 10 684.410 0.094 684.364 684.688
int16 div_v2 10 1995.365 0.386 1995.090 1996.265
int16 inv_v1 10 686.080 0.094 686.029 686.355
int16 inv_v2.1 10 513.722 0.243 513.626 514.450
int16 inv_v2.2 10 2431.707 1.733 2429.616 2435.341
int16 inv_v2.3 10 513.766 0.085 513.718 514.010
int16 popcnt 10 5656.232 1.079 5655.086 5659.229
int16 poppar 10 5099.266 1.540 5098.206 5103.643
int16 dim 10 856.004 0.150 855.935 856.448
int16 iand 10 129.430 0.024 129.412 129.497
int16 ieor 10 128.949 0.467 128.022 129.504
int16 ior 10 129.428 0.021 129.416 129.491
int16 ishft 10 1529.403 0.242 1528.964 1529.861
int16 ishftc 10 2450.365 0.429 2450.157 2451.643
int16 ibset 10 514.097 0.106 513.996 514.403
int16 ibclr 10 684.592 0.111 684.517 684.890
int16 min 10 759.451 0.153 759.349 759.888
int16 max 10 1028.208 0.862 1027.335 1030.200
int16 shifta 10 522.585 22.980 514.181 591.289
int16 shiftl 10 684.576 0.103 684.523 684.883
int16 shiftr 10 684.681 0.112 684.623 685.008
int32 add_v1 10 130.463 1.217 128.842 131.917
int32 add_v2 10 256.977 0.608 255.415 257.676
int32 add_v3 10 172.521 0.357 172.298 173.510
int32 mul_v1 10 129.252 0.276 129.134 130.066
int32 mul_v2 10 258.665 1.217 257.751 261.464
int32 mul_v3 10 274.578 0.137 274.374 274.872
int32 fma_v1 10 139.899 0.076 139.854 140.107
int32 fma_v2 10 187.910 0.160 187.609 188.122
int32 fma_v3 10 180.616 0.059 180.587 180.789
int32 fma_v4 10 258.068 0.113 258.013 258.397
int32 div_v1 10 342.689 0.058 342.655 342.852
int32 div_v2 10 1641.229 0.229 1641.073 1641.882
int32 inv_v1 10 257.587 0.038 257.552 257.692
int32 inv_v2.1 10 258.952 0.563 257.777 259.869
int32 inv_v2.2 10 1240.351 17.250 1217.119 1269.617
int32 inv_v2.3 10 257.809 0.464 257.492 258.593
int32 popcnt 10 2827.541 2.071 2824.296 2831.226
int32 poppar 10 2557.430 0.591 2556.995 2559.037
int32 dim 10 342.822 0.108 342.774 343.147
int32 iand 10 129.518 0.048 129.493 129.662
int32 ieor 10 128.920 0.722 127.495 129.420
int32 ior 10 129.501 0.018 129.487 129.550
int32 ishft 10 514.127 0.061 514.095 514.308
int32 ishftc 10 713.394 0.103 713.336 713.697
int32 ibset 10 258.871 2.018 257.587 264.594
int32 ibclr 10 343.056 0.056 343.029 343.222
int32 min 10 343.278 0.223 342.993 343.660
int32 max 10 342.967 0.044 342.941 343.095
int32 shifta 10 257.734 0.040 257.710 257.852
int32 shiftl 10 257.525 0.051 257.501 257.676
int32 shiftr 10 257.600 0.028 257.584 257.683
int64 add_v1 10 64.117 0.397 63.429 64.721
int64 add_v2 10 129.526 0.013 129.509 129.560
int64 add_v3 10 86.791 0.017 86.760 86.823
int64 mul_v1 10 66.314 1.678 65.139 70.996
int64 mul_v2 10 129.131 0.517 128.196 129.691
int64 mul_v3 10 138.352 0.172 138.223 138.835
int64 fma_v1 10 76.263 0.159 75.946 76.423
int64 fma_v2 10 92.544 0.289 91.719 92.721
int64 fma_v3 10 90.906 0.010 90.895 90.930
int64 fma_v4 10 143.757 0.348 143.276 144.385
int64 div_v1 10 171.832 0.021 171.816 171.891
int64 div_v2 10 794.554 0.144 794.467 794.975
int64 inv_v1 10 131.711 0.489 131.249 132.702
int64 inv_v2.1 10 129.175 0.013 129.166 129.207
int64 inv_v2.2 10 617.332 12.769 607.231 649.556
int64 inv_v2.3 10 126.735 1.143 124.873 128.870
int64 popcnt 10 1456.187 0.236 1455.984 1456.868
int64 poppar 10 1282.824 0.178 1282.697 1283.266
int64 dim 10 172.001 0.031 171.973 172.089
int64 iand 10 129.521 0.014 129.508 129.559
int64 ieor 10 128.903 0.477 128.209 129.443
int64 ior 10 129.515 0.020 129.494 129.561
int64 ishft 10 257.797 0.032 257.778 257.891
int64 ishftc 10 300.801 0.050 300.774 300.947
int64 ibset 10 129.532 0.016 129.520 129.570
int64 ibclr 10 172.169 0.016 172.158 172.214
int64 min 10 132.236 1.432 129.522 135.178
int64 max 10 129.251 0.190 128.833 129.515
int64 shifta 10 131.296 1.400 129.613 133.560
int64 shiftl 10 129.447 0.015 129.437 129.489
int64 shiftr 10 131.436 1.181 129.770 133.705
real32 add_v1 10 129.897 0.784 129.044 131.092
real32 add_v2 10 258.287 0.661 257.769 259.772
real32 add_v3 10 172.411 0.023 172.399 172.475
real32 mul_v1 10 129.770 0.624 129.076 131.058
real32 mul_v2 10 257.786 0.060 257.756 257.967
real32 mul_v3 10 172.413 0.019 172.399 172.466
real32 fma_v1 10 87.278 1.299 86.459 90.420
real32 fma_v2 10 129.473 0.371 129.244 130.283
real32 fma_v3 10 171.687 0.028 171.671 171.768
real32 fma_v4 10 258.604 0.584 257.943 259.885
real32 div_v1 10 349.240 0.170 349.134 349.694
real32 div_v2 10 1540.427 0.342 1539.959 1540.724
real32 inv 10 1539.939 0.276 1539.754 1540.527
real32 invsqrt_v1 10 3078.070 0.500 3077.772 3079.308
real32 invsqrt_v2 10 3077.442 0.594 3077.049 3078.632
real32 exp 10 5300.263 1.122 5299.405 5302.768
real32 erf 10 7853.213 5.885 7844.332 7868.912
real32 erfc 10 7893.178 36.852 7843.006 7965.351
real32 erfc_scaled 10 9305.416 1.200 9303.557 9308.050
real32 gamma 10 33523.623 6.967 33512.820 33532.793
real32 sqrt 10 1539.265 0.276 1539.077 1539.830
real32 sin 10 4626.723 2.560 4621.915 4630.551
real32 cos 10 4368.755 1.505 4367.488 4371.816
real32 tan 10 10296.860 3.101 10292.547 10302.197
real32 sinh 10 21644.525 31.637 21571.459 21686.025
real32 cosh 10 15962.167 76.744 15881.037 16104.113
real32 tanh 10 16127.428 3.064 16123.026 16134.531
real32 asinh 10 18528.304 3.490 18525.718 18536.709
real32 acosh 10 7436.016 1.008 7434.876 7438.454
real32 atan 10 8896.491 5.874 8885.258 8907.582
real32 bessel_j0 10 7265.836 1.247 7264.935 7268.350
real32 bessel_j1 10 7351.493 1.271 7350.553 7353.989
real32 bessel_y0 10 20581.906 3.702 20579.325 20589.783
real32 bessel_y1 10 17202.023 22.433 17182.470 17249.977
real32 epsilon 10 101.791 1.091 99.776 102.950
real32 exponent 10 7901.891 4.244 7892.663 7907.701
real32 fraction 10 7352.426 1.207 7351.584 7355.007
real32 log 10 4445.653 0.738 4445.118 4447.100
real32 log10 10 10742.656 24.487 10704.716 10772.856
real32 log_gamma 10 16483.210 27.811 16420.842 16505.848
real32 atan2 10 14831.328 18.764 14813.580 14885.153
real32 dim 10 1061.581 0.207 1061.349 1062.103
real64 add_v1 10 65.337 0.573 64.930 66.776
real64 add_v2 10 129.682 0.048 129.637 129.818
real64 add_v3 10 87.348 0.516 86.954 88.391
real64 mul_v1 10 67.184 1.266 65.978 70.470
real64 mul_v2 10 129.658 0.025 129.631 129.712
real64 mul_v3 10 87.144 0.782 86.867 89.489
real64 fma_v1 10 44.159 0.482 43.724 45.191
real64 fma_v2 10 65.639 0.588 65.176 67.185
real64 fma_v3 10 85.022 0.785 83.658 86.392
real64 fma_v4 10 129.752 0.026 129.710 129.812
real64 div_v1 10 221.254 0.063 221.217 221.381
real64 div_v2 10 1369.297 0.312 1369.078 1369.845
real64 inv 10 1369.081 0.375 1368.878 1370.108
real64 invsqrt_v1 10 2736.408 0.557 2736.038 2737.511
real64 invsqrt_v2 10 2736.420 0.666 2735.981 2738.095
real64 exp 10 2864.796 0.499 2864.485 2865.798
real64 erf 10 4110.702 0.689 4110.316 4112.114
real64 erfc 10 4311.488 3.031 4306.905 4317.320
real64 erfc_scaled 10 5665.111 70.243 5581.902 5792.760
real64 gamma 10 18114.309 15.823 18101.857 18148.838
real64 sqrt 10 1369.090 0.355 1368.842 1369.906
real64 sin 10 5039.960 4.158 5033.072 5045.444
real64 cos 10 4914.102 3.195 4910.977 4920.129
real64 tan 10 6670.155 2.032 6666.124 6673.778
real64 sinh 10 11374.259 8.915 11365.052 11398.466
real64 cosh 10 8887.259 18.453 8856.637 8911.999
real64 tanh 10 8912.415 1.586 8911.281 8915.644
real64 asinh 10 11511.310 2.300 11509.495 11516.049
real64 acosh 10 3796.538 6.683 3793.521 3816.518
real64 atan 10 5376.516 7.588 5358.369 5384.992
real64 bessel_j0 10 3890.201 0.664 3889.794 3891.534
real64 bessel_j1 10 4396.954 0.699 4396.495 4398.445
real64 bessel_y0 10 9183.919 2.013 9182.342 9188.342
real64 bessel_y1 10 10423.996 1.784 10422.783 10427.769
real64 epsilon 10 63.147 1.666 58.782 64.848
real64 exponent 10 3772.476 15.891 3762.855 3806.712
real64 fraction 10 3676.515 0.789 3676.023 3678.423
real64 log 10 2770.197 1.122 2768.378 2772.899
real64 log10 10 6190.723 1.190 6189.707 6193.234
real64 log_gamma 10 9360.925 48.928 9287.911 9457.164
real64 atan2 10 9641.222 2.411 9638.666 9646.713
real64 dim 10 431.037 1.184 428.870 432.437
real128 add_v1 10 2572.999 0.615 2572.580 2574.220
real128 add_v2 10 2359.096 7.123 2345.890 2371.500
real128 add_v3 10 2545.483 1.050 2544.390 2547.164
real128 mul_v1 10 3661.156 1.422 3659.028 3663.457
real128 mul_v2 10 4537.952 4.331 4533.445 4545.203
real128 mul_v3 10 4419.658 1.717 4417.985 4423.371
real128 fma_v1 10 3847.670 31.312 3824.748 3939.963
real128 fma_v2 10 3566.094 13.344 3545.872 3589.204
real128 fma_v3 10 3594.092 11.933 3579.225 3615.364
real128 fma_v4 10 7180.475 4.203 7174.738 7187.750
real128 div_v1 10 3511.294 1.380 3509.205 3513.241
real128 div_v2 10 6631.621 14.891 6606.079 6656.592
real128 inv 10 6464.231 8.707 6449.684 6476.967
real128 invsqrt_v1 10 36695.954 15.962 36667.671 36711.327
real128 invsqrt_v2 10 36657.611 23.678 36626.306 36703.339
real128 exp 10 110897.535 55.584 110823.554 111004.716
real128 erf 10 122718.113 44.801 122639.974 122797.710
real128 erfc 10 123980.064 49.722 123886.961 124070.071
real128 erfc_scaled 10 248129.509 197.744 247849.121 248487.314
real128 gamma 10 318984.892 115.750 318807.834 319154.934
real128 sqrt 10 30301.542 6.534 30292.787 30312.392
real128 sin 10 89693.882 25.696 89671.015 89746.480
real128 cos 10 90196.739 23.853 90170.153 90240.209
real128 tan 10 102642.536 43.417 102542.012 102711.220
real128 sinh 10 180602.218 103.050 180407.136 180766.920
real128 cosh 10 140875.658 82.351 140730.714 141007.893
real128 tanh 10 172439.437 135.995 172244.410 172679.350
real128 asinh 10 275354.799 141.237 275153.995 275551.809
real128 acosh 10 5888.149 20.292 5878.027 5947.777
real128 atan 10 101901.919 55.486 101819.270 102007.729
real128 bessel_j0 10 113094.780 27.468 113060.377 113151.097
real128 bessel_j1 10 110535.568 22.747 110503.482 110575.680
real128 bessel_y0 10 365239.091 108.955 365085.090 365464.932
real128 bessel_y1 10 377441.004 77.045 377273.822 377578.746
real128 epsilon 10 60.597 1.172 58.376 62.628
real128 exponent 10 4356.117 9.379 4348.133 4370.979
real128 fraction 10 3117.374 17.714 3100.631 3161.191
real128 log 10 130061.249 61.698 129983.039 130155.063
real128 log10 10 212175.955 55.417 212075.324 212257.607
real128 log_gamma 10 258037.801 96.843 257869.755 258155.213
real128 atan2 10 108760.791 41.355 108701.835 108833.882
real128 dim 10 4001.987 5.990 3994.511 4013.344
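
The rows above are whitespace-delimited, with seven fields each: type, test, N, then mean, sd, min and max time in ms. As a minimal sketch of how they could be post-processed, the Python below parses two rows copied verbatim from the table and compares their mean times. The names `Row`, `parse_row` and `relative_sd` are hypothetical helpers introduced for illustration only; they are not part of the benchmark.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One result line; field names are illustrative, not from the benchmark."""
    kind: str
    test: str
    n: int
    mean_ms: float
    sd_ms: float
    min_ms: float
    max_ms: float


def parse_row(line: str) -> Row:
    """Split one whitespace-delimited result line into typed fields."""
    kind, test, n, mean, sd, lo, hi = line.split()
    return Row(kind, test, int(n), float(mean), float(sd), float(lo), float(hi))


def relative_sd(row: Row) -> float:
    """Standard deviation as a fraction of the mean (run-to-run noise)."""
    return row.sd_ms / row.mean_ms


# Two rows copied verbatim from the table above.
sample = """\
int64 div_v1 10 171.832 0.021 171.816 171.891
int64 div_v2 10 794.554 0.144 794.467 794.975
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
by_test = {(r.kind, r.test): r for r in rows}

# Ratio of mean times between the two int64 division variants.
ratio = by_test["int64", "div_v2"].mean_ms / by_test["int64", "div_v1"].mean_ms
print(f"int64 div_v2 / div_v1 = {ratio:.2f}")
for r in rows:
    print(f"{r.kind} {r.test}: sd/mean = {relative_sd(r):.5f}")
```

Feeding the whole table through `parse_row` in the same way would allow comparisons across types or variants. The sd/mean value is only a rough noise indicator for a given row.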