@olajep
Last active August 29, 2015 14:27
pal cflags benchmark 20150812
PAL Benchmark
=============
Comparison between different CFLAGS
Date: 2015-08-12
Commit: d88cd210f7774ba4472a494b8bd919657db8d36c
Arch: x86_64
CPU: Intel(R) Core(TM) i7-2760QM
CFLAGS: "-O2 -ffast-math" vs.
"-O3 -ffast-math -march=corei7-avx"
Speedup
-----------------------
0.24 p_tan_f32
0.36 p_exp_f32
0.47 p_min_f32
0.48 p_cosh_f32
0.50 p_sinh_f32
0.50 p_tanh_f32
0.88 p_gauss3x3_f32
0.97 p_rand
0.97 p_asinh_f32
0.99 p_cbrt_f32
0.99 p_median3x3_f32
0.99 p_laplace3x3_f32
0.99 p_max_f32
0.99 p_atanh_f32
1.00 p_acosh_f32
1.00 p_invcbrt_f32
1.01 p_median_f32
1.02 p_atan2_f32
1.07 p_sobel3x3_f32
1.10 p_box3x3_f32
1.13 p_prewitt3x3_f32
1.19 p_scharr3x3_f32
1.32 p_sort_f32
1.41 p_mode_f32
1.82 p_popcount_u64
1.93 p_pow_f32
2.42 p_add_f32
2.43 p_mul_f32
2.66 p_asin_f32
2.78 p_acos_f32
2.95 p_sub_f32
3.20 p_mac_f32
3.90 p_abs_f32
4.18 p_dot_f32
4.37 p_absdiff_f32
4.63 p_invsqrt_f32
4.71 p_ln_f32
4.71 p_popcount_u32
4.72 p_log10_f32
5.46 p_itof
5.47 p_sqrt_f32
5.57 p_sumsq_f32
6.84 p_ftoi
7.15 p_inv_f32
7.25 p_mean_f32
7.48 p_atan_f32
7.65 p_sum_f32
7.73 p_cos_f32
8.07 p_sincos_f32
8.28 p_sin_f32
13.00 p_div_f32
---------------------
2.07 (Geometric mean)
Arch: ARMv7
CPU: Cortex-A9
CFLAGS: "-O2 -ffast-math" vs.
"-O3 -ffast-math -mcpu=cortex-a9 -mfpu=neon"
Speedup
-------------------------
0.29 p_tan_f32
0.71 p_exp_f32
0.76 p_sinh_f32
0.76 p_cosh_f32
0.80 p_tanh_f32
0.94 p_laplace3x3_f32
0.95 p_median3x3_f32
0.99 p_gauss3x3_f32
0.99 p_scharr3x3_f32
0.99 p_div_f32
1.00 p_cbrt_f32
1.01 p_sobel3x3_f32
1.01 p_prewitt3x3_f32
1.01 p_box3x3_f32
1.01 p_atan2_f32
1.01 p_invcbrt_f32
1.02 p_asinh_f32
1.03 p_acosh_f32
1.05 p_atanh_f32
1.06 p_rand
1.08 p_median_f32
1.09 p_max_f32
1.11 p_pow_f32
1.13 p_ln_f32
1.16 p_abs_f32
1.22 p_log10_f32
1.25 p_popcount_u64
1.35 p_sort_f32
1.36 p_mode_f32
1.36 p_acos_f32
1.44 p_asin_f32
1.49 p_min_f32
1.53 p_popcount_u32
1.62 p_sub_f32
1.62 p_add_f32
1.66 p_absdiff_f32
1.75 p_mul_f32
1.81 p_mac_f32
1.99 p_dot_f32
2.16 p_itof
2.32 p_ftoi
2.66 p_mean_f32
2.66 p_sumsq_f32
2.66 p_sum_f32
3.04 p_inv_f32
3.29 p_sin_f32
3.38 p_sqrt_f32
3.42 p_sincos_f32
3.52 p_cos_f32
3.65 p_invsqrt_f32
3.66 p_atan_f32
---------------------
1.40 (Geometric mean)
Arch: Epiphany
CPU: Epiphany-III
CFLAGS: "-O2 -ffast-math -mfp-mode=round-nearest -ffp-contract=fast" vs.
"-O2 -ffast-math -mfp-mode=round-nearest -ffp-contract=fast -mno-soft-cmpsf"
Speedup
----------------------------
0.94 p_max_f32
1 p_abs_f32
1 p_absdiff_f32
1 p_acosh_f32
1 p_add_f32
1 p_asinh_f32
1 p_atan2_f32
1 p_atan_f32
1 p_atanh_f32
1 p_box3x3_f32
1 p_cbrt_f32
1 p_cos_f32
1 p_div_f32
1 p_dot_f32
1 p_ftoi
1 p_gauss3x3_f32
1 p_inv_f32
1 p_invcbrt_f32
1 p_invsqrt_f32
1 p_itof
1 p_laplace3x3_f32
1 p_ln_f32
1 p_log10_f32
1 p_mac_f32
1 p_mean_f32
1 p_mul_f32
1 p_popcount_u32
1 p_popcount_u64
1 p_prewitt3x3_f32
1 p_rand
1 p_scharr3x3_f32
1 p_sin_f32
1 p_sincos_f32
1 p_sobel3x3_f32
1 p_sqrt_f32
1 p_sub_f32
1 p_sum_f32
1 p_sumsq_f32
1.08 p_tanh_f32
1.09 p_cosh_f32
1.09 p_sinh_f32
1.13 p_exp_f32
1.30 p_asin_f32
1.31 p_acos_f32
1.58 p_tan_f32
1.96 p_sort_f32
2.01 p_median_f32
2.18 p_median3x3_f32
3.89 p_min_f32
---------------------
1.10 (Geometric mean)
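
The summary line of each table is the geometric mean of the per-function speedups. As a check, the short sketch below recomputes it for the Epiphany-III "-mno-soft-cmpsf" table above. The list is typed in by hand from that table (one 0.94, thirty-seven entries of 1, and the eleven values above 1), and the helper name `geometric_mean` is an illustrative choice, not part of PAL or its benchmark scripts.

```python
import math

# Speedups copied by hand from the Epiphany-III "-mno-soft-cmpsf" table:
# one regression, 37 unchanged functions, and 11 improvements.
speedups = [0.94] + [1.0] * 37 + [
    1.08, 1.09, 1.09, 1.13, 1.30, 1.31, 1.58, 1.96, 2.01, 2.18, 3.89,
]


def geometric_mean(values):
    """Return the n-th root of the product, computed via logs for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"{geometric_mean(speedups):.2f} (Geometric mean)")
```

The geometric mean suits ratios such as speedups: a 2.00 and a 0.50 average out to 1.00, whereas an arithmetic mean would report 1.25.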
Arch: Epiphany
CPU: Epiphany-III
CFLAGS: "-O2 -ffast-math -mfp-mode=round-nearest -ffp-contract=fast" vs.
"-O3 -ffast-math -mfp-mode=round-nearest -ffp-contract=fast -mno-soft-cmpsf"
Speedup
----------------------------
0.29 p_tan_f32
0.63 p_exp_f32
0.68 p_tanh_f32
0.69 p_cosh_f32
0.69 p_sinh_f32
0.94 p_max_f32
0.97 p_scharr3x3_f32
0.97 p_sobel3x3_f32
0.99 p_prewitt3x3_f32
1.00 p_cbrt_f32
1.00 p_invcbrt_f32
1.00 p_atan2_f32
1.00 p_ln_f32
1.00 p_atanh_f32
1.00 p_popcount_u64
1.00 p_rand
1.00 p_ftoi
1.00 p_div_f32
1.00 p_invsqrt_f32
1.00 p_sqrt_f32
1.00 p_popcount_u32
1.00 p_itof
1.00 p_box3x3_f32
1.00 p_laplace3x3_f32
1.00 p_gauss3x3_f32
1.00 p_log10_f32
1.00 p_acosh_f32
1.08 p_asinh_f32
1.23 p_cos_f32
1.24 p_sincos_f32
1.26 p_sin_f32
1.29 p_acos_f32
1.30 p_asin_f32
1.41 p_abs_f32
1.42 p_inv_f32
1.51 p_mac_f32
1.55 p_mean_f32
1.56 p_sumsq_f32
1.56 p_sum_f32
1.66 p_atan_f32
1.77 p_dot_f32
1.85 p_add_f32
1.85 p_mul_f32
1.85 p_sub_f32
1.86 p_absdiff_f32
1.93 p_sort_f32
2.01 p_median_f32
2.18 p_median3x3_f32
2.79 p_min_f32
---------------------
1.16 (Geometric mean)
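
Several functions get slower with the second set of flags (speedup below 1.00). A minimal sketch for pulling those out of a table in this format follows. It assumes each data row is "<speedup> <function>" separated by whitespace. The sample rows are a subset of the Epiphany-III -O3 table above, and the 0.95 cutoff and the helper names `parse_rows` and `regressions` are illustrative assumptions, not part of the benchmark.

```python
# A few rows copied from the Epiphany-III -O3 table, plus the separator
# and summary lines to show that non-data lines are skipped.
table = """\
0.29 p_tan_f32
0.63 p_exp_f32
0.68 p_tanh_f32
0.69 p_cosh_f32
0.69 p_sinh_f32
0.94 p_max_f32
1.00 p_cbrt_f32
2.79 p_min_f32
---------------------
1.16 (Geometric mean)
"""


def parse_rows(text):
    """Return (function, speedup) pairs, ignoring lines that are not data rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # separator, header, or summary line
        try:
            rows.append((parts[1], float(parts[0])))
        except ValueError:
            continue  # first field is not a number
    return rows


def regressions(rows, threshold=0.95):
    """Return rows whose speedup is below the (assumed) threshold."""
    return [(name, speedup) for name, speedup in rows if speedup < threshold]


for name, speedup in regressions(parse_rows(table)):
    print(f"{speedup:.2f} {name}")
```

Functions flagged this way are candidates for staying at -O2, for example through per-file flags in the build system.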