@kassane
Last active May 16, 2024 18:57

Reference

Updated: https://godbolt.org/z/fqT37xGPM - added D and fixed the C++ std::initializer_list usage

.NET is not installed on this machine, so there are no .NET results.

# D with -betterC
Benchmark 1 (121 runs): ./structmem_rs
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           673us ± 61.8us     549us … 1.13ms         11 ( 9%)        0%
  peak_rss           1.98MB ± 72.4KB    1.84MB … 2.10MB         39 (32%)        0%
  cpu_cycles          297K  ± 25.8K      279K  …  487K          12 (10%)        0%
  instructions        325K  ±  289       325K  …  326K           0 ( 0%)        0%
  cache_references   18.4K  ±  402      17.9K  … 22.0K           6 ( 5%)        0%
  cache_misses       6.89K  ±  104      6.59K  … 7.20K           4 ( 3%)        0%
  branch_misses      4.28K  ± 29.9      4.20K  … 4.36K           3 ( 2%)        0%
Benchmark 2 (168 runs): ./structmem_d
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           565us ± 46.9us     453us …  717us          7 ( 4%)        ⚡- 16.1% ±  1.9%
  peak_rss           1.79MB ± 63.4KB    1.70MB … 1.84MB          0 ( 0%)        ⚡- 10.0% ±  0.8%
  cpu_cycles          217K  ± 14.1K      205K  …  308K          28 (17%)        ⚡- 26.8% ±  1.6%
  instructions        183K  ± 41.0       183K  …  183K           0 ( 0%)        ⚡- 43.7% ±  0.0%
  cache_references   14.5K  ±  282      13.7K  … 16.5K           8 ( 5%)        ⚡- 21.1% ±  0.4%
  cache_misses       5.46K  ± 95.9      5.25K  … 5.87K           7 ( 4%)        ⚡- 20.7% ±  0.3%
  branch_misses      3.29K  ± 23.6      3.18K  … 3.36K           4 ( 2%)        ⚡- 23.1% ±  0.1%
Benchmark 3 (178 runs): ./structmem_cpp
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           533us ± 44.5us     404us …  628us          9 ( 5%)        ⚡- 20.9% ±  1.8%
  peak_rss           1.71MB ± 89.6KB    1.57MB … 1.84MB          0 ( 0%)        ⚡- 13.9% ±  1.0%
  cpu_cycles          219K  ± 12.3K      210K  …  285K          21 (12%)        ⚡- 26.0% ±  1.5%
  instructions        205K  ± 20.3       205K  …  205K           0 ( 0%)        ⚡- 36.9% ±  0.0%
  cache_references   13.8K  ±  217      13.0K  … 15.0K           4 ( 2%)        ⚡- 24.7% ±  0.4%
  cache_misses       5.70K  ± 95.5      5.33K  … 6.03K           6 ( 3%)        ⚡- 17.3% ±  0.3%
  branch_misses      3.33K  ± 42.1      3.12K  … 3.41K          16 ( 9%)        ⚡- 22.2% ±  0.2%

# D without -betterC (@nogc only)
Benchmark 1 (96 runs): ./structmem_rs
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           680us ± 70.5us     540us … 1.01ms         15 (16%)        0%
  peak_rss           1.98MB ± 71.8KB    1.84MB … 2.10MB         31 (32%)        0%
  cpu_cycles          301K  ± 33.5K      280K  …  524K           7 ( 7%)        0%
  instructions        325K  ±  288       325K  …  326K           0 ( 0%)        0%
  cache_references   18.4K  ±  205      18.0K  … 19.4K           2 ( 2%)        0%
  cache_misses       6.92K  ±  108      6.69K  … 7.30K           4 ( 4%)        0%
  branch_misses      4.28K  ± 27.3      4.22K  … 4.34K           0 ( 0%)        0%
Benchmark 2 (134 runs): ./structmem_d
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           716us ± 66.0us     571us … 1.12ms         19 (14%)        💩+  5.4% ±  2.6%
  peak_rss           2.41MB ± 88.1KB    2.23MB … 2.49MB          0 ( 0%)        💩+ 21.6% ±  1.1%
  cpu_cycles          313K  ± 20.3K      299K  …  432K          12 ( 9%)        💩+  4.3% ±  2.3%
  instructions        313K  ± 63.4       313K  …  313K           0 ( 0%)        ⚡-  3.7% ±  0.0%
  cache_references   23.7K  ±  230      23.2K  … 24.3K           0 ( 0%)        💩+ 29.1% ±  0.3%
  cache_misses       7.35K  ±  131      7.08K  … 7.73K           1 ( 1%)        💩+  6.3% ±  0.5%
  branch_misses      4.27K  ± 23.8      4.23K  … 4.34K           1 ( 1%)          -  0.1% ±  0.2%
Benchmark 3 (178 runs): ./structmem_cpp
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           531us ± 44.0us     420us …  681us         12 ( 7%)        ⚡- 21.8% ±  2.0%
  peak_rss           1.72MB ± 89.9KB    1.57MB … 1.84MB          0 ( 0%)        ⚡- 13.2% ±  1.1%
  cpu_cycles          219K  ± 12.2K      208K  …  293K          29 (16%)        ⚡- 27.2% ±  1.8%
  instructions        205K  ± 19.4       205K  …  205K           0 ( 0%)        ⚡- 36.9% ±  0.0%
  cache_references   13.8K  ±  171      13.3K  … 14.4K           5 ( 3%)        ⚡- 24.9% ±  0.2%
  cache_misses       5.70K  ±  107      5.32K  … 6.13K           7 ( 4%)        ⚡- 17.6% ±  0.4%
  branch_misses      3.33K  ± 41.2      3.15K  … 3.39K          13 ( 7%)        ⚡- 22.0% ±  0.2%
@kassane
Author

kassane commented May 16, 2024

SafeRefCounted uses malloc: https://dlang.org/library/std/typecons/safe_ref_counted.html (ldc2 gives better performance than dmd or gdc 14.x)
What software was used for the benchmark? poop

cc: @tjpalmer

@tjpalmer

Thanks! What optimizations did you use, by the way? At a glance, it seemed that with heavy optimizations and inlining allowed, much of the work (including counting) was avoided in some cases. I was avoiding that in order to look at core semantics. I'm not sure how much can be easily optimized in realistic, complicated software.
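
To make the inlining concern concrete, here is a minimal C++ sketch. It is not taken from the benchmark sources, and the function names are hypothetical. Passing a std::shared_ptr by value copies it (count +1) and releases the copy on return (count -1). When the callee can be inlined, the optimizer sees both steps together and may be able to simplify them; a noinline attribute (GCC/Clang spelling assumed here) keeps the call boundary in place.

```cpp
#include <memory>

// GCC/Clang-specific attribute; other compilers need a different spelling.
#define SKETCH_NOINLINE __attribute__((noinline))

// Takes the pointer by value, so each call copies it (count +1)
// and releases the copy on return (count -1).
SKETCH_NOINLINE long count_during_call(std::shared_ptr<int> p) {
    return p.use_count();
}

// Same body without the attribute: the optimizer is free to inline it
// and reason about the copy and the release together.
inline long count_during_call_inlinable(std::shared_ptr<int> p) {
    return p.use_count();
}
```

Both functions return the same value; the difference shows up only in the generated code, which can be compared at -O0 and -O3 on a tool such as godbolt.org.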

@tjpalmer

And was your C++ fix just using parens instead of curlies? I didn't diff so far. Just looked at it.

@kassane
Author

kassane commented May 16, 2024

What optimizations did you use, by the way?

# similar to -O2 or -O3
ldmd2 -O -release structmem.d -of=structmem_d # optionally add -betterC to build without druntime and its GC
clang++ -O3 structmem.cpp -o structmem_cpp
cargo build --release # Rust

And was your C++ fix just using parens instead of curlies? I didn't diff so far. Just looked at it.

clang reports an error on:

$ clang++ -O3 structmem.cpp -o structmem_cpp -std=c++23
structmem.cpp:34:13: error: no viable constructor or deduction guide for deduction of template arguments of 'initializer_list'
   34 |             std::initializer_list{6, 7}
      |             ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../include/c++/14.1.1/initializer_list:60:17: note: candidate template ignored: could not match 'const_iterator' (aka 'const _E *') against 'int'
   60 |       constexpr initializer_list(const_iterator __a, size_type __l)
      |                 ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../include/c++/14.1.1/initializer_list:45:11: note: candidate function template not viable: requires 1 argument, but 2 were provided
   45 |     class initializer_list
      |           ^~~~~~~~~~~~~~~~
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../include/c++/14.1.1/initializer_list:64:17: note: candidate function template not viable: requires 0 arguments, but 2 were provided
   64 |       constexpr initializer_list() noexcept
      |                 ^
1 error generated.
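
One way to sidestep this diagnostic, which is not necessarily the change made in the updated godbolt link, is to avoid class template argument deduction on std::initializer_list altogether. The sketch below assumes a hypothetical helper `sum` that takes a list of ints; either naming the element type explicitly or passing a plain braced list to a parameter of known type compiles with both clang and GCC.

```cpp
#include <initializer_list>
#include <numeric>

// Hypothetical helper that consumes a list of ints.
inline int sum(std::initializer_list<int> values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

inline int sum_explicit() {
    // Naming the element type means no deduction is needed, so the
    // "no viable constructor or deduction guide" error cannot occur.
    return sum(std::initializer_list<int>{6, 7});
}

inline int sum_braced() {
    // A plain braced list also works when the parameter type is known.
    return sum({6, 7});
}
```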

@tjpalmer

Thanks much for the info and for the analysis! Out of curiosity, does ldc or clang do LTO with those settings, or would it matter?

@kassane
Author

kassane commented May 16, 2024

does ldc or clang do LTO with those settings, or would it matter?

ldc2 has -flto=full and -flto=thin, similar to clang.
http://johanengelen.github.io/ldc/2016/11/10/Link-Time-Optimization-LDC.html

@kassane
Author

kassane commented May 16, 2024

Note: When using clang you can also experiment with the C++ standard library and its ABI. Results may differ between libstdc++ (GNU) and libc++ (LLVM). Select one with -stdlib=libstdc++ or -stdlib=libc++.
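
When comparing runs, it helps to confirm which standard library a binary was actually built against. A small sketch, assuming the usual vendor macros (`_LIBCPP_VERSION` for libc++, `__GLIBCXX__` for libstdc++); the function name is hypothetical:

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros

// Reports which C++ standard library this translation unit was compiled against.
inline const char* stdlib_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}
```

Printing `stdlib_name()` at the start of the benchmark makes it harder to mix up results from the two builds.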

@tjpalmer

Thanks again!

@kassane
Author

kassane commented May 16, 2024

@tjpalmer

Mind if I link this gist from the video?

@kassane
Author

kassane commented May 16, 2024

Mind if I link this gist from the video?

Feel free to share it.
