@bitonic
Last active November 25, 2021 11:33
/*
Benchmark for Eigen change, see <https://gitlab.com/libeigen/eigen/-/merge_requests/734#note_743674873>
Comment reported here for posterity:
Here's a synthetic "benchmark" which I _believe_ shows the difference: https://gist.github.com/bitonic/2d09df858ba2233b7f472f5f8c0512b4 .
I say "believe" because it shows the runtime differences that I'd expect, with some caveats (see the comments on instruction counts below).
However, I have not inspected the assembly manually, which would be required to confirm that the generated code changes in the way we expect. That is a bit more labor intensive, and while I might do it, I don't have time to do it right now.
The code inline:
```cpp
#include <Eigen/Core>
#include <iostream>
using ArrType = Eigen::Array<float, 2, 169>;
__attribute__((noinline))
static void print_array(const char* name, const ArrType& arr) {
    std::cout << name << ": " << arr << std::endl;
}
__attribute__((noinline))
static void test_packet(float x) {
    ArrType xs(x);
    ArrType ys(0.0f);
    print_array("xs", xs);
    print_array("ys", ys);
    for (size_t i = 0; i < 100000000; i++) {
        if (i % 2 == 0) {
            ys += xs;
        } else {
            ys -= xs;
        }
    }
    print_array("ys", ys);
    return;
}
int main() {
    test_packet(5.0f);
    return 0;
}
```
I compile it with
```
% clang++ -std=c++20 -I. -Wall -Werror -mavx2 -O3 test-avx2.cpp -o test-avx2
```
from the root of the `eigen` repo (hence the `-I.`).
We just add to and subtract from a 2x169 array of floats. What I realized is that this change only affects arrays of static size, which was the case in the proprietary code where this perf improvement came up. In fact I am using the same size I was using in that code: 169. We might want to extend it to `Dynamic` (see the stop condition on the same line for why it does not currently work with `Dynamic`).
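For orientation, here is a small sketch of how the 2x169 `ArrType` divides into SIMD registers. It assumes 8 floats per 256-bit AVX register (the build uses `-mavx2`); `PacketSplit` and `split_into_packets` are hypothetical helpers for illustration, not Eigen API, and this does not claim to mirror how Eigen actually traverses the array.

```cpp
#include <cstddef>

// Hypothetical helper type, not part of Eigen: a flat element count
// split into full SIMD packets plus leftover scalar elements.
struct PacketSplit {
    std::size_t full_packets;
    std::size_t remainder;
};

constexpr PacketSplit split_into_packets(std::size_t elements, std::size_t lanes) {
    return PacketSplit{elements / lanes, elements % lanes};
}

// 2 rows x 169 columns of float, matching ArrType in the benchmark.
constexpr std::size_t kElements = 2 * 169;
// Assumption: one 256-bit AVX register holds 8 floats.
constexpr std::size_t kLanes = 8;
constexpr PacketSplit kSplit = split_into_packets(kElements, kLanes);

static_assert(kSplit.full_packets == 42, "338 / 8 full packets");
static_assert(kSplit.remainder == 2, "338 % 8 leftover floats");
```

So under that assumption each `ys += xs` covers 338 floats as 42 full registers plus 2 leftover elements, i.e. the size is not a multiple of the register width.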
If we compile with the improvement, this is perf stat:
```
Performance counter stats for './test-avx2-new':
2,526.48 msec task-clock # 1.000 CPUs utilized
5 context-switches # 0.002 K/sec
2 cpu-migrations # 0.001 K/sec
89 page-faults # 0.035 K/sec
9,079,877,582 cycles # 3.594 GHz
12,968,042,398 instructions # 1.43 insn per cycle
203,464,968 branches # 80.533 M/sec
91,320 branch-misses # 0.04% of all branches
2.527443809 seconds time elapsed
2.525190000 seconds user
0.001999000 seconds sys
```
Numbers of note: 2.5 seconds runtime, 13B instructions. With the old code:
```
3,704.16 msec task-clock # 0.999 CPUs utilized
8 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
86 page-faults # 0.023 K/sec
13,027,668,290 cycles # 3.517 GHz
38,871,811,483 instructions # 2.98 insn per cycle
2,904,199,848 branches # 784.037 M/sec
139,444 branch-misses # 0.00% of all branches
3.706167382 seconds time elapsed
3.702763000 seconds user
0.002999000 seconds sys
```
3.7 seconds runtime (so the new code is roughly 1.5x faster), 39B instructions. I don't have a great explanation for the 3x jump in instructions; I was expecting roughly a 2x jump.
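To make the comparison concrete, the ratios can be recomputed from the two `perf stat` outputs. The sketch below only does arithmetic on the counters quoted above; `PerfRun`, `ratio` and `per_iteration` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical struct: counters copied verbatim from the perf stat runs.
struct PerfRun {
    double seconds;              // "seconds time elapsed"
    std::uint64_t cycles;
    std::uint64_t instructions;
    std::uint64_t branches;
};

constexpr PerfRun kNew{2.527443809, 9079877582ULL, 12968042398ULL, 203464968ULL};
constexpr PerfRun kOld{3.706167382, 13027668290ULL, 38871811483ULL, 2904199848ULL};

// Loop trip count from the benchmark source.
constexpr double kIterations = 100000000.0;

// How many times larger the old measurement is than the new one.
double ratio(double old_value, double new_value) {
    return old_value / new_value;
}

// Average count of a counter per loop iteration (includes startup and printing).
double per_iteration(std::uint64_t counter) {
    return static_cast<double>(counter) / kIterations;
}
```

With these inputs, `ratio(kOld.seconds, kNew.seconds)` is about 1.47 and `ratio(kOld.instructions, kNew.instructions)` is about 3.00. Per iteration that is roughly 130 instructions for the new build against roughly 389 for the old one. The branch counter moves much more than the instruction counter, from about 2 to about 29 branches per iteration (a ratio of about 14), which might be a useful starting point when reading the assembly, though these numbers alone do not explain the cause.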
Again, I've learnt to not make definitive statements when it comes to micro benchmarks unless I have checked the assembly, but I think the above already gives some confidence that the code does what I think it does.
*/
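#include <Eigen/Core>
#include <cstddef>
#include <iostream>

// Fixed-size 2x169 float array, the same size as in the original code.
using ArrType = Eigen::Array<float, 2, 169>;

// noinline keeps the printing code out of line in test_packet.
__attribute__((noinline))
static void print_array(const char* name, const ArrType& arr) {
    std::cout << name << ": " << arr << std::endl;
}

// Alternately adds xs to and subtracts xs from ys, 100M times in total.
__attribute__((noinline))
static void test_packet(float x) {
    ArrType xs(x);     // every coefficient set to x
    ArrType ys(0.0f);  // every coefficient set to 0
    print_array("xs", xs);
    print_array("ys", ys);
    for (size_t i = 0; i < 100000000; i++) {
        if (i % 2 == 0) {
            ys += xs;
        } else {
            ys -= xs;
        }
    }
    // The result is printed, so the loop cannot be discarded as dead code.
    print_array("ys", ys);
}

int main() {
    test_packet(5.0f);
    return 0;
}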