Skip to content

Instantly share code, notes, and snippets.

@missing-user
Last active May 3, 2024 22:53
Show Gist options
  • Save missing-user/3114bacb98bc035156ec362c6b73251c to your computer and use it in GitHub Desktop.
Save missing-user/3114bacb98bc035156ec362c6b73251c to your computer and use it in GitHub Desktop.
benchmark_

Auto-Vectorized function application

For my many-body simulation I was looking for a way to implement user defined force calculations, which doesn't break the auto vectorization of the compiler. The user should be able to pass a lambda function that takes in two bodies and returns the pairwise interaction force between them, e.g. due to gravity or electromagnetic repulsion.

Roughly speaking, the code has this structure:

for(int i=0; i<particles.size(); i++){
  double acceleration = 0.0;
  for(int j=0; j<particles.size(); j++){
    acceleration += pairwise_force(particles[i], particles[j]);
  }
}

Turns out that passing the pairwise_force using a std::function<double(Particle, Particle)> is a really bad idea, because it turns this into an entirely scalar loop, even if the same code was vectorizing perfectly fine when inlined. Since this was the hot-loop in my simulation, the resulting performance drop of 8-16x is completely unacceptable.

Quick Bench Link

Turns out there is a really nice way to fix this issue! By making the argument pairwise_force of a template type instead of std::function<double(Particle, Particle)>, we force the compiler to inline the function. The result is a more readable, flexible implementation, where the user can pass their own force implementation, but at the performance level of a hardcoded routine.

References

  • During my investigation I came across this great benchmark comparing different ways of passing function, that is applied to a vector of elements.
  • I had already given up on this topic, but looked into it again after a discussion with Fabio Gratl, the mastermind behind the incredible AutoPas library. Thank you so much for the tip of using templates!
// Check this out on Compiler Explorer https://godbolt.org/z/afGo1o1f1
#include <vector>
#include <functional>
std::vector<double> apply_pairwise_interaction(const std::vector<double>& vec1, std::function<double(double, double)> func) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
for (size_t j = 0; j < vec1.size(); ++j) {
res[j] = func(vec1[i], vec1[j]);
}
}
return res;
}
int main() {
std::vector<double> vec1(1000);
auto pairwise_interaction = [](double a, double b) {
return a * b;
};
auto res = apply_pairwise_interaction(vec1, pairwise_interaction);
return res.back();
}
// Quick Bench https://quick-bench.com/q/06TMup5sU8Q81zAOkE6ge-sbupY
#include <vector>
#include <algorithm>
#include <functional>
std::vector<double> vec1(256);
auto pairwise_interaction = [](double a, double b) {
return a * b; // Example interaction: multiplication
};
std::vector<double> apply_pairwise_interaction_f(const std::vector<double>& vec1, std::function<double(double, double)> func) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
for (size_t j = 0; j < vec1.size(); ++j) {
res[j] = func(vec1[i], vec1[j]);
}
}
return res;
}
static void std_function(benchmark::State& state) {
for (auto _ : state) {
auto res = apply_pairwise_interaction_f(vec1, pairwise_interaction);
benchmark::DoNotOptimize(res);
}
}
BENCHMARK(std_function);
template<typename Func>
std::vector<double> apply_pairwise_interaction(const std::vector<double>& vec1, Func func) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
for (size_t j = 0; j < vec1.size(); ++j) {
res[j] = func(vec1[i], vec1[j]);
}
}
return res;
}
static void function_as_template(benchmark::State& state) {
for (auto _ : state) {
auto res = apply_pairwise_interaction(vec1, pairwise_interaction);
benchmark::DoNotOptimize(res);
}
}
BENCHMARK(function_as_template);
std::vector<double> apply_pairwise_interaction_transform(const std::vector<double>& vec1, std::function<double(double, double)> func) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
std::transform(vec1.begin(), vec1.end(), res.begin(), [&](double val) {
return func(vec1[i], val);
});
}
return res;
}
static void std_transform(benchmark::State& state) {
for (auto _ : state) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
std::transform(vec1.begin(), vec1.end(), res.begin(), [&](double val) {
return vec1[i] * val;
});
}
benchmark::DoNotOptimize(res);
}
}
BENCHMARK(std_transform);
static void hardcoded(benchmark::State& state) {
for (auto _ : state) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
for (size_t j = 0; j < vec1.size(); ++j) {
res[j] = vec1[i] * vec1[j];
}
}
benchmark::DoNotOptimize(res);
}
}
BENCHMARK(hardcoded);
// Check this out on Compiler Explorer https://godbolt.org/z/zexbh1vTs
#include <vector>
template<typename Func>
std::vector<double> apply_pairwise_interaction(const std::vector<double>& vec1, Func func) {
std::vector<double> res(vec1.size());
for (size_t i = 0; i < vec1.size(); ++i) {
for (size_t j = 0; j < vec1.size(); ++j) {
res[j] = func(vec1[i], vec1[j]);
}
}
return res;
}
int main() {
std::vector<double> vec1(1000);
auto pairwise_interaction = [](double a, double b) {
return a * b;
};
auto res = apply_pairwise_interaction(vec1, pairwise_interaction);
return res.back();
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment