For my many-body simulation I was looking for a way to implement user defined force calculations, which doesn't break the auto vectorization of the compiler. The user should be able to pass a lambda function that takes in two bodies and returns the pairwise interaction force between them, e.g. due to gravity or electromagnetic repulsion.
Roughly speaking, the code has this structure:
for(int i=0; i<particles.size(); i++){
double acceleration = 0.0;
for(int j=0; j<particles.size(); j++){
acceleration += pairwise_force(particles[i], particles[j]);
}
}
Turns out that passing the pairwise_force
using a std::function<double(Particle, Particle)>
is a really bad idea, because it turns this into an entirely scalar loop, even if the same code was vectorizing perfectly fine when inlined. Since this was the hot-loop in my simulation, the resulting performance drop of 8-16x is completely unacceptable.
Turns out there is a really nice way to fix this issue! By making the argument pairwise_force
of a template type instead of std::function<double(Particle, Particle)>
, we force the compiler to inline the function. The result is a more readable, flexible implementation, where the user can pass their own force implementation, but at the performance level of a hardcoded routine.
- During my investigation I came across this great benchmark comparing different ways of passing function, that is applied to a vector of elements.
- I had already given up on this topic, but looked into it again after a discussion with Fabio Gratl, the mastermind behind the incredible AutoPas library. Thank you so much for the tip of using templates!