The idea is to evaluate whether it is worth having parallel versions of some methods, either to speed up processing or to get more readable code than when parallelizing from outside the library.
The first step in the analysis is a basic benchmark to see whether any gain is achieved with rudimentary algorithms. For that, a modified version of the ENTT library is used, which has parallel versions of most of the methods. I modified ENTT in a very straightforward and awful way: I just peeked here and there and replaced the algorithms that GCC already implements in parallel, renaming each such method to the same name with a _par suffix. Then, in other methods, I replaced any calls I found to those 'double version' methods and gave those callers a _par suffix as well. Of course this is no way to parallelize code, since thought must be put into deciding where to parallelize, but it is a start. (btw: sorry for doing that to your code, skypjack)
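To make the `_par` naming concrete, here is a minimal sketch of a sequential method with a parallel twin. The `tiny_pool` type and its `each`/`each_par` members are hypothetical stand-ins, not ENTT code, and the parallel loop uses a plain OpenMP pragma as a swapped-in replacement for the GCC parallel algorithms used in the hack. Built with `-fopenmp`, the loop is split across threads; without that flag the pragma is ignored and `each_par` simply runs sequentially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a component pool; not part of ENTT.
struct tiny_pool {
    std::vector<int> values;

    // Sequential version: visit every component in order.
    template <class Fn>
    void each(Fn fn) {
        for (auto &v : values) {
            fn(v);
        }
    }

    // Parallel twin: same name plus the _par suffix.
    // fn must be safe to call concurrently on distinct elements.
    template <class Fn>
    void each_par(Fn fn) {
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());
        // Only takes effect when compiled with -fopenmp.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            fn(values[static_cast<std::size_t>(i)]);
        }
    }
};
```

Any method that calls `each` internally would then need its own `_par` variant calling `each_par`, which is how the suffix spreads through the modified library.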
So here are the components for the tests:
- ENTT Parallel/Sequential version
- Basic loop capable of performing FPS computation
- Test component structs testcomp and testcomp1, both holding just an integer
- Registry with 200000 entities, each with a testcomp component; odd entities also have testcomp1
- Tests: all tests are performed, whenever possible (not all tests support all modes), in four modes:
  - NORMAL: normal mode, using ENTT each/assign/view, etc.
  - STD: standard C++ mode, using std::for_each
  - EPAR: ENTT parallel mode, using the modified parallel methods
  - SPAR: using GCC parallel for_each

Note: all tests that use views have the overhead of creating the view in sequential mode; only the view creation tests in parallel mode create a view with the modified view_par.
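
The FPS loop and the per-test timings boil down to running a callable many times and dividing the elapsed time by the iteration count. A minimal sketch of such a harness follows. `average_ns` and `fps_from_ns` are hypothetical helpers, not the code used to produce the numbers in this post, and `std::chrono::steady_clock` is an assumed choice of clock.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `fn` `iterations` times and return the
// average wall-clock time per call in nanoseconds.
template <class Fn>
double average_ns(std::size_t iterations, Fn fn) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return iterations == 0 ? 0.0
                           : elapsed.count() / static_cast<double>(iterations);
}

// Hypothetical helper: frames per second for a loop whose
// iterations take `ns_per_frame` nanoseconds each.
inline double fps_from_ns(double ns_per_frame) {
    return 1e9 / ns_per_frame;
}
```

Each test body would be passed as `fn`, with the empty loop measured first to get a baseline.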
BasicLoop: Just to compute an FPS base for reference
SingleViewCreation: Create a single component view
DoubleViewCreation: Create a double component view
AssignOrReplace: Iterate a single component view and assign a new component to each entity
SingleViewIterate: Iterate a single component view just incrementing a static integer
DoubleViewIterate: Iterate a double component view just incrementing a static integer
SingleViewIterateOp: Iterate a single component view and increment an integer inside the component
DoubleViewIterateOp: Iterate a double component view and increment an integer inside each component
SingleViewRandomize: Iterate a single component view and assign a random number to an integer inside the component
DoubleViewRandomize: Iterate a double component view and assign a random number to an integer inside each component
FirstComponentSort: Sorts the first component
SecondComponentSort: Sorts the second component (same as above, but half the entities)
BasicLoop: 14.87ns
TEST | NORMAL | STD | EPAR | SPAR |
---|---|---|---|---|
SingleViewCreation | 2.71ns | 3.51ns | | |
DoubleViewCreation | 6.41ns | 6.03ns | | |
AssignOrReplace | 976.56us | 987.95us | | |
SingleViewIterate | 2.77ns | 2.92ns | 78.85us | 44.03us |
DoubleViewIterate | 1051.22us | 1110.44us | 336.14us | 1111.41us |
SingleViewIterateOp | 71.32us | 1004.38us | 127.53us | 119.76us |
DoubleViewIterateOp | 1138.52us | 6938.68us | 2089.75us | 10552.3us |
SingleViewRandomize | 2778.36us | 3642.15us | 3164.43us | 1937.31us |
DoubleViewRandomize | 6529.5us | 12167.5us | 5943.93us | 15919.6us |
FirstComponentSort | 19081.9us | 8311.99us | | |
SecondComponentSort | 16520.8us | 5825.68us | | |
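
The table is easier to read as ratios. The sketch below divides the NORMAL time by the time of a parallel mode for three rows copied from the table, so values above 1 mean the parallel mode was faster and values below 1 mean it was slower. `speedup` and `print_speedups` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstdio>

// Hypothetical helper: how many times faster `other_us` is than `normal_us`.
constexpr double speedup(double normal_us, double other_us) {
    return normal_us / other_us;
}

// Timings in microseconds, copied from the table above.
constexpr double double_view_iterate_epar    = speedup(1051.22, 336.14);
constexpr double single_view_randomize_spar  = speedup(2778.36, 1937.31);
constexpr double single_view_iterate_op_epar = speedup(71.32, 127.53);

// Print the three ratios with two decimals.
inline void print_speedups() {
    std::printf("DoubleViewIterate   EPAR: %.2fx\n", double_view_iterate_epar);
    std::printf("SingleViewRandomize SPAR: %.2fx\n", single_view_randomize_spar);
    std::printf("SingleViewIterateOp EPAR: %.2fx\n", single_view_iterate_op_epar);
}
```

DoubleViewIterate gains about 3.1x under EPAR and SingleViewRandomize about 1.4x under SPAR, while SingleViewIterateOp drops to about 0.56x under EPAR.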
During the tests, some cases were found where it is impossible to perform the task in parallel, not because of a degradation in performance but because the system cannot handle it. For example, if a view is iterated in parallel for OpenGL drawing, GL calls (even when guarded by a mutex) are not allowed from outside the context thread. Parallelizing can also bring performance down: SingleViewIterateOp, for instance, goes from 71.32us in NORMAL mode to 127.53us in EPAR mode. So parallelizing everything is not an option. I tried that using -D_GLIBCXX_PARALLEL in GCC and it is just not useful.
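
One common way to live with the context-thread constraint (not something the modified library does) is to split the work into two phases: compute per-entity data in parallel, then issue the draw calls sequentially from the thread that owns the context. In the sketch below, `sprite`, `submit_draw` and `render_frame` are hypothetical names, `submit_draw` merely records its argument where a real program would call OpenGL, and the OpenMP pragma only takes effect when building with `-fopenmp`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-entity draw data.
struct sprite {
    float x;
    float y;
};

// Stand-in for a real OpenGL call; must only run on the context thread.
inline void submit_draw(std::vector<sprite> &drawn, const sprite &s) {
    drawn.push_back(s);
}

inline std::vector<sprite> render_frame(const std::vector<int> &positions) {
    std::vector<sprite> staged(positions.size());

    // Phase 1: pure computation. Safe on any thread because each
    // iteration writes only to its own slot of `staged`.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(positions.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const auto idx = static_cast<std::size_t>(i);
        staged[idx] = sprite{positions[idx] * 0.5f, positions[idx] * 2.0f};
    }

    // Phase 2: sequential submission from the calling (context) thread.
    std::vector<sprite> drawn;
    drawn.reserve(staged.size());
    for (const auto &s : staged) {
        submit_draw(drawn, s);
    }
    return drawn;
}
```

Whether this pays off depends on how expensive the computation in the first phase is compared to the submission itself.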
I think parallel versions of ENTT methods are needed for two reasons. First, some methods such as sort cannot be parallelized from the outside (I didn't find any other way to do it), and the gain there is very significant. Second, the code will be much more portable and readable: a single compile-time option would enable or disable parallelization.
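
Such a compile-time option could look something like the sketch below. `DEMO_ENABLE_PARALLEL`, `use_parallel` and `for_each_index` are hypothetical names, not existing ENTT options, and the parallel branch uses a hand-rolled `std::thread` split purely to keep the example self-contained. Without the macro the function is an ordinary sequential loop, so calling code does not change between builds.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical switch: define DEMO_ENABLE_PARALLEL to turn threading on.
#ifdef DEMO_ENABLE_PARALLEL
constexpr bool use_parallel = true;
#else
constexpr bool use_parallel = false;
#endif

// Calls fn(i) for every i in [0, count). When the parallel branch is
// enabled, fn must tolerate concurrent calls with distinct indices.
template <class Fn>
void for_each_index(std::size_t count, Fn fn) {
    if constexpr (use_parallel) {
        const std::size_t hw =
            std::max<std::size_t>(1, std::thread::hardware_concurrency());
        const std::size_t workers = std::min(hw, std::max<std::size_t>(1, count));
        const std::size_t chunk = (count + workers - 1) / workers;
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < workers; ++w) {
            // Each worker gets a contiguous, non-overlapping index range.
            const std::size_t begin = std::min(count, w * chunk);
            const std::size_t end = std::min(count, begin + chunk);
            pool.emplace_back([begin, end, &fn] {
                for (std::size_t i = begin; i < end; ++i) {
                    fn(i);
                }
            });
        }
        for (auto &t : pool) {
            t.join();
        }
    } else {
        for (std::size_t i = 0; i < count; ++i) {
            fn(i);
        }
    }
}
```

Building with `-DDEMO_ENABLE_PARALLEL` (and thread support) would switch every call site at once, which is the portability argument made above.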
This is the modified parallel version of ENTT used in this test: https://github.com/pyaggi/entt_par (it is an ugly brute-force hack, for testing purposes only)