pyaggi/entt_parallel.md

## entt_parallel.md

      
Parallel processing with ENTT

The idea is to evaluate whether it is worth having parallel versions of some methods, either to speed up processing or to get more readable code than parallelizing outside the library would allow.
Tests

The first step in the analysis is a basic benchmark to see whether any gain is achieved with rudimentary algorithms.
For that purpose a modified version of the ENTT library is used, which has parallel versions of most of the methods. I modified the ENTT library in a very straightforward and awful way: I just looked here and there, replaced the algorithms that GCC already implements in parallel, and renamed each such method to the same name with a _par suffix. Then, in the other methods, I replaced any calls I found to those 'double version' methods and gave those callers a _par suffix as well. Of course this is no way to parallelize code; thought must be put into deciding where to parallelize, but it is a start.
(btw: sorry for doing that to your code, skypjack)
So here are the components for the tests:

ENTT parallel/sequential version
Basic loop capable of performing FPS computation
Test component structs testcomp and testcomp1, both holding just an integer
Registry with 200000 entities, each with a testcomp component, and odd entities also with testcomp1
Tests: whenever possible, each test is run in four modes (not all tests support all modes):

NORMAL: normal mode, using ENTT each/assign/view, etc.
STD: standard C++ mode, using std::for_each
EPAR: ENTT parallel mode, using the modified parallel methods
SPAR: using GCC's parallel for_each
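
The setup above can be sketched without ENTT to show the shape of one measurement. This is only an illustration under stated assumptions: a std::vector stands in for the registry's component storage, the helper names (mean_nanoseconds, increment_plain, increment_std) are hypothetical, and averaging over repeated runs with std::chrono is an assumption, since the note does not say how the timings were taken. The SPAR mode would presumably swap std::for_each for __gnu_parallel::for_each from GCC's <parallel/algorithm>, which needs -fopenmp and is left out here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in for the test component described above: a struct holding one integer.
struct testcomp {
    int value = 0;
};

// Hypothetical helper: runs `body` `runs` times and returns the mean
// wall-clock duration of a single run, in nanoseconds.
template <typename Body>
double mean_nanoseconds(std::size_t runs, Body body) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(runs);
}

// NORMAL-style pass: a plain loop over the component storage.
inline void increment_plain(std::vector<testcomp> &pool) {
    for (auto &component : pool) {
        ++component.value;
    }
}

// STD-style pass: the same work expressed with std::for_each.
inline void increment_std(std::vector<testcomp> &pool) {
    std::for_each(pool.begin(), pool.end(),
                  [](testcomp &component) { ++component.value; });
}
```

A test would then call something like mean_nanoseconds(1000, [&pool] { increment_std(pool); }) once per mode and compare the averages.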


Tests List

Note: all tests that use views include the overhead of creating the view in sequential mode; only the view creation tests in parallel mode create the view with the modified view_par.
BasicLoop: Just to compute an FPS base for reference
SingleViewCreation: Create a single component view
DoubleViewCreation: Create a double component view
AssignOrReplace: Iterates a single component view and assigns a new component to the entity
SingleViewIterate: Iterate a single component view just incrementing a static integer
DoubleViewIterate: Iterate a double component view just incrementing a static integer
SingleViewIterateOp: Iterate a single component view and increment an integer inside the component
DoubleViewIterateOp: Iterate a double component view and increment an integer inside each component
SingleViewRandomize: Iterate a single component view and assign a random number to an integer inside the component
DoubleViewRandomize: Iterate a double component view and assign a random number to an integer inside each component
FirstComponentSort: Sorts the first component
SecondComponentSort: Sorts the second component (same as above, but half the entities)
Results

BasicLoop: 14.87ns


| TEST | NORMAL | STD | EPAR | SPAR |
| --- | --- | --- | --- | --- |
| SingleViewCreation | 2.71ns | | 3.51ns | |
| DoubleViewCreation | 6.41ns | | 6.03ns | |
| AssignOrReplace | 976.56us | | 987.95us | |
| SingleViewIterate | 2.77ns | 2.92ns | 78.85us | 44.03us |
| DoubleViewIterate | 1051.22us | 1110.44us | 336.14us | 1111.41us |
| SingleViewIterateOp | 71.32us | 1004.38us | 127.53us | 119.76us |
| DoubleViewIterateOp | 1138.52us | 6938.68us | 2089.75us | 10552.3us |
| SingleViewRandomize | 2778.36us | 3642.15us | 3164.43us | 1937.31us |
| DoubleViewRandomize | 6529.5us | 12167.5us | 5943.93us | 15919.6us |
| FirstComponentSort | 19081.9us | | 8311.99us | |
| SecondComponentSort | 16520.8us | | 5825.68us | |


Conclusions

During the tests some cases were found where it is impossible to perform the task in parallel, not because of a degradation in performance but because the system cannot handle it. For example, if the view is iterated in parallel for OpenGL drawing, calls made from outside the context thread are not allowed, even when guarded by a mutex. Parallelizing can also reduce performance: in the results above, SingleViewIterateOp goes from 71.32us in NORMAL mode to 127.53us in EPAR mode. So parallelizing everything is not an option. I tried that using -D_GLIBCXX_PARALLEL in GCC and it is just not useful.
I think parallel versions of ENTT methods are needed for two reasons. First, some methods such as sort cannot be parallelized from the outside (I didn't find any other way to do it), and the gain is significant: in the results above the parallel sorts ran roughly 2.3x and 2.8x faster than the sequential ones. Second, the code will be much more portable and readable: a single compile-time option enables or disables parallelization.
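
The compile-time switch could look roughly like the sketch below. It is not taken from the modified library: the macro ENTT_PAR_SKETCH and the wrapper names are hypothetical, chosen only for this example. It assumes GCC with libstdc++'s parallel mode (<parallel/algorithm>, built with -fopenmp) when the macro is defined, and falls back to the sequential standard algorithms otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ENTT_PAR_SKETCH is a hypothetical macro name used only in this example.
// When it is defined, build with GCC and -fopenmp so that libstdc++'s
// parallel mode algorithms are available.
#ifdef ENTT_PAR_SKETCH
#include <parallel/algorithm>
#endif

// Hypothetical wrapper: one call site, two possible implementations.
// In the parallel build, `function` must be safe to call concurrently
// on different elements.
template <typename Iterator, typename Function>
void for_each_maybe_parallel(Iterator first, Iterator last, Function function) {
#ifdef ENTT_PAR_SKETCH
    __gnu_parallel::for_each(first, last, function);
#else
    std::for_each(first, last, function);
#endif
}

// Same idea for sorting, the case where the measured gain was largest.
template <typename Iterator, typename Compare>
void sort_maybe_parallel(Iterator first, Iterator last, Compare compare) {
#ifdef ENTT_PAR_SKETCH
    __gnu_parallel::sort(first, last, compare);
#else
    std::sort(first, last, compare);
#endif
}
```

Calling code stays the same in both builds; only the presence of the macro (for example -DENTT_PAR_SKETCH -fopenmp on the command line) changes which algorithm runs.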
This is the modified parallel version of ENTT used in this test: https://github.com/pyaggi/entt_par
(it is an ugly brute-force hack, only for testing purposes)