- Graph is parititoned for CPU and GPU.
- Louvain is independently performed to get psuedo-communities.
- Determine doubtful vertices which do not belong to communities formed on a device.
- Doubtful vertices are exchanged between the devices.
- Executes Louvain algorithm again from subgraph on devices, which includes communities formed earlier and doubtful vertices.
- This results in new communities and a new set of doubtful vertices.
- Doubtful vertices are exchanged again.
- Graph is coarsened to form a reduced graph of new vertices.
- Continues until number of communities is small to accomodate on a single device.
- Once accomodated, run it on a single device.
Partitioning is done using METIS.
- Hybrid algorithm gives
~2x
speedup over CPU-only, with less than1%
change in modularity. - At least
42
-73%
lower runtime than state-of-the-art multicore algorithm (Grappolo). - Are doubtful vertices overlapping? Most likely no. In any case, migration time is quite low. The time taken for identifying doubtful vertices is
5
-25%
. - Partitioning is based on proportional performance of the two devices, using ratio-based METIS (using induced subgraphs). The partitioning time is around
10
to50%
. - On CPU, Lu et al.'s algorithm is used (Grappolo), and on the GPU, Naim et al.'s algorithm is used. CPU used
std::map
and GPU usedthrust
, but details of hashtables are not provided (likely same as in the original paper by Naim et al.) - The size of graphs used (max) is
2.27B
edges.