Skip to content

Instantly share code, notes, and snippets.

Scalable Static and Dynamic Community Detection Using Grappolo : NOTES

A community (in a network) is a subset of nodes which are strongly connected among themselves, but weakly connected to others. Neither the number of output communities nor their size distribution is known a priori. Community detection methods can be divisive or agglomerative. Divisive methods use betweeness centrality to identify and remove bridges between communities. Agglomerative methods greedily merge two communities that provide maximum gain in modularity. Newman and Girvan have introduced the modularity metric. The problem of community detection is then reduced to the problem of modularity maximization which is NP-complete. Louvain method is a variant of the agglomerative strategy, in that is a multi-level heuristic.

In this paper, the authors discuss four heuristics for Community detection using the Louvain algorithm implemented upon recently developed Grappolo, which is a parallel variant of the Louvain algorithm. They are:

  • Vertex following and Minimum label
  • Data caching
  • Graph coloring
  • Threshold scaling

With the Vertex following heuristic, the input is preprocessed and all single-degree vertices are merged with their corresponding neighbours. This helps reduce the number of vertices considered in each iteration, and also help initial seeds of communities to be formed. With the Minimum label heuristic, when a vertex is making the decision to move to a community and multiple communities provided the same modularity gain, the community with the smallest id is chosen. This helps minimize or prevent community swaps. With the Data caching heuristic, community information is stored in a vector instead of a map, and is reused in each iteration, but with some additional cost. With the Vertex ordering via Graph coloring heuristic, distance-k coloring of graphs is performed in order to group vertices into colors. Then, each set of vertices (by color) is processed concurrently, and synchronization is performed after that. This enables us to mimic the behaviour of the serial algorithm. Finally, with the Threshold scaling heuristic, successively smaller values of modularity threshold are used as the algorithm progresses. This allows the algorithm to converge faster, and it has been observed a good modularity score as well.

From the results, it appears that graph coloring and threshold scaling heuristics do not always provide a speedup and this depends upon the nature of the graph. It would be interesting to compare the heuristics against baseline approaches. Future work can include distributed memory implementations, and community detection on streaming graphs.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment