Lu, H., Halappanavar, M., & Kalyanaraman, A. (2015). Parallel heuristics for scalable community detection. Parallel Computing, 47, 19-37.
In the paper, Lu et al. discuss why parallel community detection may cause negative modularity gain - this happens in cases when two disconnected nodes join the same community with little affinity to the community they are moving to. It is also possible that the modularity gain estimated by a parallel algorithm is lower that the actual gain - this can happen when two nodes are connected, and move to the same community.
Lu et al. also introduce the singlet minimum label heuristic to prevent vertex swapping. They also propose the generalized minimum label heuristic, where a vertex is moved to the community with the highest modularity gain, but with the smallest label (if multiple communities have the same gain). They also discuss distance-1 coloring as a way to prevent vertex swapping, where only vertices with the same color are processed in parallel. In the vertex following heuristic, they discuss collapsing vertices with only one neighbor as a preprocessing step.
Refer to the PDF for more details on the heuristics and their implementation.