Skip to content

Instantly share code, notes, and snippets.

View srirambaskaran's full-sized avatar

Sriram Baskaran srirambaskaran

  • Insight Data Science
  • San Francisco
View GitHub Profile
def bfs(graph: RDD[(Int, List[Int])], source: Int): Unit = {
// Boolean constants
val VISITED: boolean = true
val NOT_VISITED: boolean = false
val IN_QUEUE: boolean = true
val NOT_IN_QUEUE: boolean = false
// Creating the traversal object
var traversal = graph.map{
(vertex_id, adj_list) =>
@srirambaskaran
srirambaskaran / distributed-shortest-path-note.md
Last active July 12, 2021 11:31
A note on implementing community detection using Apache Spark + GraphX

A note on implementing community detection using Apache Spark + GraphX

Girvan Newman Algorithm

This is one of the earliest methods of community detection. This method is simple to understand and can be easily distributed across clusters for faster processing. Key assumption is that the graph is undirected and unweighted. But it is not hard to extend to directed graphs + weighted edges.

The algorithm is fairly straightforward. It defines a new measure called edge betweenness centrality based on which a divisive hierarchical clustering algorithm is designed to find communities. The stopping criteria for this uses a popular metric called modularity which quantifies how cohesive the communities are during the clustering process.

Side note: A bit of search reveled no implementation of this algorithm in a distributed way (mainly because its slow and better algorithms are available?). So, this note would pave way to use this naive algorithm inspite of its high time complexity.