Skip to content

Instantly share code, notes, and snippets.

@ankurdave
Created September 16, 2014 09:29
Show Gist options
  • Save ankurdave/25732a493bc8c8541c97 to your computer and use it in GitHub Desktop.
Save ankurdave/25732a493bc8c8541c97 to your computer and use it in GitHub Desktop.
// Finds the connected component containing a particular vertex.
// In response to http://apache-spark-developers-list.1001551.n3.nabble.com/GraphX-some-vertex-with-specific-edge-td8436.html
import org.apache.spark.graphx._
// Construct the graph in the above example
val edges = sc.parallelize(List(
Edge(1L, 2L, "e1"), Edge(2L, 3L, "e1"), Edge(3L, 4L, "e1")))
val g: Graph[Int, String] = Graph.fromEdges(edges, 0)
val sourceVertexId: VertexId = 1L // vertex a in the example
val edgeProperty: String = "e1"
// Filter the graph to contain only edges matching the edgeProperty
val filteredG = g.subgraph(epred = e => e.attr == edgeProperty)
// Find the connected components of the subgraph, and cache it because we
// use it more than once below
val components: VertexRDD[VertexId] =
filteredG.connectedComponents().vertices.cache()
// Get the component id of the source vertex
val sourceComponent: VertexId = components.filter {
case (id, component) => id == sourceVertexId
}.map(_._2).collect().head
// Print the vertices in that component
components.filter {
case (id, component) => component == sourceComponent
}.map(_._1).collect
// => Array(1, 2, 3, 4)
@gtzinos
Copy link

gtzinos commented May 13, 2018

I am wondering if you could scale this algorithm to use it to all vertex id's automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment