@ankurdave
Last active August 27, 2016 05:19
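import org.apache.spark._
import org.apache.spark.graphx._

// Load one edge per line in whitespace-delimited format: srcName edgeAttr dstName
// Vertex IDs come from String.hashCode, so two distinct names can share an ID.
val triplets = sc.textFile(path).flatMap { line =>
  if (!line.isEmpty && line(0) != '#') {
    val lineArray = line.split("\\s+")
    if (lineArray.length < 3) {
      // Need all three fields: source, edge attribute, destination
      None
    } else {
      val t = new EdgeTriplet[String, String]
      t.srcId = lineArray(0).hashCode
      t.srcAttr = lineArray(0)
      t.attr = lineArray(1)
      t.dstId = lineArray(2).hashCode
      t.dstAttr = lineArray(2)
      Some(t)
    }
  } else {
    // Skip blank lines and '#' comment lines
    None
  }
}
// Each triplet contributes its source and destination as (id, name) vertices
val vertices = triplets.flatMap(t => Array((t.srcId, t.srcAttr), (t.dstId, t.dstAttr)))
// EdgeTriplet extends Edge, so an upcast is enough to obtain the edges
val edges = triplets.map(t => t: Edge[String])
Graph(vertices, edges)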
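
Because the snippet above uses `String.hashCode` for vertex IDs, two different names can end up as the same vertex. The following is a minimal plain-Scala sketch (no Spark required) that shows one such collision and a possible workaround. The helper `assignIds` is hypothetical and not part of the gist; it assumes the set of distinct names fits in memory.

```scala
// "Aa" and "BB" are distinct strings with the same String.hashCode,
// so the hashCode-based approach would merge them into one vertex.
val idA = "Aa".hashCode.toLong
val idB = "BB".hashCode.toLong
val collide = idA == idB

// Hypothetical alternative: number the distinct names so each one
// gets its own ID, and look IDs up through the resulting table.
def assignIds(names: Seq[String]): Map[String, Long] =
  names.distinct.zipWithIndex.map { case (name, i) => (name, i.toLong) }.toMap

val ids = assignIds(Seq("Aa", "BB", "Aa"))
```

On an RDD of names, `distinct()` followed by `zipWithUniqueId()` could play the same role as `assignIds`, with joins to translate each edge's endpoints. That adaptation is an assumption about how one might proceed, not something the gist does.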
import org.apache.spark._
import org.apache.spark.graphx._

// Load one edge per line in whitespace-delimited format: srcVertexId edgeAttr dstVertexId
val edges = sc.textFile(path).flatMap { line =>
  if (!line.isEmpty && line(0) != '#') {
    val lineArray = line.split("\\s+")
    if (lineArray.length < 3) {
      // Need all three fields: source ID, edge attribute, destination ID
      None
    } else {
      val srcId = lineArray(0).toLong
      val attr = lineArray(1) // kept as a String here; parse as appropriate
      val dstId = lineArray(2).toLong
      Some(Edge(srcId, dstId, attr))
    }
  } else {
    // Skip blank lines and '#' comment lines
    None
  }
}
// Vertices are created from the edge endpoints, each with default attribute 1
Graph.fromEdges(edges, 1)
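
The per-line parsing in the snippet above can also be pulled out into a pure function, which makes it easy to check without a Spark context. This is only a sketch: `parseEdgeLine` is a hypothetical helper, it keeps the edge attribute as a `String`, and it silently drops lines whose IDs are not numeric, which may or may not be the behavior you want.

```scala
import scala.util.Try

// Hypothetical helper: parse "srcVertexId edgeAttr dstVertexId" into (src, attr, dst).
// Returns None for blank lines, '#' comments, lines with fewer than
// three fields, and lines whose vertex IDs are not valid Longs.
def parseEdgeLine(line: String): Option[(Long, String, Long)] = {
  if (line.isEmpty || line(0) == '#') None
  else {
    val fields = line.split("\\s+")
    if (fields.length < 3) None
    else for {
      src <- Try(fields(0).toLong).toOption
      dst <- Try(fields(2).toLong).toOption
    } yield (src, fields(1), dst)
  }
}
```

Each `Some((src, attr, dst))` result could then be turned into `Edge(src, dst, attr)` before calling `Graph.fromEdges`.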
@soumitraj

Hi Ankur,
My input has the form n1 P1 n2, where n1 and n2 are strings rather than numbers (Int or Long). How can I build a graph in this case?

Thanks
Soumitra

@DavidGruzman

Would hash collisions cause wrong graph construction?

@ajaybgupta

I am also worried about the hash collision part.
