Skip to content

Instantly share code, notes, and snippets.

@TomLous TomLous/GroupedGraph.scala
Last active Apr 25, 2017

What would you like to do?
import spark.sqlContext.implicits._
case class GroupedKvKRecord(groupId: Long, kvkRecords: Seq[KvKRecord])
// need to broadcast for rdd in rdd mapping
val bcVertices = spark.sparkContext.broadcast(vertices.collectAsMap)
// connectedComponents is the magic here
val groupedKvKRecords = linkedGraph
.map {
case (id, smallestId) => (smallestId, id)
.mapValues( // calling vertices directly in other rdd mapping will result in nullpointerexception
.toDF("groupId", "kvkRecords")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.