@TomLous TomLous/GroupedGraph.scala
Last active Apr 25, 2017
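// Illustrative setup only (not part of the original gist): a minimal sketch of the
// inputs the snippet below relies on, assuming a spark-shell session where `spark`
// is a SparkSession. The KvKRecord fields and the sample data are hypothetical;
// the real `vertices` and `linkedGraph` are built elsewhere and may differ.
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical record type; the original definition is not shown in the gist.
case class KvKRecord(id: Long, name: String)

// Vertex RDD keyed by vertex id, as expected by `vertices.collectAsMap` below.
val vertices: RDD[(VertexId, KvKRecord)] = spark.sparkContext.parallelize(Seq(
  1L -> KvKRecord(1L, "A"),
  2L -> KvKRecord(2L, "B"),
  3L -> KvKRecord(3L, "C"),
  4L -> KvKRecord(4L, "D")
))

// Edges link records that belong together: 1-2 and 2-3 are linked, 4 is isolated.
val edges: RDD[Edge[Int]] = spark.sparkContext.parallelize(Seq(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 1)
))

// With this sample data the pipeline below should yield two groups:
// vertices 1, 2, 3 under groupId 1, and vertex 4 alone under groupId 4.
val linkedGraph: Graph[KvKRecord, Int] = Graph(vertices, edges)
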
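import spark.sqlContext.implicits._

case class GroupedKvKRecord(groupId: Long, kvkRecords: Seq[KvKRecord])

// An RDD cannot be referenced inside another RDD's transformation, so collect the
// vertices to the driver and broadcast them as a lookup map (vertex id -> KvKRecord).
// This assumes the full vertex set fits in memory.
val bcVertices = spark.sparkContext.broadcast(vertices.collectAsMap())

// connectedComponents is the magic here: it labels every vertex with the smallest
// vertex id in its component, and that id serves as the group key.
val groupedKvKRecords = linkedGraph
  .connectedComponents()
  .vertices
  .map {
    case (id, smallestId) => (smallestId, id)
  }
  .groupByKey()
  .mapValues(
    // Look records up in the broadcast map; using `vertices` directly inside this
    // mapping would result in a NullPointerException.
    _.map(bcVertices.value(_)).toList
  )
  .toDF("groupId", "kvkRecords")
  .as[GroupedKvKRecord]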