Skip to content

Instantly share code, notes, and snippets.

@sambos
Last active April 21, 2018 15:57
Show Gist options
  • Save sambos/64d217fd16b77183738f2a9167b2da17 to your computer and use it in GitHub Desktop.
Save sambos/64d217fd16b77183738f2a9167b2da17 to your computer and use it in GitHub Desktop.
reduceByKey aggregateByKey combineByKey

aggregateByKey

A general form of reduceByKey, that can be used when the return value is different type than input. Takes an initial value of accumulator.

     val initialList = scala.collection.mutable.ListBuffer[Row]()
     val addToList = (acc:scala.collection.mutable.ListBuffer[Row], x:Row) => x +: acc
     val mergePartitionLists = (acc1: scala.collection.mutable.ListBuffer[Row],acc2: scala.collection.mutable.ListBuffer[Row]) => acc1 ++ acc2
    
     val gbyKey = rdd.map(x => (x.getAs[String]("xtransId"), x)).aggregateByKey(initialList)(addToList,         mergePartitionLists).map(x => (x._1, x._2.toList))
                          
     val initialSet = scala.collection.mutable.HashSet.empty[Row]
     val addToSet = (acc: scala.collection.mutable.HashSet[Row], v: Row) => acc += v
     val mergePartitionsSets = (acc1: scala.collection.mutable.HashSet[Row], acc2: scala.collection.mutable.HashSet[Row]) => acc1 ++= acc2
    val gbyKey = rdd.map(x => (x._1, x._2)).aggregateByKey(initialSet)(addToSet, mergePartitionSets).map(x => (x._1, x._2.toList))
    

combineByKey

A more general form of reduceByKey that provides a function for creating an initial accumulator, can be used when the return value is different type than input. Example below shows immutable List but its recommended to use scala.collection.mutable collection

val gbyKey =  rdd.map(x => (x._1, x._2))
                  .combineByKey(
                          (x: Row) => List(x),              
                          (acc: List[Row], x) => x :: acc,  
                          (acc1: List[Row], acc2: List[Row]) => acc1 ::: acc2)
                          
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment