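aggregateByKey is a more general form of reduceByKey that can be used when the result type differs from the type of the input values. It takes an initial value for the accumulator, a function that adds a value to the accumulator within a partition, and a function that merges accumulators from different partitions.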
val initialList = scala.collection.mutable.ListBuffer[Row]()
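// Mutate the accumulator in place instead of copying it on every call; Spark gives each key its own copy of the initial value.
val addToList = (acc: scala.collection.mutable.ListBuffer[Row], x: Row) => acc += x
val mergePartitionLists = (acc1: scala.collection.mutable.ListBuffer[Row], acc2: scala.collection.mutable.ListBuffer[Row]) => acc1 ++= acc2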
val gbyKey = rdd.map(x => (x.getAs[String]("xtransId"), x)).aggregateByKey(initialList)(addToList, mergePartitionLists).map(x => (x._1, x._2.toList))
val initialSet = scala.collection.mutable.HashSet.empty[Row]
val addToSet = (acc: scala.collection.mutable.HashSet[Row], v: Row) => acc += v
val mergePartitionsSets = (acc1: scala.collection.mutable.HashSet[Row], acc2: scala.collection.mutable.HashSet[Row]) => acc1 ++= acc2
val gbyKey = rdd.map(x => (x._1, x._2)).aggregateByKey(initialSet)(addToSet, mergePartitionsSets).map(x => (x._1, x._2.toList))
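The case where the result type differs from the value type can be sketched with a per-key average. This is a minimal illustration, not taken from the notes above: the sample data and the names `scores`, `zero`, `addValue`, `mergeAccs`, and `avgByKey` are hypothetical, and a local SparkSession is assumed (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumption); reuse your own session if you have one.
val spark = SparkSession.builder().master("local[2]").appName("aggregateByKey-sketch").getOrCreate()

// Hypothetical sample data: (key, value) pairs.
val scores = spark.sparkContext.parallelize(
  Seq(("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("a", 5.0)))

// The accumulator is a (sum, count) pair, while the input values are plain Doubles.
val zero = (0.0, 0L)
// Within a partition: fold one value into the accumulator.
val addValue = (acc: (Double, Long), v: Double) => (acc._1 + v, acc._2 + 1L)
// Across partitions: merge two accumulators for the same key.
val mergeAccs = (a: (Double, Long), b: (Double, Long)) => (a._1 + b._1, a._2 + b._2)

val avgByKey = scores
  .aggregateByKey(zero)(addValue, mergeAccs)
  .mapValues { case (sum, count) => sum / count }
  .collect()
  .toMap
// avgByKey contains "a" -> 3.0 and "b" -> 5.0
```

reduceByKey alone could not express this, because its function must return the same type as the values it combines.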
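combineByKey is an even more general form of reduceByKey: instead of an initial value, it takes a function that creates the initial accumulator from the first value seen for a key. It can also be used when the result type differs from the type of the input values. The example below uses an immutable List, but it's recommended to use a scala.collection.mutable collection, which can be updated in place.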
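val gbyKey = rdd.map(x => (x._1, x._2))
  .combineByKey(
    (x: Row) => List(x),                                   // create the accumulator from the first value of a key
    (acc: List[Row], x: Row) => x :: acc,                  // add a value within a partition
    (acc1: List[Row], acc2: List[Row]) => acc1 ::: acc2)   // merge accumulators across partitions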
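The mutable variant recommended above might look like the following. It is a sketch under assumptions: the sample data and the names `pairs` and `grouped` are hypothetical, Int values stand in for Row to keep it self-contained, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ListBuffer

// Local session for illustration only (assumption); reuse your own session if you have one.
val spark = SparkSession.builder().master("local[2]").appName("combineByKey-sketch").getOrCreate()

// Hypothetical sample data spread over two partitions.
val pairs = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), 2)

val grouped = pairs
  .combineByKey(
    (v: Int) => ListBuffer(v),                                       // fresh buffer for the first value of a key
    (acc: ListBuffer[Int], v: Int) => acc += v,                      // append in place within a partition
    (acc1: ListBuffer[Int], acc2: ListBuffer[Int]) => acc1 ++= acc2) // merge buffers across partitions
  .mapValues(_.toList.sorted) // sort, since the order of merged values is not guaranteed
  .collect()
  .toMap
// grouped contains "a" -> List(1, 3) and "b" -> List(2)
```

Mutating the buffer is safe here because the first function creates a new ListBuffer for each key in each partition, so no accumulator is shared between keys.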