@ytjia
Created October 30, 2014 16:39
Calculate the per-key average value in Spark.
val data = sc.parallelize(Seq(("A", 2), ("A", 4), ("B", 2), ("Z", 0), ("B", 10)))
// data: org.apache.spark.rdd.RDD[(java.lang.String, Int)] = ParallelCollectionRDD[31] at parallelize at <console>:12
val avgValue = data.mapValues((_, 1))                               // pair each value with a count of 1
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))                // per key: (sum, count)
  .mapValues { case (sum, count) => (1.0 * sum) / count }           // sum / count, as Double
  .collectAsMap()
// avgValue: scala.collection.Map[java.lang.String,Double] = Map(Z -> 0.0, B -> 6.0, A -> 3.0)
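The same per-key average can also be computed with `aggregateByKey`, which folds each value into a `(sum, count)` accumulator directly and skips the intermediate `mapValues` pass. A minimal sketch against the same `data` RDD (the variable name `avgValue2` is illustrative):

```scala
// Accumulator is a (sum, count) pair, seeded with (0, 0) per key.
val avgValue2 = data
  .aggregateByKey((0, 0))(
    (acc, v) => (acc._1 + v, acc._2 + 1),   // fold one value into a partition-local accumulator
    (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge accumulators across partitions
  )
  .mapValues { case (sum, count) => sum.toDouble / count }
  .collectAsMap()
// yields the same result: A -> 3.0, B -> 6.0, Z -> 0.0
```

Using a zero value and two combining functions avoids allocating a `(value, 1)` tuple for every element, which can matter for large RDDs.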