ytjia / Spark to calculate Avg
Created October 30, 2014 16:39

Calculate the per-key average value in Spark.
val data = sc.parallelize(Seq(("A", 2), ("A", 4), ("B", 2), ("Z", 0), ("B", 10)))
// data: org.apache.spark.rdd.RDD[(java.lang.String, Int)] = ParallelCollectionRDD[31] at parallelize at <console>:12

val avgValue = data.mapValues(v => (v, 1))                   // pair each value with a count of 1
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))         // sum values and counts per key
  .mapValues { case (sum, count) => (1.0 * sum) / count }    // divide to get the per-key average
  .collectAsMap()
// avgValue: scala.collection.Map[java.lang.String,Double] = Map(Z -> 0.0, B -> 6.0, A -> 3.0)
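
Note that the mean itself is not associative, so you cannot reduceByKey with an averaging function directly; carrying (sum, count) pairs keeps the reduction associative and therefore safe to run across partitions. The same result can also be computed with aggregateByKey, which folds values straight into a (sum, count) accumulator and skips the extra mapValues pass. A minimal sketch under the same input; the name avgValue2 is illustrative, not from the gist:

// Sketch: per-key average via aggregateByKey (name avgValue2 is hypothetical)
val avgValue2 = data
  .aggregateByKey((0, 0))(
    (acc, v) => (acc._1 + v, acc._2 + 1),       // fold one value into the running (sum, count)
    (a, b)   => (a._1 + b._1, a._2 + b._2)      // merge partial (sum, count) pairs across partitions
  )
  .mapValues { case (sum, count) => sum.toDouble / count }
  .collectAsMap()
// expected: Map(Z -> 0.0, B -> 6.0, A -> 3.0), same as avgValue above

Either way, avoid the groupByKey-then-average pattern for this job: both reduceByKey and aggregateByKey combine values on each partition before shuffling, while groupByKey ships every value over the network.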