Felix Cheung (felixcheung)

felixcheung / sparkPercentile.scala
Last active March 28, 2018 08:26
Compute a percentile of a Spark RDD in Scala
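The function below names the NIST method. As an illustration of that rule, here is a minimal local sketch on an in-memory `Seq[Long]` rather than an RDD. The object `NistPercentile` and its `percentile` method are hypothetical names. The rank formula (rank = p(N + 1), split into an integer part k and a fraction d, clamped to the first or last value at the ends, linearly interpolated otherwise) follows the NIST/SEMATECH e-Handbook definition and is an assumption about what the gist's function does, not a copy of its remaining code.

```scala
// Hypothetical local sketch of the NIST percentile rule on a Seq[Long];
// it is not the gist's own implementation.
object NistPercentile {

  /** Percentile of `data` (need not be sorted); `tile` is in [0, 100]. */
  def percentile(data: Seq[Long], tile: Double): Double = {
    require(data.nonEmpty, "data must not be empty")
    require(tile >= 0.0 && tile <= 100.0, "tile must be in [0, 100]")

    // NIST method: order the observations ascending (IndexedSeq for O(1) access)
    val sorted = data.sorted.toIndexedSeq
    val n = sorted.length

    // Rank (N + 1) * p, split into integer part k and fractional part d
    val rank = (tile / 100.0) * (n + 1)
    val k = math.floor(rank).toInt
    val d = rank - k

    if (k <= 0) sorted.head.toDouble      // below the first rank: smallest value
    else if (k >= n) sorted.last.toDouble // at or past the last rank: largest value
    else {
      // NIST indices are 1-based, so Y[k] is sorted(k - 1) and Y[k + 1] is sorted(k)
      val lo = sorted(k - 1).toDouble
      val hi = sorted(k).toDouble
      lo + d * (hi - lo)
    }
  }

  def main(args: Array[String]): Unit = {
    val xs = Seq(50L, 15L, 40L, 20L, 35L)
    println(percentile(xs, 50.0)) // rank 3.0, so the 3rd smallest value
    println(percentile(xs, 40.0)) // rank 2.4, interpolated between 20 and 35
  }
}
```

On an RDD, one possible way to apply the same rule after `sortBy` is to index the sorted elements with `zipWithIndex`, swap each pair to `(index, value)`, and fetch positions k - 1 and k with `lookup`; this is a suggested approach, not necessarily what the gist does.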
import org.apache.spark.rdd.RDD

/**
* Compute a percentile from an unsorted Spark RDD.
* @param data input data set of Long integers
* @param tile percentile to compute (e.g. 85 for the 85th percentile)
* @return value of the input data at the specified percentile
*/
def computePercentile(data: RDD[Long], tile: Double): Double = {
// NIST method: sort the data in ascending order
val r = data.sortBy(x => x)
// total number of elements in the sorted data
val c = r.count()