@dmpetrov
Created March 6, 2017 04:49
Check hypotheses with bootstrap using Spark
// Code for blogpost https://fullstackml.com/2016/01/19/how-to-check-hypotheses-with-bootstrap-and-apache-spark/
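import scala.util.Sorting.quickSort

// Bootstrap confidence interval for the mean of `input`.
// N is the number of resamples; left and right are quantile levels in [0, 1], e.g. 0.025 and 0.975.
def getConfInterval(input: org.apache.spark.rdd.RDD[Double], N: Int, left: Double, right: Double)
    : (Double, Double) = {
  // Resample with replacement N times and record the mean of each resample.
  // Each call to mean runs a Spark job, so caching `input` beforehand avoids recomputing it.
  val hist = Array.fill(N){0.0}
  for (i <- 0 until N) {
    hist(i) = input.sample(withReplacement = true, fraction = 1.0).mean
  }
  // Sort the means and read off the quantiles; clamp the index so right = 1.0 stays in bounds
  quickSort(hist)
  val left_quantile = hist(math.min((N * left).toInt, N - 1))
  val right_quantile = hist(math.min((N * right).toInt, N - 1))
  (left_quantile, right_quantile)
}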
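
The same resample, sort, and pick-quantiles steps can be tried without a cluster. The following is a minimal sketch on a local array, assuming plain Scala collections in place of an RDD. The name `bootstrapMeanInterval`, the fixed seed, and the sample data are hypothetical and not part of the gist. One difference to keep in mind: here every resample has exactly the size of the data, whereas Spark's `sample` treats `fraction` as an expected size, so resample sizes there may vary slightly.

```scala
import scala.util.Random

// Hypothetical helper, not part of the gist: percentile bootstrap for the mean of a local array.
def bootstrapMeanInterval(data: Array[Double], n: Int, left: Double, right: Double,
                          rng: Random = new Random(42)): (Double, Double) = {
  require(data.nonEmpty && n > 0, "need data and at least one resample")
  // Draw n resamples of the same size as data, with replacement, and keep each mean
  val means = Array.fill(n) {
    var sum = 0.0
    var i = 0
    while (i < data.length) {
      sum += data(rng.nextInt(data.length))
      i += 1
    }
    sum / data.length
  }
  // Sort the means and pick the quantiles, clamping the index to the last element
  val sorted = means.sorted
  def quantile(q: Double): Double = sorted(math.min((n * q).toInt, n - 1))
  (quantile(left), quantile(right))
}

// Made-up sample data: the values 0..9 repeated 20 times (mean 4.5)
val sample = Array.tabulate(200)(i => (i % 10).toDouble)
val (lo, hi) = bootstrapMeanInterval(sample, 1000, 0.025, 0.975)
println(s"95% interval for the mean: [$lo, $hi]")
```

With an RDD in spark-shell, the gist's function would be called the same way, for example `getConfInterval(rdd, 1000, 0.025, 0.975)`.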