Skip to content

Instantly share code, notes, and snippets.

@aeshirey
Last active December 3, 2020 03:14
Show Gist options
  • Save aeshirey/afaa151fc069025236ab5ee742a01274 to your computer and use it in GitHub Desktop.
Save aeshirey/afaa151fc069025236ab5ee742a01274 to your computer and use it in GitHub Desktop.
Parallelized quantile estimation with Greenwald-Khanna
// `probably` crate @ https://github.com/aeshirey/probably/
// Use it by specifying probably = "0.2.0" in your Cargo.toml
use probably::quantile::greenwald_khanna;
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
const EPSILON: f64 = 0.001;
const QUANTILE: f64 = 0.5; // median
// Random numbers generated in Python with:
// nums = [random.randint(0, 10_000) for _ in range(100_000)]
// with fh = open('rands.txt', 'w') as fh: fh.write('\n'.join(map(str, nums)))
let file = File::open("rands.txt").unwrap();
let reader = BufReader::new(file);
// Two partial estimators. Each receives a proper subset of input, but together they receive the full dataset.
let mut gk1 = greenwald_khanna::Stream::new(EPSILON);
let mut gk2 = greenwald_khanna::Stream::new(EPSILON);
// Conversely, a single estimator that gets it all.
let mut gk = greenwald_khanna::Stream::new(EPSILON);
for line in reader.lines() {
let value: u32 = line.unwrap().parse().unwrap();
gk.insert(value);
// arbitrarily pick an estimator to get this value
if value % 2 == 0 {
gk1.insert(value);
} else {
gk2.insert(value);
}
}
// merge partial estimators
gk1 += gk2;
// actual: sorted(nums[49999]) == 5005
println!("[partitioned] median: {}", gk1.quantile(QUANTILE)); // 5006
println!("[complete] median: {}", gk.quantile(QUANTILE)); // 5013
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment