Skip to content

Instantly share code, notes, and snippets.

@anp
Last active January 15, 2016 20:15
Show Gist options
  • Save anp/70629514520b420240a7 to your computer and use it in GitHub Desktop.
Save anp/70629514520b420240a7 to your computer and use it in GitHub Desktop.
Benchmarking Rust's HashMap<Vec<u8>, _> with custom hashers

Context

TLDR: I'm looking for any advice on how to improve the benchmarks below so I can have confidence in the results before moving on to the next feature in this project.

I'm currently working on a tool which relies heavily on the performance of Rust's HashMap implementation in the standard library. Since reading about the cryptographic hashing algorithm used in the standard library, I've been wondering if I couldn't improve the performance of my program just by swapping out the hashing algorithm (using the unstable hashmap_hasher feature).

The benchmarks mimic the program (which takes short 7-15 byte vectors and counts them and their positions in a source string), with the exception that instead of accumulating bytestring locations as well as counts, these benchmarks just count the instances. I hoped to do this to approximate the lookup and mutation costs of the HashMap without also mixing Vec's performance in the benchmark (because I can't affect that using different hashing algorithms).

There are several benchmarks I found which address just various hashing algorithms' speed, but I want to make sure that speed actually impacts my use case in a meaningful way, and I only wanted to benchmark those crates which already implement the std::collections::hash_state::HashState trait.

Is this overkill? Are there any gaps in what I'm measuring? Am I accidentally benchmarking different behavior than the hashing algorithms themselves? Feedback would be very appreciated.

Setup

$ rustc --version --verbose
rustc 1.7.0-nightly (2fb0c5ebc 2016-01-14)
binary: rustc
commit-hash: 2fb0c5ebcf6d912224532265776fb96febea9797
commit-date: 2016-01-14
host: x86_64-unknown-linux-gnu
release: 1.7.0-nightly

Results

algorithm run 1 (ns) run 2 (ns) run 3 (ns) avg (ns) minus overhead (ns) percent faster
crc24 614 604 614 610.7 378.0 4.5%
default 637 612 636 628.3 395.7 0.0%
farm 634 633 644 637.0 404.3 -2.2%
metro 590 586 599 591.7 359.0 9.3%
fnv 581 579 598 586.0 353.3 10.7%
twox_fixed 612 614 620 615.3 382.7 3.3%
twox_random 611 614 623 616.0 383.3 3.1%
indexing overhead 234 232 232 233
$ for i in {1..3}; do cargo bench; done
   Compiling hash-test v0.1.0 (file:///home/adam/rust-projects/hash-test)
     Running target/release/hash_test-bbd6a6d5fb54cc08

running 8 tests
test bench_crc24            ... bench:         614 ns/iter (+/- 169) = 13 MB/s
test bench_default          ... bench:         637 ns/iter (+/- 75) = 12 MB/s
test bench_farmhash         ... bench:         634 ns/iter (+/- 112) = 12 MB/s
test bench_metrohash        ... bench:         590 ns/iter (+/- 96) = 13 MB/s
test bench_random_indexes   ... bench:         234 ns/iter (+/- 22)
test bench_servo_fnv        ... bench:         581 ns/iter (+/- 45) = 13 MB/s
test bench_twox_fixed_seed  ... bench:         612 ns/iter (+/- 24) = 13 MB/s
test bench_twox_random_seed ... bench:         611 ns/iter (+/- 30) = 13 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

     Running target/release/hash_test-bbd6a6d5fb54cc08

running 8 tests
test bench_crc24            ... bench:         607 ns/iter (+/- 38) = 13 MB/s
test bench_default          ... bench:         612 ns/iter (+/- 41) = 13 MB/s
test bench_farmhash         ... bench:         633 ns/iter (+/- 63) = 12 MB/s
test bench_metrohash        ... bench:         586 ns/iter (+/- 15) = 13 MB/s
test bench_random_indexes   ... bench:         232 ns/iter (+/- 17)
test bench_servo_fnv        ... bench:         579 ns/iter (+/- 31) = 13 MB/s
test bench_twox_fixed_seed  ... bench:         614 ns/iter (+/- 50) = 13 MB/s
test bench_twox_random_seed ... bench:         614 ns/iter (+/- 31) = 13 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

     Running target/release/hash_test-bbd6a6d5fb54cc08

running 8 tests
test bench_crc24            ... bench:         618 ns/iter (+/- 42) = 12 MB/s
test bench_default          ... bench:         636 ns/iter (+/- 55) = 12 MB/s
test bench_farmhash         ... bench:         644 ns/iter (+/- 42) = 12 MB/s
test bench_metrohash        ... bench:         599 ns/iter (+/- 55) = 13 MB/s
test bench_random_indexes   ... bench:         232 ns/iter (+/- 11)
test bench_servo_fnv        ... bench:         598 ns/iter (+/- 63) = 13 MB/s
test bench_twox_fixed_seed  ... bench:         620 ns/iter (+/- 34) = 12 MB/s
test bench_twox_random_seed ... bench:         623 ns/iter (+/- 38) = 12 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured
[package]
name = "hash-test"
version = "0.1.0"
authors = ["Adam Perry <adam.n.perry@gmail.com>"]
[dependencies]
rand = "0.3.13"
metrohash = "0.1.0"
twox-hash = "0.1.1"
fnv = "1.0.0"
crc24 = "0.1.6"
farmhash = "1.1.4"
#![feature(test, hashmap_hasher, simd)]
extern crate test;
use test::Bencher;
extern crate rand;
use rand::{Rng, Isaac64Rng};
use std::collections::HashMap;
use std::collections::hash_state::{DefaultState, HashState};
fn bench_generic<H>(b: &mut Bencher, mut map: HashMap<Vec<u8>, u32, H>)
where H: HashState
{
let key_len = 8;
let mut rng = Isaac64Rng::new_unseeded();
let input = generate_all_n_bytestrings(key_len, vec![b'A', b'C', b'G', b'T', b'N']);
for s in &input {
map.insert(s.to_owned(), 0);
}
b.bytes = key_len as u64;
b.iter(|| {
let idx = rng.gen_range(0, input.len());
let key = input[idx].to_owned();
let mut count = map.entry(key).or_insert(0);
*count = *count + 1;
})
}
///////////////////////////////////////////////////////////////////////////////////
// Identifying random lookup and copy overhead
///////////////////////////////////////////////////////////////////////////////////
#[bench]
fn bench_random_indexes(b: &mut Bencher) {
let mut rng = Isaac64Rng::new_unseeded();
let input = generate_all_n_bytestrings(8, vec![b'A', b'C', b'G', b'T', b'N']);
b.iter(|| {
let idx = rng.gen_range(0, input.len());
let _ = input[idx].to_owned();
})
}
///////////////////////////////////////////////////////////////////////////////////
// Default stdlib implementation
///////////////////////////////////////////////////////////////////////////////////
use std::collections::hash_map::RandomState;
#[bench]
fn bench_default(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, RandomState> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
// metrohash crate
///////////////////////////////////////////////////////////////////////////////////
extern crate metrohash;
use metrohash::MetroHash;
#[bench]
fn bench_metrohash(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, DefaultState<MetroHash>> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
// twox_hash crate
///////////////////////////////////////////////////////////////////////////////////
extern crate twox_hash;
use twox_hash::{XxHash, RandomXxHashState};
#[bench]
fn bench_twox_fixed_seed(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, DefaultState<XxHash>> = Default::default();
bench_generic(b, map);
}
#[bench]
fn bench_twox_random_seed(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, RandomXxHashState> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
// fnv crate (from servo)
///////////////////////////////////////////////////////////////////////////////////
extern crate fnv;
use fnv::FnvHasher;
#[bench]
fn bench_servo_fnv(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, DefaultState<FnvHasher>> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
// crc24 crate
///////////////////////////////////////////////////////////////////////////////////
extern crate crc24;
use crc24::Crc24Hasher;
#[bench]
fn bench_crc24(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, DefaultState<Crc24Hasher>> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
// farmhash crate
///////////////////////////////////////////////////////////////////////////////////
extern crate farmhash;
use farmhash::FarmHasher;
#[bench]
fn bench_farmhash(b: &mut Bencher) {
let map: HashMap<Vec<u8>, u32, DefaultState<FarmHasher>> = Default::default();
bench_generic(b, map);
}
///////////////////////////////////////////////////////////////////////////////////
fn generate_all_n_bytestrings(length: usize, alphabet: Vec<u8>) -> Vec<Vec<u8>> {
let mut strings: Vec<Vec<u8>> = Vec::new();
for b in &alphabet {
strings.push(vec![*b]);
}
for _ in 2..(length + 1) {
let mut new_strings = Vec::new();
for s in strings {
for byte in &alphabet {
let mut new_string = s.to_owned();
new_string.push(*byte);
new_strings.push(new_string);
}
}
strings = new_strings;
}
strings.into_iter().filter(|v| v.len() == length).collect()
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment