Skip to content

Instantly share code, notes, and snippets.

@shepmaster
Last active December 1, 2015 16:54
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save shepmaster/1d133bd6d5bca8175af3 to your computer and use it in GitHub Desktop.
Save shepmaster/1d133bd6d5bca8175af3 to your computer and use it in GitHub Desktop.
hash-rs results
bytes horner fnv sip farm xx
1 50 1000 50 33 55
2 95 2000 95 64 105
4 173 2000 173 121 200
8 400 2000 333 190 444
16 695 1600 592 301 800
32 1185 1333 888 248 1333
64 1939 1084 1207 351 2370
128 3368 895 1438 449 3764
256 4740 802 1570 502 5120
512 5885 791 1662 557 6320
1024 6826 782 1712 578 7111
2048 6918 749 1712 598 7160
bytes horner fnv sip farm xx
1 20 1 20 30 18
2 21 1 21 31 19
4 23 2 23 33 20
8 20 4 24 42 18
16 23 10 27 53 20
32 27 24 36 129 24
64 33 59 53 182 27
128 38 143 89 285 34
256 54 319 163 509 50
512 87 647 308 919 81
1024 150 1309 598 1769 144
2048 296 2733 1196 3421 286
<!DOCTYPE HTML>
<html>
<head>
<script src="//cdnjs.cloudflare.com/ajax/libs/dygraph/1.1.1/dygraph-combined.js"></script>
<style>
.dygraph-legend {
background: none !important;
}
p {
max-width: 60em;
}
</style>
</head>
<body id="inject">
<p>
These are benchmarks of how the different Rust hashing algorithm implementations
compare. In particular, they are using the Hasher interface, which may not be
optimal for all algorithms. Implementation quality may also vary.
</p>
<p>
Numbers are generated from a MacBook Pro (x64 OSX) with a 2.8 GHz Intel Core
i7 (Haswell) and 16GB of RAM. This is unfortunately quite important, as
even different cpu revisions by the same manufacturer (e.g. Haswell vs Sandy Bridge)
can produce significantly different results for these functions. As always,
bench your workload on your hardware for best results.
</p>
<p>
Variance bars excluded because this dang charting library was absolutely
freaking out with them. Note also that both axes are on a log-scale, so small
vertical differences may actually represent a large difference.
</p>
<p>
<a href="https://github.com/Gankro/hash-rs">Source Can be Found Here</a>
</p>
<p>
The hash functions on display are:
</p>
<p>
<strong>SipHash 2-4 (sip)</strong>: This is the state-of-the-art for hashing that is
completely resistant against even theoretical hash flooding DDOS attacks when
paired with a random seed. While most applications probably do not require
this, it's important for some usecases (for instance, many webservers can be
completely locked up by POST requests with colliding keys). Since this is a
relatively obscure attack, but quite costly when applicable, we believe that
it's important to default to such a hash function. As such, this is the default hasher
used by Rust's HashMap, but it can be overloaded. Sip's performance is overall
mediocre but consistent. As such, it's not an awful default.
</p>
<p>
Performance could be gained by moving to SipHash 2-3, which has slightly inferior
cryptographic properties, but is sufficient for preventing flooding.
</p>
<p>
<strong>Fowler-Noll-Vo (fnv)</strong>: A hash function with excellent distribution, but no particular
cryptographic properties. FNV is the best at small inputs, but does terribly on large
inputs. Since small inputs are much more common, it's the
Rust community's most popular override for SipHash where security doesn't matter.
The Rust compiler itself uses this function.
</p>
<p>
<strong>xxHash (xx)</strong>: Like fnv, a hash function with good distribution, but no crypto.
It's our best performer for large inputs, and performs strongly on small ones.
</p>
<p>
<strong>FarmHash (farm)</strong>: Another decent distribution, but no crypto. This algorithm is
intended to perform excellently on all sizes of input, but is massively pessimized
by Rust's streaming hashing architecture, as it's designed to process the input
in one shot, forcing the entire input to be buffered in a growable array. In order
for it to compete, Rust needs to adopt a solution that allows the top-level type
in a hashing to identify whether it can be hashed in a one-shot manner, for a
specialized result. However this can lead to confusing or inconsistent results,
particularly if hashing derivation is done incorrectly.
</p>
<p>
<strong>Horner (horner)</strong>: A universal hash function without the full cryptographic
strength of SipHash. It defends against all hash flooding attacks that have
been demonstrated in practice, but is theoretically vulnerable to a timing
attack if the attacker is allowed to continuously query against maps with the
same seed. It's unclear how practical this attack is under ideal conditions for
the attacker, but for most situations this hasher is likely sufficient.
Performance is roughly on-par with xxHash.
</p>
<p>
<strong>btree</strong> is not a hash function at all, but rather feeding the same
inputs into Rust's BTreeMap, which is based on comparisons instead of hashing. This
serves as a baseline, as well as a demonstration of how ordered data structures
can completely stomp unordered data structures under particular workloads, even
when not used for their ordering properties. <strong>btreenew</strong> is an
experimental rewrite of std's btree that is kicking its butt.
</p>
<script type="text/javascript">
function makeBench(name, title, xLabel) {
var benchDiv = document.createElement("div");
var timeDiv = makeDiv();
var tputDiv = makeDiv();
var heading = document.createElement("h1");
heading.textContent = title;
heading.style = "text-align: center;"
benchDiv.appendChild(heading);
benchDiv.appendChild(timeDiv);
benchDiv.appendChild(tputDiv);
document.getElementById("inject").appendChild(benchDiv);
makeGraph("Time (lower is better)",
xLabel,
"time (ns)",
name + "-time.csv",
timeDiv);
makeGraph("Throughput (higher is better)",
xLabel,
"throughput (MB/s)",
name + "-throughput.csv",
tputDiv);
}
function makeDiv() {
var div = document.createElement("div");
div.style = "display: inline-block; width:650px; height:600px;";
return div;
}
function makeGraph(title, xLabel, yLabel, csv, div) {
var graph = new Dygraph(
div,
csv,
{
logscale: true,
title: title,
xlabel: xLabel,
ylabel: yLabel,
strokeWidth: 2.0,
labelsDivWidth: 550,
legend: "always",
axes: {
x: { logscale: true, drawGrid: false },
y: { logscale: true, drawGrid: false },
},
series: {
sip: { color: "#cc0000" },
xx: { color: "#00cc00" },
fnv: { color: "#0000cc" },
farm: { color: "#cc00cc" },
murmur: { color: "#cccc00" },
city: { color: "#00cccc" },
btree: { color: "#000000" },
btreenew: { color: "#aaaaaa" },
horner: { color: "#884444" }
},
}
);
}
makeBench("bytes",
"Hashing an array of bytes",
"bytes hashed");
makeBench("mapcountdense",
"Counting number of occurrences of 1000 byte-strings (mostly duplicates)",
"bytes per string");
makeBench("mapcountsparse",
"Counting number of occurrences of 1000 byte-strings (mostly unique)",
"bytes per string");
</script>
</body>
</html>
bytes fnv btreenew xx farm horner btree sip
1 46 26 19 16 19 12 20
2 51 51 34 33 38 25 36
4 101 88 73 63 71 49 68
8 164 167 140 114 143 94 135
16 219 282 244 113 237 173 237
32 268 443 346 177 414 273 350
64 345 678 527 257 478 459 486
128 443 950 761 355 780 725 606
256 470 1197 972 441 996 1009 718
512 490 1408 1121 503 1147 1246 770
1024 503 1526 1210 548 1258 1414 799
2048 495 1548 1271 571 1257 1515 823
bytes fnv btreenew xx farm horner btree sip
1 21703 37570 50108 60439 51490 81935 49556
2 38589 38702 57164 60458 51816 78748 54414
4 39589 45062 54439 62903 55867 80229 58668
8 48612 47739 56848 69960 55883 84477 58874
16 72984 56711 65309 141025 67327 92093 67407
32 119007 72215 92410 180267 77204 116883 91194
64 185056 94285 121351 248816 133738 139373 131580
128 288861 134706 168113 360354 164003 176480 211167
256 544191 213783 263223 579728 256870 253497 356089
512 1043453 363618 456522 1017099 446126 410529 664573
1024 2030223 670483 845883 1865666 813376 723753 1278889
2048 4127622 1322136 1609456 3577810 1627222 1351140 2482813
bytes sip farm btreenew btree xx fnv horner
1 17 14 12 8 16 35 17
2 15 14 14 10 14 19 15
4 31 28 28 21 30 37 30
8 62 58 56 42 62 69 62
16 127 78 112 85 130 138 132
32 235 138 225 168 224 236 255
64 415 232 442 336 455 344 446
128 673 371 889 665 841 480 940
256 972 526 1754 1341 1542 594 1703
512 1241 668 3338 2594 2564 665 2745
1024 1445 788 6336 4900 3790 721 4042
2048 1572 837 12062 9375 4931 745 5126
bytes sip farm btreenew btree xx fnv horner
1 58589 67543 80191 118948 58981 27959 57471
2 125752 134104 138366 185257 134742 104634 128260
4 125109 139195 139341 186435 129424 106929 129230
8 128254 136289 140953 187908 127215 114920 127501
16 125909 204044 142331 186929 122719 115710 120340
32 136138 230488 141992 190065 142359 135044 125255
64 153847 274698 144633 190012 140443 185569 143267
128 190060 344148 143831 192180 152130 266415 136050
256 263333 486346 145922 190873 165942 430286 150293
512 412313 765573 153335 197315 199616 768652 186448
1024 707874 1298406 161603 208930 270096 1416477 253246
2048 1301053 2442994 169768 218397 415173 2743301 399405
OS X 10.11.1 (15B42)
Model Name: MacBook Pro
Model Identifier: MacBookPro10,1
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
rustc 1.6.0-dev (43b5d5e0d 2015-11-07)
binary: rustc
commit-hash: 43b5d5e0de1dbbe655160915cc9a7bb56a68d86c
commit-date: 2015-11-07
host: x86_64-apple-darwin
release: 1.6.0-dev
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment