These are benchmarks demonstrating claims made in the quick and practical
"MSI" hash table. It pits an MSI hash table set in C using an
integer permutation hash and an AES-NI hash, a C++ std::set
, and Go
map[string]struct{}
. Inputs are strings of length 1–8.
My results on an i7-8650U, GCC 12.0.1 (libstdc++), Clang 14.0.6 (libc++):
Time (s) Memory (MiB)
GCC MSI/perm 0.27 64
Clang MSI/perm 0.27 "
GCC MSI/AES-NI 0.32 "
Clang MSI/AES-NI 0.32 "
GCC std::set 1.08 320
Clang std::set 1.43 320
Go 1.19 1.46 320
The clear winner is MSI with an integer permutation, followed by MSI with AES-NI. Overall, MSI is an order of magnitude faster and smaller than a generic hash table, in large part because it's custom-tailored for the context.
As an added bonus, UBSan and ASan have practically no performance impact
on the MSI benchmarks, and debug builds (-O0
) are only about a 2x
slowdown (i.e. still faster than optimized C++), vs. a 10x slowdown for
C++.
Likely objection: The C++ and Go versions use full strings and dynamically allocate, while the C version uses fixed-length strings and table, so it's not fair. However, that's the whole point. The C++ and Go examples are written idiomatically for this problem, but the use of MSI in the C version is part of a holistic strategy — something not possible with generic hash tables — which is also why it's longer.