This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers (especially LuaJIT) can do all sorts of crazy stuff to your code, and was meant to be a starting point for discussion about what exactly LuaJIT does and how.
As a result, it's not indicative of how the implementations will perform on more realistic data. Expect differences because:
- the text will not consist of hard-coded constants
- the number of distinct words (and therefore the dictionary) will be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
- the words won't be pre-split, and allocating a new string for each word adds a significant performance penalty (in that case a trie would probably outperform the other approaches)
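To illustrate the trie point: a minimal sketch, assuming the task is counting word frequencies in raw (unsplit) text and that a "word" is a run of ASCII letters. `TrieCounter`, `count_text` and `lookup` are hypothetical names, not part of the benchmark. The counter walks the text once and descends the trie one character at a time, so no per-word string is ever allocated.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical trie-based word counter: scans raw text once and counts
// words without allocating a separate string per word.
class TrieCounter {
 public:
  TrieCounter() : nodes_(1) {}  // node 0 is the root

  // Count every run of ASCII letters in `text` as one word (case-sensitive).
  void count_text(std::string_view text) {
    std::size_t node = 0;
    bool in_word = false;
    for (char c : text) {
      unsigned char uc = static_cast<unsigned char>(c);
      if (std::isalpha(uc)) {
        node = child(node, uc);  // descend, creating the node if needed
        in_word = true;
      } else if (in_word) {
        ++nodes_[node].count;  // a word just ended
        node = 0;
        in_word = false;
      }
    }
    if (in_word) ++nodes_[node].count;  // word at the very end of the text
  }

  // Return how many times `word` was seen, or 0 if never.
  std::uint32_t lookup(std::string_view word) const {
    std::size_t node = 0;
    for (char c : word) {
      std::int32_t next = nodes_[node].next[static_cast<unsigned char>(c)];
      if (next < 0) return 0;
      node = static_cast<std::size_t>(next);
    }
    return nodes_[node].count;
  }

 private:
  struct Node {
    std::array<std::int32_t, 256> next;  // child index per byte, -1 if none
    std::uint32_t count = 0;             // words ending exactly here
    Node() { next.fill(-1); }
  };

  // Nodes are referenced by index, so vector reallocation is harmless.
  std::size_t child(std::size_t node, unsigned char c) {
    if (nodes_[node].next[c] < 0) {
      nodes_[node].next[c] = static_cast<std::int32_t>(nodes_.size());
      nodes_.emplace_back();
    }
    return static_cast<std::size_t>(nodes_[node].next[c]);
  }

  std::vector<Node> nodes_;
};
```

The 256-way child array keeps the sketch short but costs about 1 KB per node; a real implementation would likely use a more compact child representation.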
And I finally got C++ down to 90 ms, just edging out LuaJIT at 95 ms. I had to use a custom data structure to get there, but then both JITs use similar structures internally, optimized for holding a very small number of items with fairly fixed access patterns.
And for this implementation, clang++ is faster than g++: it clocks in at 71 ms (median of 5 runs).
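For a rough idea of what "optimized for a very small number of items" can look like, here is a minimal sketch, assuming only a handful of distinct keys. This is not the custom structure used above; `SmallCounter` is a hypothetical name. Keys live in a flat vector and are found by linear scan, which for tiny sizes avoids hashing and pointer chasing and keeps everything in a few cache lines.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical small-map counter: entries are stored contiguously and found
// by linear scan. Fast for a handful of keys; lookups degrade to O(n) as the
// number of keys grows.
class SmallCounter {
 public:
  // Increment the count for `key`, inserting it with count 1 if absent.
  // Keys are string_views, so the caller must keep the underlying text alive.
  void add(std::string_view key) {
    for (auto& entry : entries_) {
      if (entry.first == key) {
        ++entry.second;
        return;
      }
    }
    entries_.emplace_back(key, 1);
  }

  // Return the count for `key`, or 0 if it was never added.
  std::uint32_t get(std::string_view key) const {
    for (const auto& entry : entries_) {
      if (entry.first == key) return entry.second;
    }
    return 0;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<std::pair<std::string_view, std::uint32_t>> entries_;
};
```

The trade-off is explicit: once the dictionary grows (as it would with realistic text), a hash map or trie will overtake the linear scan.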