This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.
As a result, its not indicative of what its performance may be on more realistic data. Differences can be expected because
- the text will not consist of hard-coded constants
- the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
- the words wont be pre-split, and allocating new words adds significant performance penalty (in that case a trie would probably outperform other approaches)
The summary is that naive benchmarks give you wrong and unrealistic results. You would expect these to be strings, and you would expect the objects to be regular hash tables, but in reality scripting languages can switch data structures or hashing functions for these depending on the way the code is executed at runtime (i.e. JIT) so your test might not be representative of what would happen in a real program.
Still, its an interesting starter for a conversation on micro-benchmarks and JIT optimizations. I recommend mraleph's blog for a deep dive (mostly from a JS perspective), e.g. https://mrale.ph/blog/2014/02/23/the-black-cat-of-microbenchmarks.html