This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.
As a result, it's not indicative of what performance would look like on more realistic data. Differences can be expected because:
- the text will not consist of hard-coded constants
- the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
- the words won't be pre-split, and allocating new words adds a significant performance penalty (in that case a trie would probably outperform other approaches)
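To make the last point concrete, here is a minimal sketch of counting words straight out of unsplit text with a trie, so that no per-word string is allocated (memory is only allocated when a new trie node is needed). `WordTrie`, `add_text` and `count` are hypothetical names, not part of the benchmark, and the sketch assumes lowercase ASCII words with every other byte acting as a separator. It has not been measured against the other approaches.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical trie over lowercase ASCII letters (assumption: 'a'..'z' only).
// Any other byte is treated as a word separator.
class WordTrie {
public:
    WordTrie() : nodes_(1) {}  // node 0 is the root

    // Walks `text` once, counting each maximal run of a-z as one word.
    void add_text(std::string_view text) {
        std::size_t cur = 0;
        for (char ch : text) {
            if (ch >= 'a' && ch <= 'z') {
                cur = child(cur, ch - 'a');
            } else {
                finish(cur);
                cur = 0;
            }
        }
        finish(cur);  // the text may end in the middle of a word
    }

    // Returns how many times `word` was seen (0 if never).
    std::uint32_t count(std::string_view word) const {
        std::size_t cur = 0;
        for (char ch : word) {
            if (ch < 'a' || ch > 'z') return 0;
            std::uint32_t next = nodes_[cur].next[ch - 'a'];
            if (next == 0) return 0;
            cur = next;
        }
        return nodes_[cur].count;
    }

private:
    struct Node {
        std::array<std::uint32_t, 26> next{};  // 0 means "no child"
        std::uint32_t count = 0;
    };

    // Returns the index of the child for `letter`, creating it if needed.
    std::size_t child(std::size_t cur, int letter) {
        std::uint32_t next = nodes_[cur].next[letter];
        if (next == 0) {
            next = static_cast<std::uint32_t>(nodes_.size());
            nodes_.emplace_back();  // may reallocate, so index again below
            nodes_[cur].next[letter] = next;
        }
        return next;
    }

    // Records the end of a word, unless we are still at the root.
    void finish(std::size_t cur) {
        if (cur != 0) ++nodes_[cur].count;
    }

    std::vector<Node> nodes_;
};
```

Storing nodes in one `std::vector` and linking them by index keeps allocation down to occasional vector growth, which is the property the bullet above is getting at.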
@marcomagdy - just changing to `const char *` rather than `string` is somewhat dubious, as it relies on `==` comparing the strings correctly, which comes down to whether or not the linker merges duplicated strings (or whether they were read from some other source rather than compiled code...). I decided to use a `set` to scan over the array once at the start of the process and ensure all duplicated strings were using the same pointer. Note that both v8 and luajit will do exactly the same thing, although possibly with a slightly more specialised data structure... but for 19 strings, that really doesn't matter as it takes basically no time at all.
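For anyone following along, the `set` pass could look roughly like the sketch below. This is not the exact code from the benchmark: `CStrLess` and `intern_all` are names made up for illustration, and the word array is assumed to be a `std::vector<const char*>`.

```cpp
#include <cstddef>
#include <cstring>
#include <set>
#include <vector>

// Orders C strings by content, so equal text maps to a single set entry.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical helper: rewrites every element of `words` to one canonical
// pointer per distinct text. Afterwards, two entries hold equal text if and
// only if their pointers compare equal. Returns the number of distinct strings.
std::size_t intern_all(std::vector<const char*>& words) {
    std::set<const char*, CStrLess> pool;
    for (const char*& w : words) {
        // insert() returns the existing entry if the text is already present,
        // otherwise it stores w itself and returns that.
        w = *pool.insert(w).first;
    }
    return pool.size();
}
```

After `intern_all(words)` has run once, comparing entries with `==` is safe regardless of whether the linker merged the literals or the strings came from somewhere else at runtime.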