Skip to content

Instantly share code, notes, and snippets.

@spion
Last active March 25, 2024 03:01
Show Gist options
  • Star 72 You must be signed in to star a gist
  • Fork 7 You must be signed in to fork a gist
  • Save spion/3049314 to your computer and use it in GitHub Desktop.
Save spion/3049314 to your computer and use it in GitHub Desktop.
C++ versus V8 versus luajit versus C benchmark - (hash) tables

Warning

This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.

As a result, its not indicative of what its performance may be on more realistic data. Differences can be expected because

  1. the text will not consist of hard-coded constants
  2. the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
  3. the words wont be pre-split, and allocating new words adds significant performance penalty (in that case a trie would probably outperform other approaches)
spion@missperfect:~/Projects/testhash$ gcc -O3 test.c -o testc
spion@missperfect:~/Projects/testhash$ time ./testc
The 1000000
a 2000000
at 1000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
the 2000000
real 0m0.234s
user 0m0.228s
sys 0m0.004s
spion@missperfect:~/Projects/testhash$ g++ -O3 -std=gnu++0x test.cpp -o test
spion@missperfect:~/Projects/testhash$ time ./test
the 2000000
at 1000000
a 2000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
The 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
real 0m0.321s
user 0m0.320s
sys 0m0.000s
spion@missperfect:~/Projects/testhash$ time node test.js
The 1000000
quick 1000000
brown 1000000
fox 1000000
jumped 1000000
over 1000000
the 2000000
lazy 1000000
dog 1000000
at 1000000
a 2000000
restaurant 1000000
near 1000000
lake 1000000
of 1000000
new 1000000
era 1000000
real 0m0.317s
user 0m0.304s
sys 0m0.012s
spion@missperfect:~/Projects/testhash$ time luajit test.lua
new 1000000
lazy 1000000
The 1000000
era 1000000
of 1000000
at 1000000
fox 1000000
jumped 1000000
quick 1000000
brown 1000000
lake 1000000
over 1000000
near 1000000
the 2000000
dog 1000000
a 2000000
restaurant 1000000
real 0m0.161s
user 0m0.160s
sys 0m0.000s
// Like in the c++ version, we
// have a regular hash table
// with a custom hash function
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct fht_node {
char* key;
void* data;
struct fht_node * next;
};
typedef struct fht {
struct fht_node ** nodes;
size_t size;
} FHT;
FHT* fht_create() {
FHT *ht = malloc(sizeof(struct fht));
ht->size = 1 << (sizeof(char)*8);
ht->nodes = calloc(ht->size, sizeof(struct fht_node));
return ht;
}
fht_put(FHT* hashtbl, char* key, void* data) {
struct fht_node *node = hashtbl->nodes[key[0]];
while(node) {
if(!strcmp(node->key, key)) {
node->data=data;
return 0;
}
node=node->next;
}
node=malloc(sizeof(struct fht_node));
node->key= strdup(key);
node->data=data;
node->next=hashtbl->nodes[key[0]];
hashtbl->nodes[key[0]]=node;
}
void* fht_get(FHT* hashtbl, char* key) {
struct fht_node *node = hashtbl->nodes[key[0]];
while (node) {
if (!strcmp(node->key, key)) return node->data;
node = node->next;
}
return NULL;
}
int main() {
char* text[19] = {
"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a",
"restaurant", "near", "the", "lake", "of", "a", "new", "era"
};
FHT *hashtbl = fht_create();
int textLen = 19;
int times = 1000000;
int k, *cnt;
while (times--)
for (k = 0; k < textLen; ++k) {
cnt = fht_get(hashtbl, text[k]);
if (!cnt) {
cnt = malloc(sizeof(int));
*cnt = 1;
fht_put(hashtbl, text[k], cnt);
} else { *cnt += 1; }
}
for (k = 0; k < hashtbl->size; ++k) {
struct fht_node *n = hashtbl->nodes[k];
while (n) {
printf("%s %d\n", n->key, *((int *)n->data));
n = n->next;
}
}
}
#include <unordered_map>
#include <string>
#include <vector>
#include <iostream>
#include <string>
using namespace std;
// In C++ despite only accessing the hashtable
// once per loop, we also need a custom hash function
// otherwise the program will be 2 times slower.
struct CustomHash {
size_t operator() (const string& s) const { return (size_t)s[0]; }
};
int main() {
vector<string> text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
unordered_map<string, int, CustomHash> cnt;
int times = 1000000;
while (times--)
for (auto it = text.begin(); it != text.end(); ++it)
cnt[*it] += 1;
for (auto it = cnt.begin(); it != cnt.end(); ++it) {
cout << it->first << " " << it->second << endl;
}
}
var text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"],
map = {};
times = 1000001;
while (--times)
for (var k = 0; k < text.length; ++k) {
// Unlike luajit, if we put if (map[text[k]]) map[text[k]] += 1
// in v8, this program will become 6 times slower. Instead we create a new object with a counter
// and this way we'll only need to access the hashtable once per loop.
var kth = map[text[k]];
if (kth) kth.c += 1;
else map[text[k]] = {c:1};
}
for (var key in map) {
console.log(key, map[key].c);
}
-- The lua program needs no optimizations.
-- It just works. Fast.
-- Note that luajit probably optimizes away
-- all dictionary lookups :)
local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local map = {}
local times = 1000000
while times > 0 do
for i = 1, table.getn(text), 1 do
if map[text[i]] then map[text[i]] = map[text[i]] + 1
else map[text[i]] = 1 end
end
times = times - 1;
end
for key, value in pairs(map) do
io.write(key, " ", value, "\n")
end
@spion
Copy link
Author

spion commented Feb 14, 2022

The summary is that naive benchmarks give you wrong and unrealistic results. You would expect these to be strings, and you would expect the objects to be regular hash tables, but in reality scripting languages can switch data structures or hashing functions for these depending on the way the code is executed at runtime (i.e. JIT) so your test might not be representative of what would happen in a real program.

Still, its an interesting starter for a conversation on micro-benchmarks and JIT optimizations. I recommend mraleph's blog for a deep dive (mostly from a JS perspective), e.g. https://mrale.ph/blog/2014/02/23/the-black-cat-of-microbenchmarks.html

@thetrung
Copy link

Surely, it's always based on real use cases but playground benchmark like this, is really useful to have. There are alot of things happen behind every compiler optimization that programmers may not know or may not need to know, which hide away alot of complexity in development.

I hope people will keep coming back on this thread with their own benchmarks on every language that we are using today.

Productivity vs. Speed always will be fun to discuss.

@kokizzu
Copy link

kokizzu commented Feb 17, 2022

that last C code is cheating btw, it's comparing pointer to array of char, not comparing the array of char (string) itself
so it wouldn't give same result if the string is from user input / not constants.

@dlannan
Copy link

dlannan commented May 30, 2022

The last C code isnt doing a hash lookup. Its a plain array. Which if you read the original code, they were all doing hash map lookups.
This is something luajit does very well because its builtin and the jit optimises the hash lookups at runtime. As others have mentioned, you need to write platform specific code in C to be able to achieve this - and its alot of work (essentially what the lua jit does).
In commercial work, I can vouch for luajit. I developed software for realtime guaranteed packet capture systems (fmad.io) running luajit - and in many cases luajit was easily the best fit for purpose in performance and ease of implementation. We often worked on many GB data sets and with some big network pipes that we needed to run guaranteed packet capture on.
The biggest benefit I found was using luajit as a high speed binder for Clibs.. with zero overhead with ffi calls to c it meant we could utilise C libraries, as well as still use luajit for building the app side. The interesting thing, is how much luajit is used in the network capture world.

@FrankHB
Copy link

FrankHB commented Sep 6, 2023

As a common sense of such micro benchmarks, the init (and deinit) time counts. For example, in hosted implementation of C and C++ on x86_64 Linux, typically the C runtime will have an overhead of less than 100k of insns and C++ will have 3M. This still does not indicate that which one is definitely more efficint though: both can have freestanding implementations with less dynamic initialization required (e.g. getting rid of the iostream initialization mandated by the standard on a C++ hosted implementation). For typical manner of hosted implementations, if the kernel mode runtime/loader is implemented in C++, processes in userland can even share with pre-initized environments with no overhead at all. That said, why benchmarks of userland programs should rely on such stuff? Because this really makes differences when your benchmark runs sufficiently fast.

OTOH, there is nothing comparable on Lua or JS. There are no specs on what exactly mean to implement freestanding environments for them. The runtime initialization do incur some non-deterministic overhead which typically in turn rely on the native init overhead. Then for userland programs, you will compare "impure" implementations: Lua+C vs. JS+C++ instead of Lua vs. JS, for sure. This will not change before you can implement a hosted environment runtime (e.g. the OS kernel and loader) in your language being compared. Well, in this sense, Linux C++ is actually implemented in C (Linux kernel + e.g. glibc ld.so) + C++, but at least there are standard-conforming ways to replace non-C++ parts as long as you can replace Linux and libc, in reality. And where is the native loader implemented in JS/Lua?

I'm a bit surprised that nobody has pointed out this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment