Skip to content

Instantly share code, notes, and snippets.

@spion
Last active March 25, 2024 03:01
Show Gist options
  • Star 72 You must be signed in to star a gist
  • Fork 7 You must be signed in to fork a gist
  • Save spion/3049314 to your computer and use it in GitHub Desktop.
Save spion/3049314 to your computer and use it in GitHub Desktop.
C++ versus V8 versus luajit versus C benchmark - (hash) tables

Warning

This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.

As a result, its not indicative of what its performance may be on more realistic data. Differences can be expected because

  1. the text will not consist of hard-coded constants
  2. the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
  3. the words wont be pre-split, and allocating new words adds significant performance penalty (in that case a trie would probably outperform other approaches)
spion@missperfect:~/Projects/testhash$ gcc -O3 test.c -o testc
spion@missperfect:~/Projects/testhash$ time ./testc
The 1000000
a 2000000
at 1000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
the 2000000
real 0m0.234s
user 0m0.228s
sys 0m0.004s
spion@missperfect:~/Projects/testhash$ g++ -O3 -std=gnu++0x test.cpp -o test
spion@missperfect:~/Projects/testhash$ time ./test
the 2000000
at 1000000
a 2000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
The 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
real 0m0.321s
user 0m0.320s
sys 0m0.000s
spion@missperfect:~/Projects/testhash$ time node test.js
The 1000000
quick 1000000
brown 1000000
fox 1000000
jumped 1000000
over 1000000
the 2000000
lazy 1000000
dog 1000000
at 1000000
a 2000000
restaurant 1000000
near 1000000
lake 1000000
of 1000000
new 1000000
era 1000000
real 0m0.317s
user 0m0.304s
sys 0m0.012s
spion@missperfect:~/Projects/testhash$ time luajit test.lua
new 1000000
lazy 1000000
The 1000000
era 1000000
of 1000000
at 1000000
fox 1000000
jumped 1000000
quick 1000000
brown 1000000
lake 1000000
over 1000000
near 1000000
the 2000000
dog 1000000
a 2000000
restaurant 1000000
real 0m0.161s
user 0m0.160s
sys 0m0.000s
// Like in the c++ version, we
// have a regular hash table
// with a custom hash function
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct fht_node {
char* key;
void* data;
struct fht_node * next;
};
typedef struct fht {
struct fht_node ** nodes;
size_t size;
} FHT;
FHT* fht_create() {
FHT *ht = malloc(sizeof(struct fht));
ht->size = 1 << (sizeof(char)*8);
ht->nodes = calloc(ht->size, sizeof(struct fht_node));
return ht;
}
fht_put(FHT* hashtbl, char* key, void* data) {
struct fht_node *node = hashtbl->nodes[key[0]];
while(node) {
if(!strcmp(node->key, key)) {
node->data=data;
return 0;
}
node=node->next;
}
node=malloc(sizeof(struct fht_node));
node->key= strdup(key);
node->data=data;
node->next=hashtbl->nodes[key[0]];
hashtbl->nodes[key[0]]=node;
}
void* fht_get(FHT* hashtbl, char* key) {
struct fht_node *node = hashtbl->nodes[key[0]];
while (node) {
if (!strcmp(node->key, key)) return node->data;
node = node->next;
}
return NULL;
}
int main() {
char* text[19] = {
"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a",
"restaurant", "near", "the", "lake", "of", "a", "new", "era"
};
FHT *hashtbl = fht_create();
int textLen = 19;
int times = 1000000;
int k, *cnt;
while (times--)
for (k = 0; k < textLen; ++k) {
cnt = fht_get(hashtbl, text[k]);
if (!cnt) {
cnt = malloc(sizeof(int));
*cnt = 1;
fht_put(hashtbl, text[k], cnt);
} else { *cnt += 1; }
}
for (k = 0; k < hashtbl->size; ++k) {
struct fht_node *n = hashtbl->nodes[k];
while (n) {
printf("%s %d\n", n->key, *((int *)n->data));
n = n->next;
}
}
}
#include <unordered_map>
#include <string>
#include <vector>
#include <iostream>
#include <string>
using namespace std;
// In C++ despite only accessing the hashtable
// once per loop, we also need a custom hash function
// otherwise the program will be 2 times slower.
struct CustomHash {
size_t operator() (const string& s) const { return (size_t)s[0]; }
};
int main() {
vector<string> text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
unordered_map<string, int, CustomHash> cnt;
int times = 1000000;
while (times--)
for (auto it = text.begin(); it != text.end(); ++it)
cnt[*it] += 1;
for (auto it = cnt.begin(); it != cnt.end(); ++it) {
cout << it->first << " " << it->second << endl;
}
}
var text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"],
map = {};
times = 1000001;
while (--times)
for (var k = 0; k < text.length; ++k) {
// Unlike luajit, if we put if (map[text[k]]) map[text[k]] += 1
// in v8, this program will become 6 times slower. Instead we create a new object with a counter
// and this way we'll only need to access the hashtable once per loop.
var kth = map[text[k]];
if (kth) kth.c += 1;
else map[text[k]] = {c:1};
}
for (var key in map) {
console.log(key, map[key].c);
}
-- The lua program needs no optimizations.
-- It just works. Fast.
-- Note that luajit probably optimizes away
-- all dictionary lookups :)
local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local map = {}
local times = 1000000
while times > 0 do
for i = 1, table.getn(text), 1 do
if map[text[i]] then map[text[i]] = map[text[i]] + 1
else map[text[i]] = 1 end
end
times = times - 1;
end
for key, value in pairs(map) do
io.write(key, " ", value, "\n")
end
@olmeca
Copy link

olmeca commented Mar 2, 2014

Here's a Nimrod version:

import tables

var 
   words = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"]
   hash = initTable[string, int](32)
   times = 1000000

while times > 0:
   for i in 0..len(words)-1:
      hash[words[i]] = `[]`(hash, words[i]) + 1
   times = times - 1

for kv in pairs(hash):
   echo kv.key & ", " & $kv.val

As it turns out, it performs terribly:
real 0m6.106s
user 0m6.093s
sys 0m0.003s

@bearophile
Copy link

Basic version in D language (more than twice slower than the C++ version, so probably it's about as fast as the C++ version without CustomHash):

void main() {
    import std.stdio, std.string;

    const words = "The quick brown fox jumped over the lazy dog
                   at a restaurant near the lake of a new era".split;

    uint[string] counts;

    foreach (immutable _; 0 .. 1_000_000)
        foreach (immutable word; words)
            counts[word]++;

    writefln("%-(%s %d\n%)", counts);
}

A bit smarter version (a little faster than the C++ version):

void main() {
    import std.stdio, std.algorithm, std.array;

    const words = [
        "The", "quick", "brown", "fox", "jumped", "over",
        "the", "lazy", "dog", "at", "a", "restaurant",
        "near", "the", "lake", "of", "a", "new", "era"]
        .map!(w => w.ptr).array;

    uint[immutable(char)*] counts;

    foreach (immutable _; 0 .. 1_000_000)
        foreach (immutable w; words)
            counts[w]++;

    foreach (w, count; counts)
        printf("%s %u\n", w, count);
}

@marcomagdy
Copy link

Here is a faster (than luajit) C++ version. There was no need to use std::string since all strings in this micro-benchmark are constants and not using any of std::string's facilities. Same thing with std::vector, it could be replaced with std::array.

I also changed the hash function's implementation to use the trick @funny-falcon mentioned.

#include <unordered_map>
#include <array>
#include <iostream>

using namespace std;

struct CustomHash {
    size_t operator() (char const* s) const { return reinterpret_cast<size_t>(s); }
};


int main() {
    array<char const*, 19> text = { "The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era" };
    unordered_map<char const*, int, CustomHash> cnt;
    int times = 1000000;
    while (times--)
        for (auto const& word : text)
            cnt[word] += 1;


    for (auto const& keyval : cnt) {
        cout << keyval.first << " " << keyval.second << endl;
    }
    return 0;
}

@craigbarnes
Copy link

@marcomagdy With g++ -O3 that runs in 0.495s for me vs 0.186s for LuaJIT with the code below (~15% faster than the original).

local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local len = #text
local map = {}
local times = 1000000

for i = 1, times do
    for j = 1, len do
        local k = text[j]
        map[k] = (map[k] or 0) + 1
    end
end

for key, value in pairs(map) do
    io.write(key, " ", value, "\n")
end

@nurettin
Copy link

// in v8, this program will become 6 times slower. Instead we create a new object with a counter
// and this way we'll only need to access the hashtable once per loop.

^^^ and why do you force a hash lookup in the C++ loop ?

@jh3141
Copy link

jh3141 commented Dec 6, 2014

I've forked this gist and updated the C++ implementation to one that's over 3 times faster on the machine I was testing on (110ms vs 410ms). See https://gist.github.com/jh3141/768a9672dba9a8fc2524

luajit is still a little faster, but it's very close now.

@jh3141
Copy link

jh3141 commented Dec 6, 2014

@nurettin - the C++ version only does one hash lookup per string per iteration -- unlike in javascript, accessing an uninitialised entry in the map initialises it to zero (as that's what int's default constructor does), so you can just increment without checking it first.

@jh3141
Copy link

jh3141 commented Dec 6, 2014

@marcomagdy - just changing to const char * rather than string is somewhat dubious, as it relies on == comparing the strings correctly, which comes down to whether or not the linker merges duplicated strings (or whether they were read from some other source rather than compiled code...). I decided to use a set to scan over the array once at the start of the process and ensure all duplicated strings were using the same pointer. Note that both v8 and luajit will do exactly the same thing, although possibly with a slightly more specialised data structure... but for 19 strings, that really doesn't matter as it takes basically no time at all.

@jh3141
Copy link

jh3141 commented Dec 6, 2014

And I finally got C++ down to 90ms, just edging out luajit at 95ms. Had to use a custom data structure to achieve it, but then both JITs use similar structures internally, which are optimized for holding a very small number of items with fairly fixed access patterns...

And for this implementation, clang++ is faster than g++, clocking in at 71ms (median of 5 runs).

@ayosec
Copy link

ayosec commented Dec 8, 2014

Rust version:

use std::collections::HashMap;
use std::collections::hash_map::Entry;

fn main() {
  let text = [
    "The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog",
    "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"
  ];

  let mut map: HashMap<&str, u32> = HashMap::new();

  for _ in range(0u32, 100000) {
    for item in text.iter() {
      match map.entry(*item) {
        Entry::Occupied(mut e) => { *e.get_mut() += 1; },
        Entry::Vacant(e)       => { e.set(1u32); }
      };
    }
  }

  for (k, v) in map.iter() {
    println!("{} {}", k, v)
  }
}

Time for Rust version:

real    0m0.104s
user    0m0.100s
sys 0m0.000s

Time for C version:

real    0m0.179s
user    0m0.176s
sys 0m0.000s

@topstarnetwork
Copy link

Hello, I am not a coder and all the above is too complicated for me, after all your benchmark study would you have a clear comparison table that shows wich one from C++, V8, luajit or C is the most lightweight programing language and has the most very small memory footprint???

@joeatbayes
Copy link

The really interesting aspect especially when building ML algorithms is to take a non trivial amount of data. EG: Index all the words occurring in all the WIKI pages with count for each word and the count by word by URI. This kind of non trivial problem is where I have found Luajit breaks down. The 2 Gig memory limit with significant pauses over 1 Gig means you are very quickly forced build your hash table using FFI data structures or link in a C library for the hash table. This means that for a non trivial problem the speed crossing the FFI barrier will be more critical.

Copy link

ghost commented Feb 18, 2017

C code in a comparison is a total mess. Doing the memory allocation for a single integer inside of a cycle
is the most crazy idea I've ever seen:cnt = malloc(sizeof(int));
besides the C-code doesn't replicate what Lua code is doing.
The similar code in C would look like this:

#include <stdio.h>
int main() {
	const char *text[] = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
	const int textLen = sizeof(text)/sizeof(text[0]);
	int map[textLen];
	int times = 1000000;
	for (int k = 0; k < textLen; ++k)	map[k] = 0;
	while (times--)	  for (int k = 0; k < textLen; ++k)	map[k] += 1;
	for (int k = 0; k < textLen; ++k)	printf("%s %d\n", text[k], map[k]);
}

The results:

real	0m0.003s
user	0m0.000s
sys	0m0.000s

Its about 25x times faster than Lua.

Please don't make such useless comparison anymore. Thx

P.S. A hint. If you see the C-code is running slower, than any language besides Assembler then 100% you are doing something wrong. Absolutely any program in any language can be translated directly into C-code and cannot run faster than C, because the last one is simplified Assembler and nothing else. Nobody can top CPU instructions on the same CPU with what ever method.

@Dzejrou
Copy link

Dzejrou commented Mar 13, 2017

@Zack24 While I'm not gonna argue whether the C code is a mess or not, the C code you posted works with an array. The Lua code works with a hash table :) So yes, your C code will be faster than the Lua code because your code does basically nothing.

And to your hint, Luajit is a tracing JIT, which has a) information about the exact hardware it runs on (which a C compiler does not) and b) a tracing window of the past X instruction which allows it to perform more optimizations based on the runtime of the program. So yes, a Lua code can be faster than C code (given the right benchmarks), because C relies on static compilation.

@towc
Copy link

towc commented Sep 7, 2017

Over 3 years have passed, I wonder how these compare now. I'm a JS fanboy so I'm going to champion for it. It can be marginally more optimized, but that goes for all other programs. One thing that you need to note is this though:

$ time node -e "console.log(1)"
1

real	0m0.237s
user	0m0.232s
sys	0m0.004s

on my machine. If you want a fair comparison with JIT languages, it's inherent that you need a larger code structure, which is where run-time optimizations really shine, compared to compile-time optimizations

@thetrung
Copy link

Update on 2020 anyone ?

LuaJIT vs NodeJS is a really interesting competition that I want to see more benchmarks here.
They both are great scripting language that I'm comfortable mostly beside compiled .NET language.

@spion
Copy link
Author

spion commented May 26, 2020

Quick summary

  • nodejs: 0.5s
  • luajit: 0.08s
  • C: 0.173s
  • C++: 0.36s

Switching to regular maps with node is now better:

let text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"],
    map = new Map(),
    times = 1000001;

while (--times > 0)
    for (let k = 0; k < text.length; ++k) {
        let key = text[k];
        let kth = map.get(key) // map[text[k]];
        if (kth != null) kth.c += 1;
        else map.set(key, {c:1})
    }

for (let [key, val] of map) {
    console.log(key, val.c);
}

and results with 0.33s run time.

@spion
Copy link
Author

spion commented May 26, 2020

To offset the effect of startup time we can multiply the times by 10x in which case we get (with the updated JS version):

luajit: 0.76s
C++: 3.35s
C: 1.44s
JS (with Map): 3.00s

It seems like only luajit is doing super-clever (unrealistic) optimizations at this point 😀

@thetrung
Copy link

JS faster than C++ is kinda funny.

Impressed with LuaJIT, must be the compiler doing the magic.

@kokizzu
Copy link

kokizzu commented Sep 18, 2020

from @codenoid https://play.golang.org/p/lwO4m6ZVTeQ

LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
luajit a.lua  0,05s user 0,00s system 98% cpu 0,052 total
go version go1.14.7 linux/amd64
go build a.go && ./a
./a  0,46s user 0,00s system 99% cpu 0,461 total

@iemelyanov
Copy link

Go code can be much faster without an extra check of contains key in the map.

package main

import (
	"fmt"
)

func main() {
	text := []string{"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
	dict := map[string]int{}
	times := 1000001

	for i := 0; i < times; i++ {
		for _, word := range text {
			dict[word]++
		}
	}

	for key, val := range dict {
		fmt.Println(key, val)
	}
}

0.24s user 0.00s system 100% cpu 0.245 total

@mbossX
Copy link

mbossX commented May 11, 2021

code

js

const text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"];
const map = new Map();
const times = 100000000;

let c = 0;
for (var k = 0; k < text.length; k++) {
    c = 1;
    for (let i = 0; i < times; i++) {
        c++;
    }
    map.set(text[k], c);
}

for (var key of text) {
    console.log(key, map.get(key));
}

lua

-- The lua program needs no optimizations.
-- It just works. Fast.
-- Note that luajit probably optimizes away
-- all dictionary lookups :)

local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local map = {}
local times = 100000000
local n = #text

for i = 1, n do
    map[text[i]] = 1
    for _ = 1, times do  
        map[text[i]] = map[text[i]] + 1
    end
end

for i = 1, n do
    io.write(text[i], " ", map[text[i]], "\n")
end

c

#include <stdio.h>
int main() {
	const char *text[] = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
	const int textLen = sizeof(text)/sizeof(text[0]);
	int map[textLen];
	int times = 100000000;
	for (int k = 0; k < textLen; k++)	{
		map[k] = 1;
		for (int i=0;i<times;i++) {
			map[k] += 1;
		}
	}
	for (int k = 0; k < textLen; ++k)	printf("%s %d\n", text[k], map[k]);
}

gcc test.c -o test -O3

result

js

real    0m1.118s
user    0m1.118s
sys     0m0.000s

lua

real    0m1.774s
user    0m1.775s
sys     0m0.000s

c

real    0m0.001s
user    0m0.000s
sys     0m0.000s

PS

node -v
v15.12.0
luajit -v
LuaJIT 2.0.5 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
gcc version 8.3.0 (Debian 8.3.0-6)

@thetalmid
Copy link

In summary, languages are all so fast and benchmarking so unpredictable that it is impossible to determine which one to choose for your application based on performance only?

@thetrung
Copy link

thetrung commented Sep 6, 2021

when we can e JIT in LuaJIT, no other reason to not use it for scripting, beside written parts in C/C++ for bridge/performance

@LeandroPerrotta
Copy link

In little words, you need know way too much in C/C++ to get same results you get with regular knowledge in Lua/JS.

The times where C/C++ codes, even bad wrote ones, used to mean they were at least be fast than high level languages. Is gone.

@spion
Copy link
Author

spion commented Feb 14, 2022

The summary is that naive benchmarks give you wrong and unrealistic results. You would expect these to be strings, and you would expect the objects to be regular hash tables, but in reality scripting languages can switch data structures or hashing functions for these depending on the way the code is executed at runtime (i.e. JIT) so your test might not be representative of what would happen in a real program.

Still, its an interesting starter for a conversation on micro-benchmarks and JIT optimizations. I recommend mraleph's blog for a deep dive (mostly from a JS perspective), e.g. https://mrale.ph/blog/2014/02/23/the-black-cat-of-microbenchmarks.html

@thetrung
Copy link

Surely, it's always based on real use cases but playground benchmark like this, is really useful to have. There are alot of things happen behind every compiler optimization that programmers may not know or may not need to know, which hide away alot of complexity in development.

I hope people will keep coming back on this thread with their own benchmarks on every language that we are using today.

Productivity vs. Speed always will be fun to discuss.

@kokizzu
Copy link

kokizzu commented Feb 17, 2022

that last C code is cheating btw, it's comparing pointer to array of char, not comparing the array of char (string) itself
so it wouldn't give same result if the string is from user input / not constants.

@dlannan
Copy link

dlannan commented May 30, 2022

The last C code isnt doing a hash lookup. Its a plain array. Which if you read the original code, they were all doing hash map lookups.
This is something luajit does very well because its builtin and the jit optimises the hash lookups at runtime. As others have mentioned, you need to write platform specific code in C to be able to achieve this - and its alot of work (essentially what the lua jit does).
In commercial work, I can vouch for luajit. I developed software for realtime guaranteed packet capture systems (fmad.io) running luajit - and in many cases luajit was easily the best fit for purpose in performance and ease of implementation. We often worked on many GB data sets and with some big network pipes that we needed to run guaranteed packet capture on.
The biggest benefit I found was using luajit as a high speed binder for Clibs.. with zero overhead with ffi calls to c it meant we could utilise C libraries, as well as still use luajit for building the app side. The interesting thing, is how much luajit is used in the network capture world.

@FrankHB
Copy link

FrankHB commented Sep 6, 2023

As a common sense of such micro benchmarks, the init (and deinit) time counts. For example, in hosted implementation of C and C++ on x86_64 Linux, typically the C runtime will have an overhead of less than 100k of insns and C++ will have 3M. This still does not indicate that which one is definitely more efficint though: both can have freestanding implementations with less dynamic initialization required (e.g. getting rid of the iostream initialization mandated by the standard on a C++ hosted implementation). For typical manner of hosted implementations, if the kernel mode runtime/loader is implemented in C++, processes in userland can even share with pre-initized environments with no overhead at all. That said, why benchmarks of userland programs should rely on such stuff? Because this really makes differences when your benchmark runs sufficiently fast.

OTOH, there is nothing comparable on Lua or JS. There are no specs on what exactly mean to implement freestanding environments for them. The runtime initialization do incur some non-deterministic overhead which typically in turn rely on the native init overhead. Then for userland programs, you will compare "impure" implementations: Lua+C vs. JS+C++ instead of Lua vs. JS, for sure. This will not change before you can implement a hosted environment runtime (e.g. the OS kernel and loader) in your language being compared. Well, in this sense, Linux C++ is actually implemented in C (Linux kernel + e.g. glibc ld.so) + C++, but at least there are standard-conforming ways to replace non-C++ parts as long as you can replace Linux and libc, in reality. And where is the native loader implemented in JS/Lua?

I'm a bit surprised that nobody has pointed out this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment