fwsGonzo/benchmarks.txt Secret

Last active Sep 22, 2020
Benchmarks from script_bench repo, measuring RISC-V emulator performance
RISC-V self-test OK
libriscv: fork median 285ns lowest: 273ns highest: 328ns
libriscv: install syscall median 5ns lowest: 5ns highest: 14ns
libriscv: function call median 3ns lowest: 3ns highest: 3ns
luajit: function call median 90ns lowest: 89ns highest: 107ns
native: array append median 0ns lowest: 0ns highest: 0ns
libriscv: array append median 20ns lowest: 19ns highest: 45ns
libriscv: array app. direct median 13ns lowest: 13ns highest: 38ns
luajit: table append median 127ns lowest: 127ns highest: 157ns
libriscv: many arguments median 128ns lowest: 127ns highest: 171ns
luajit: many arguments median 442ns lowest: 431ns highest: 516ns
libriscv: integer math median 20ns lowest: 20ns highest: 44ns
libriscv: fp math median 35ns lowest: 33ns highest: 93ns
libriscv: exp math median 32ns lowest: 31ns highest: 49ns
libriscv: fib(40) median 548ns lowest: 523ns highest: 593ns
luajit: integer math median 127ns lowest: 122ns highest: 162ns
luajit: fp math median 142ns lowest: 140ns highest: 187ns
luajit: exp math median 217ns lowest: 217ns highest: 259ns
luajit: fib(40) median 157ns lowest: 155ns highest: 184ns
libriscv: syscall overhead median 9ns lowest: 8ns highest: 26ns
libriscv: syscall print median 44ns lowest: 43ns highest: 54ns
luajit: syscall overhead median 110ns lowest: 109ns highest: 154ns
luajit: syscall print median 169ns lowest: 169ns highest: 213ns
libriscv: complex syscall median 138ns lowest: 137ns highest: 164ns
luajit: complex syscall median 1048ns lowest: 1021ns highest: 1120ns
libriscv: micro threads median 149ns lowest: 146ns highest: 193ns
luajit: coroutines median 373ns lowest: 361ns highest: 441ns
libriscv: micro thread args median 182ns lowest: 177ns highest: 229ns
libriscv: full thread args median 276ns lowest: 275ns highest: 318ns
luajit: coroutine args median 424ns lowest: 411ns highest: 481ns
luajit: coroutine args median 425ns lowest: 413ns highest: 475ns
libriscv: naive memcpy median 471ns lowest: 457ns highest: 544ns
libriscv: syscall memcpy median 39ns lowest: 39ns highest: 58ns
luajit: memcpy median 248ns lowest: 242ns highest: 326ns
@fwsGonzo fwsGonzo commented May 8, 2020

Recently updated. By using naked functions and returning via macros, we can eliminate function prologues and epilogues entirely for the entry function itself. You don't have to do this yourself, as it's completely optional and per-function, but it's another tool in the toolbox for functions that need to be faster.

@fwsGonzo fwsGonzo commented Jun 15, 2020

Added several floating-point tests, although they are a bit short. Eyeballing the results, it doesn't look like LuaJIT will overtake in the long run, as long as unaccelerated operations are handled natively through system calls.

@fwsGonzo fwsGonzo commented Jul 31, 2020

Updated results (which regress direct threads due to the loss of jump traps; temporarily, I hope). Other results are better in return. However, the focus is no longer on this synthetic benchmark, but rather on massive scalability in a highly concurrent setting.

@fwsGonzo fwsGonzo commented Sep 22, 2020

Jump traps are back using a fallback mechanism, and the decoder is now faster than ever. fib(40) shows how an emulator can never beat a JIT, but it's not completely hopeless: only about 3x slower. And while the naive memcpy is only half as fast as LuaJIT's memcpy, that's still remarkably good. System calls now have lower overhead than before; the threshold for using them is quite small, and the benefits are huge.
