Notes on writing a Lua Bytecode VM. Lua is a compact, minimal language designed to be embedded within a larger program so that end users can customize that program's behavior. This note outlines how I would break down implementing the Lua Bytecode VM in Rust. The techniques are broadly applicable to any implementation language.
I would proceed by first supporting a subset of Lua that uses only numbers, then move on to tables of numbers. Lua 5.3 adds integers as a subtype of number, so 5.3 support needs a second numeric representation.
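A minimal sketch of how the value type could grow with that plan, assuming a plain Rust enum; `Value` and `arith_add` are hypothetical names, not from any existing crate. For 5.1 and 5.2 the only numeric variant needed is `Number(f64)`. The `Integer(i64)` variant models the 5.3 subtype, using 5.3's rules that integer addition wraps around on overflow and that mixing an integer with a float produces a float. A `Table` variant would come in the second stage.

```rust
/// Hypothetical value type for the numbers-first subset of Lua.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Boolean(bool),
    /// The only number type in Lua 5.1 and 5.2: a double.
    Number(f64),
    /// Lua 5.3's integer subtype (64-bit by default).
    Integer(i64),
    // A Table variant would be added in the next stage.
}

/// Addition following Lua 5.3's rules for the numeric cases:
/// integer + integer stays an integer and wraps around on overflow;
/// if either operand is a float, both are converted to float.
/// Returns None where full Lua would try string coercion or metamethods.
pub fn arith_add(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Integer(x), Value::Integer(y)) => Some(Value::Integer(x.wrapping_add(*y))),
        (Value::Integer(x), Value::Number(y)) => Some(Value::Number(*x as f64 + *y)),
        (Value::Number(x), Value::Integer(y)) => Some(Value::Number(*x + *y as f64)),
        (Value::Number(x), Value::Number(y)) => Some(Value::Number(*x + *y)),
        _ => None,
    }
}
```

For a 5.1-only VM the `Integer` arms simply never match, so the same `match` can serve every version.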
Form small test cases, compile chunks of Lua code with luac, and disassemble them with ChunkSpy.
It would be nice for parsing and loading Lua chunks to live in a distinct library. The nom parser-combinator library in Rust was designed from the start to parse binary formats, so it might be a good starting point; writing the parser by hand is also an option.
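If the loader is written by hand instead of with nom, a small cursor over the byte slice covers most of what the chunk format needs. This is a sketch with hypothetical names (`ByteReader`, `UnexpectedEof`), and it assumes a little-endian chunk, whereas a real loader would check the endianness byte in the header first.

```rust
/// Hypothetical error type: the chunk ended before a read finished.
#[derive(Debug, PartialEq)]
pub struct UnexpectedEof;

/// A minimal cursor over a chunk's bytes (hypothetical name).
pub struct ByteReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteReader { data, pos: 0 }
    }

    pub fn position(&self) -> usize {
        self.pos
    }

    /// Take the next `n` bytes; on failure the cursor does not move.
    fn take(&mut self, n: usize) -> Result<&'a [u8], UnexpectedEof> {
        let data: &'a [u8] = self.data;
        let end = self.pos.checked_add(n).ok_or(UnexpectedEof)?;
        let bytes = data.get(self.pos..end).ok_or(UnexpectedEof)?;
        self.pos = end;
        Ok(bytes)
    }

    pub fn read_u8(&mut self) -> Result<u8, UnexpectedEof> {
        Ok(self.take(1)?[0])
    }

    /// Little-endian only; a real loader would honour the header's endianness byte.
    pub fn read_u32_le(&mut self) -> Result<u32, UnexpectedEof> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    pub fn read_u64_le(&mut self) -> Result<u64, UnexpectedEof> {
        let mut arr = [0u8; 8];
        arr.copy_from_slice(self.take(8)?);
        Ok(u64::from_le_bytes(arr))
    }

    /// With lua_Number = double, numbers are stored as raw IEEE 754 bits.
    pub fn read_f64_le(&mut self) -> Result<f64, UnexpectedEof> {
        Ok(f64::from_bits(self.read_u64_le()?))
    }
}
```

As a usage example, the eight bytes `00 00 00 00 00 00 F0 3F` read back with `read_f64_le` give 1.0, which is constant [0] in the ChunkSpy listing of t0.out.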
I had originally thought about this problem in the context of a multi-version VM where 5.1, 5.2 and 5.3 bytecode could all interoperate. The bytecode format is not compatible between point releases of Lua.
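For a multi-version VM, the loader has to pick a format before reading anything else. A sketch, assuming the 4-byte signature `ESC L u a` followed by a version byte written as hex digits (0x51 for 5.1, which appears as `Q` in the hexdump of t0.out); `LuaVersion` and `detect_version` are hypothetical names. Everything after the version byte differs between releases, so each variant would dispatch to its own loader.

```rust
/// Bytecode versions this hypothetical multi-version VM would accept.
#[derive(Debug, PartialEq)]
pub enum LuaVersion {
    V51,
    V52,
    V53,
}

/// Inspect the signature and version byte of a precompiled chunk.
pub fn detect_version(chunk: &[u8]) -> Result<LuaVersion, String> {
    const SIGNATURE: &[u8] = b"\x1bLua";
    if chunk.len() < 5 || !chunk.starts_with(SIGNATURE) {
        return Err("not a precompiled Lua chunk".to_string());
    }
    // Version byte: major version in the high nibble, minor in the low nibble.
    match chunk[4] {
        0x51 => Ok(LuaVersion::V51),
        0x52 => Ok(LuaVersion::V52),
        0x53 => Ok(LuaVersion::V53),
        other => Err(format!("unsupported bytecode version 0x{:02x}", other)),
    }
}
```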
Most of the work will be in implementing the libraries, not the VM itself. Various tools will have to be collected and upgraded as needed:
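A rough sketch of what one library function could look like on the Rust side, assuming library functions are plain Rust `fn` pointers that take their arguments as a slice and return their results as a `Vec`. `NativeFn`, `open_base`, `lua_print` and `format_print_line` are hypothetical names. Lua's `print` converts each argument with `tostring` and separates them with tabs; the number formatting here only approximates Lua 5.1's `%.14g`.

```rust
use std::collections::HashMap;

/// Hypothetical value type, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Number(f64),
    Str(String),
}

/// Hypothetical signature for a library function implemented in Rust.
pub type NativeFn = fn(&[Value]) -> Vec<Value>;

/// Approximates Lua's tostring for this subset; Rust's `{}` for f64 is only
/// a stand-in for Lua 5.1's "%.14g" number formatting.
pub fn tostring(v: &Value) -> String {
    match v {
        Value::Nil => "nil".to_string(),
        Value::Number(n) => format!("{}", n),
        Value::Str(s) => s.clone(),
    }
}

/// The line `print` would emit: arguments converted and joined by tabs.
pub fn format_print_line(args: &[Value]) -> String {
    args.iter().map(tostring).collect::<Vec<_>>().join("\t")
}

pub fn lua_print(args: &[Value]) -> Vec<Value> {
    println!("{}", format_print_line(args));
    Vec::new() // print returns no values
}

/// Register the (tiny) base library into a table of native functions.
pub fn open_base(natives: &mut HashMap<&'static str, NativeFn>) {
    natives.insert("print", lua_print);
}
```

Each real library (string, table, math, io, os) would be a longer `open_*` function of the same shape, which is where most of the effort goes.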
- Lua 5.1
- Lua 5.2
- Lua Asm/Disasm
- ChunkSpy
- Chunkbake
- LuaDec
- Moonshine libs
In a file t0.lua:
local a = 1
local b = 2
c = a + b
print(c)
Compile it to t0.out:
luac-5.1 -o t0.out t0.lua
Running hexdump -C t0.out on the result shows:
00000000 1b 4c 75 61 51 00 01 04 08 04 08 00 08 00 00 00 |.LuaQ...........|
00000010 00 00 00 00 40 74 30 2e 6c 75 61 00 00 00 00 00 |....@t0.lua.....|
00000020 00 00 00 00 00 00 02 04 08 00 00 00 01 00 00 00 |................|
00000030 41 40 00 00 8c 40 00 00 87 80 00 00 85 c0 00 00 |A@...@..........|
00000040 c5 80 00 00 9c 40 00 01 1e 00 80 00 04 00 00 00 |.....@..........|
00000050 03 00 00 00 00 00 00 f0 3f 03 00 00 00 00 00 00 |........?.......|
00000060 00 40 04 02 00 00 00 00 00 00 00 63 00 04 06 00 |.@.........c....|
00000070 00 00 00 00 00 00 70 72 69 6e 74 00 00 00 00 00 |......print.....|
00000080 08 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 |................|
00000090 03 00 00 00 04 00 00 00 04 00 00 00 04 00 00 00 |................|
000000a0 04 00 00 00 02 00 00 00 02 00 00 00 00 00 00 00 |................|
000000b0 61 00 01 00 00 00 07 00 00 00 02 00 00 00 00 00 |a...............|
000000c0 00 00 62 00 02 00 00 00 07 00 00 00 00 00 00 00 |..b.............|
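The first 12 bytes of this dump are the Lua 5.1 global header that ChunkSpy annotates in the listing that follows. A sketch of parsing them by hand, assuming a 5.1 chunk; `Header51`, `parse_header51` and `source_name` are hypothetical names. The `source_name` helper also assumes a little-endian chunk with an 8-byte size_t, which is what this header reports, and reads the length-prefixed `@t0.lua\0` string at offset 0x0C.

```rust
/// Hypothetical struct for the 12-byte Lua 5.1 global header.
#[derive(Debug, PartialEq)]
pub struct Header51 {
    pub format: u8,
    pub little_endian: bool,
    pub size_int: u8,
    pub size_size_t: u8,
    pub size_instruction: u8,
    pub size_number: u8,
    pub integral: bool,
}

/// Parse and sanity-check the global header at the start of a 5.1 chunk.
pub fn parse_header51(b: &[u8]) -> Result<Header51, String> {
    if b.len() < 12 || !b.starts_with(b"\x1bLua") {
        return Err("not a precompiled Lua chunk".to_string());
    }
    if b[4] != 0x51 {
        return Err(format!("expected version 0x51, found 0x{:02x}", b[4]));
    }
    Ok(Header51 {
        format: b[5], // 0 = official
        little_endian: b[6] == 1,
        size_int: b[7],
        size_size_t: b[8],
        size_instruction: b[9],
        size_number: b[10],
        integral: b[11] == 1, // 0 means lua_Number is floating point
    })
}

/// Read the source name stored right after the header, at offset 0x0C.
/// Assumes little endian and an 8-byte size_t, as reported for t0.out.
pub fn source_name(b: &[u8]) -> Option<String> {
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(b.get(12..20)?);
    let len = u64::from_le_bytes(len_bytes) as usize;
    let end = 20usize.checked_add(len)?;
    // The stored length counts the trailing NUL.
    let (&nul, name) = b.get(20..end)?.split_last()?;
    if nul != 0 {
        return None;
    }
    Some(String::from_utf8_lossy(name).into_owned())
}
```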
Then process the same file with lua5.1 ChunkSpy.lua --auto t0.out -o t0.lst, which produces this annotated listing:
Pos Hex Data Description or Code
0000 ** source chunk: t0.out
** global header start **
0000 1B4C7561 header signature: "\27Lua"
0004 51 version (major:minor hex digits)
0005 00 format (0=official)
0006 01 endianness (1=little endian)
0007 04 size of int (bytes)
0008 08 size of size_t (bytes)
0009 04 size of Instruction (bytes)
000A 08 size of number (bytes)
000B 00 integral (1=integral)
* number type: double
* x86 64 (64-bit, little endian, doubles)
** global header end **
000C ** function [0] definition (level 1)
** start of function **
000C 0800000000000000 string size (8)
0014 4074302E6C756100 "@t0.lua\0"
source name: @t0.lua
001C 00000000 line defined (0)
0020 00000000 last line defined (0)
0024 00 nups (0)
0025 00 numparams (0)
0026 02 is_vararg (2)
0027 04 maxstacksize (4)
* code:
0028 08000000 sizecode (8)
002C 01000000 [1] loadk 0 0 ; 1
0030 41400000 [2] loadk 1 1 ; 2
0034 8C400000 [3] add 2 0 1
0038 87800000 [4] setglobal 2 2 ; c
003C 85C00000 [5] getglobal 2 3 ; print
0040 C5800000 [6] getglobal 3 2 ; c
0044 9C400001 [7] call 2 2 1
0048 1E008000 [8] return 0 1
* constants:
004C 04000000 sizek (4)
0050 03 const type 3
0051 000000000000F03F const [0]: (1)
0059 03 const type 3
005A 0000000000000040 const [1]: (2)
0062 04 const type 4
0063 0200000000000000 string size (2)
006B 6300 "c\0"
const [2]: "c"
006D 04 const type 4
006E 0600000000000000 string size (6)
0076 7072696E7400 "print\0"
const [3]: "print"
* functions:
007C 00000000 sizep (0)
* lines:
0080 08000000 sizelineinfo (8)
[pc] (line)
0084 01000000 [1] (1)
0088 02000000 [2] (2)
008C 03000000 [3] (3)
0090 03000000 [4] (3)
0094 04000000 [5] (4)
0098 04000000 [6] (4)
009C 04000000 [7] (4)
00A0 04000000 [8] (4)
* locals:
00A4 02000000 sizelocvars (2)
00A8 0200000000000000 string size (2)
00B0 6100 "a\0"
local [0]: a
00B2 01000000 startpc (1)
00B6 07000000 endpc (7)
00BA 0200000000000000 string size (2)
00C2 6200 "b\0"
local [1]: b
00C4 02000000 startpc (2)
00C8 07000000 endpc (7)
* upvalues:
00CC 00000000 sizeupvalues (0)
** end of function **
00D0 ** end of chunk **
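To check the listing against the Lua 5.1 instruction layout (from bit 0: a 6-bit opcode, an 8-bit A field, a 9-bit C field and a 9-bit B field, with Bx occupying the combined 18 bits of B and C), here is a sketch that decodes the eight instructions of t0.out and runs them. It is deliberately minimal and rests on several assumptions: all names are hypothetical, only the six opcodes this chunk uses are handled (numbered as in Lua 5.1's lopcodes.h), ADD works on numbers only, CALL requires fixed argument and result counts, and `print` records its output in the VM so the result can be inspected. Native functions receive the VM, loosely mirroring how C functions receive a lua_State.

```rust
use std::collections::HashMap;

// Hypothetical value type; natives receive the VM, loosely like a lua_State.
#[derive(Clone)]
pub enum Value {
    Nil,
    Number(f64),
    Str(String),
    Native(fn(&mut Vm, &[Value]) -> Vec<Value>),
}

pub struct Vm {
    pub globals: HashMap<String, Value>,
    pub output: Vec<String>, // captured `print` output, so the sketch is testable
}

/// Split a Lua 5.1 instruction into (op, A, B, C, Bx).
/// Layout from bit 0: OP (6 bits), A (8), C (9), B (9); Bx is B and C together.
pub fn decode(i: u32) -> (u32, usize, usize, usize, usize) {
    let op = i & 0x3F;
    let a = ((i >> 6) & 0xFF) as usize;
    let c = ((i >> 14) & 0x1FF) as usize;
    let b = ((i >> 23) & 0x1FF) as usize;
    let bx = (i >> 14) as usize;
    (op, a, b, c, bx)
}

fn to_display(v: &Value) -> String {
    match v {
        Value::Nil => "nil".to_string(),
        Value::Number(n) => format!("{}", n), // stand-in for Lua's "%.14g"
        Value::Str(s) => s.clone(),
        Value::Native(_) => "function".to_string(),
    }
}

/// A `print` that records its tab-separated line instead of writing to stdout.
pub fn lua_print(vm: &mut Vm, args: &[Value]) -> Vec<Value> {
    let parts: Vec<String> = args.iter().map(to_display).collect();
    vm.output.push(parts.join("\t"));
    Vec::new()
}

/// Run one function's code. Handles only the opcodes that t0.out uses.
pub fn run(vm: &mut Vm, code: &[u32], k: &[Value], max_stack: usize) -> Result<(), String> {
    let mut r = vec![Value::Nil; max_stack];
    // RK(x): if bit 8 is set, x - 256 indexes the constant table, else it is a register.
    let rk = |regs: &Vec<Value>, x: usize| {
        if (x & 0x100) != 0 { k[x & 0xFF].clone() } else { regs[x].clone() }
    };
    // Global names are string constants indexed by Bx.
    let name = |bx: usize| match &k[bx] {
        Value::Str(s) => Ok(s.clone()),
        _ => Err("global name is not a string".to_string()),
    };
    for &ins in code {
        let (op, a, b, c, bx) = decode(ins);
        match op {
            1 => r[a] = k[bx].clone(), // LOADK
            5 => {
                // GETGLOBAL: missing globals read as nil
                let v = vm.globals.get(&name(bx)?).cloned().unwrap_or(Value::Nil);
                r[a] = v;
            }
            7 => {
                // SETGLOBAL
                let key = name(bx)?;
                vm.globals.insert(key, r[a].clone());
            }
            12 => match (rk(&r, b), rk(&r, c)) {
                // ADD, numbers only (no string coercion or metamethods)
                (Value::Number(x), Value::Number(y)) => r[a] = Value::Number(x + y),
                _ => return Err("ADD on non-numbers".to_string()),
            },
            28 => {
                // CALL: B-1 arguments, C-1 results; B == 0 or C == 0 (variable counts) unsupported
                if b == 0 || c == 0 {
                    return Err("variable-count CALL not supported".to_string());
                }
                let f = match &r[a] {
                    Value::Native(f) => *f,
                    _ => return Err("attempt to call a non-function".to_string()),
                };
                let args: Vec<Value> = r[a + 1..a + b].to_vec();
                let results = f(vm, &args);
                for i in 0..c - 1 {
                    r[a + i] = results.get(i).cloned().unwrap_or(Value::Nil);
                }
            }
            30 => return Ok(()), // RETURN: results ignored in this sketch
            other => return Err(format!("opcode {} not implemented", other)),
        }
    }
    Ok(())
}
```

Feeding `run` the eight instruction words from the listing (0x00000001 through 0x0080001E), the four constants (1, 2, "c", "print") and maxstacksize 4 leaves the global `c` set to 3 and records one line of output, "3". Growing this into a real VM is then a matter of filling in the remaining opcodes and the library functions.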