@seanjensengrey
Created November 16, 2016 17:22

Notes on writing a Lua Bytecode VM. Lua is a compact, minimal language designed for embedding within a larger program to provide end-user customization of program behavior. This note outlines how I would break down implementing the Lua Bytecode VM in Rust. The techniques are broadly applicable to any implementation language.

I would proceed by supporting a subset of Lua that uses only numbers, then move on to tables with numbers. Lua 5.3 adds integers as a separate number subtype, so a 5.3 implementation has to handle those as well.
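As a rough illustration of the numbers-only starting point, here is a minimal sketch of a register machine that handles just LOADK, ADD and RETURN over f64 values. The `Op` enum, `rk` helper and `run` function are hypothetical names made up for this note, not part of any existing crate. The sketch assumes instructions are already decoded, runs straight-line code only (no jumps), and panics on out-of-range indexes rather than reporting errors.

```rust
/// Already-decoded instructions for a numbers-only subset (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum Op {
    /// R(A) := Kst(Bx)
    LoadK { a: usize, bx: usize },
    /// R(A) := RK(B) + RK(C)
    Add { a: usize, b: usize, c: usize },
    /// Stop executing this function.
    Return,
}

/// In Lua 5.1 an RK operand >= 256 names constant (x - 256); otherwise a register.
fn rk(regs: &[f64], consts: &[f64], x: usize) -> f64 {
    if x >= 256 {
        consts[x - 256]
    } else {
        regs[x]
    }
}

/// Run straight-line code and hand back the final register file.
pub fn run(code: &[Op], consts: &[f64], max_stack: usize) -> Vec<f64> {
    let mut regs = vec![0.0; max_stack];
    for op in code {
        match *op {
            Op::LoadK { a, bx } => regs[a] = consts[bx],
            Op::Add { a, b, c } => {
                let sum = rk(&regs, consts, b) + rk(&regs, consts, c);
                regs[a] = sum;
            }
            Op::Return => break,
        }
    }
    regs
}
```

Running `LoadK{a:0,bx:0}`, `LoadK{a:1,bx:1}`, `Add{a:2,b:0,c:1}`, `Return` with constants `[1.0, 2.0]` leaves 3.0 in register 2. Tables, globals and calls would each add a variant to `Op` and, eventually, a richer value type than f64.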

Form small test cases: compile chunks of Lua code and disassemble them with ChunkSpy.

It would be nice to have parsing and loading of Lua chunks live in a distinct lib. The nom library in Rust was designed for parsing binary formats and might be a good start; otherwise the parser can be written by hand.
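To give a feel for the by-hand route, here is a minimal sketch that parses only the 12-byte Lua 5.1 global header and returns the unparsed remainder. `Header`, `HeaderError` and `parse_header` are hypothetical names chosen for this note. The sketch assumes a 5.1 chunk and rejects anything else, and it does not yet use the size fields to drive the rest of the parse.

```rust
/// Fields of the 12-byte Lua 5.1 global header, in file order (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u8,          // 0x51 for Lua 5.1
    pub format: u8,           // 0 = official
    pub little_endian: bool,  // endianness byte == 1
    pub size_int: u8,
    pub size_size_t: u8,
    pub size_instruction: u8,
    pub size_number: u8,
    pub integral: bool,       // true when lua_Number is an integral type
}

#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    BadSignature,
    UnsupportedVersion(u8),
}

/// Parse the header and return it with the bytes that follow it.
pub fn parse_header(bytes: &[u8]) -> Result<(Header, &[u8]), HeaderError> {
    if bytes.len() < 12 {
        return Err(HeaderError::TooShort);
    }
    let (head, rest) = bytes.split_at(12);
    // Signature is ESC followed by "Lua".
    if !head.starts_with(b"\x1bLua") {
        return Err(HeaderError::BadSignature);
    }
    if head[4] != 0x51 {
        return Err(HeaderError::UnsupportedVersion(head[4]));
    }
    let header = Header {
        version: head[4],
        format: head[5],
        little_endian: head[6] == 1,
        size_int: head[7],
        size_size_t: head[8],
        size_instruction: head[9],
        size_number: head[10],
        integral: head[11] == 1,
    };
    Ok((header, rest))
}
```

Keeping this in its own crate means the VM only ever sees already-validated structures, and a later multiversion loader could branch on the version byte before picking a function-block parser.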

I had originally thought about this problem in the context of a multiversion VM where 5.1, 5.2 and 5.3 bytecode could all interop. The bytecode format is not compatible between point releases of Lua.

Most of the work will be in implementing the libs, not the VM itself. Various tooling will have to be collected and upgraded as needed.

Lua Bytecode VM Notes

Opcode Definition in Src

Tooling

Rust Libs

Other Lua Bytecode VMs

Sample ChunkSpy Listing

In a file t0.lua

local a = 1
local b = 2
c = a + b
print(c)

compile it to t0.out

luac-5.1 -o t0.out t0.lua

results in hexdump -C t0.out

00000000  1b 4c 75 61 51 00 01 04  08 04 08 00 08 00 00 00  |.LuaQ...........|
00000010  00 00 00 00 40 74 30 2e  6c 75 61 00 00 00 00 00  |....@t0.lua.....|
00000020  00 00 00 00 00 00 02 04  08 00 00 00 01 00 00 00  |................|
00000030  41 40 00 00 8c 40 00 00  87 80 00 00 85 c0 00 00  |A@...@..........|
00000040  c5 80 00 00 9c 40 00 01  1e 00 80 00 04 00 00 00  |.....@..........|
00000050  03 00 00 00 00 00 00 f0  3f 03 00 00 00 00 00 00  |........?.......|
00000060  00 40 04 02 00 00 00 00  00 00 00 63 00 04 06 00  |.@.........c....|
00000070  00 00 00 00 00 00 70 72  69 6e 74 00 00 00 00 00  |......print.....|
00000080  08 00 00 00 01 00 00 00  02 00 00 00 03 00 00 00  |................|
00000090  03 00 00 00 04 00 00 00  04 00 00 00 04 00 00 00  |................|
000000a0  04 00 00 00 02 00 00 00  02 00 00 00 00 00 00 00  |................|
000000b0  61 00 01 00 00 00 07 00  00 00 02 00 00 00 00 00  |a...............|
000000c0  00 00 62 00 02 00 00 00  07 00 00 00 00 00 00 00  |..b.............|
000000d0

And then processed with lua5.1 ChunkSpy.lua --auto t0.out -o t0.lst

Pos   Hex Data           Description or Code
------------------------------------------------------------------------
0000                     ** source chunk: t0.out
                         ** global header start **
0000  1B4C7561           header signature: "\27Lua"
0004  51                 version (major:minor hex digits)
0005  00                 format (0=official)
0006  01                 endianness (1=little endian)
0007  04                 size of int (bytes)
0008  08                 size of size_t (bytes)
0009  04                 size of Instruction (bytes)
000A  08                 size of number (bytes)
000B  00                 integral (1=integral)
                         * number type: double
                         * x86 64 (64-bit, little endian, doubles)
                         ** global header end **
                         
000C                     ** function [0] definition (level 1)
                         ** start of function **
000C  0800000000000000   string size (8)
0014  4074302E6C756100   "@t0.lua\0"
                         source name: @t0.lua
001C  00000000           line defined (0)
0020  00000000           last line defined (0)
0024  00                 nups (0)
0025  00                 numparams (0)
0026  02                 is_vararg (2)
0027  04                 maxstacksize (4)
                         * code:
0028  08000000           sizecode (8)
002C  01000000           [1] loadk      0   0        ; 1
0030  41400000           [2] loadk      1   1        ; 2
0034  8C400000           [3] add        2   0   1  
0038  87800000           [4] setglobal  2   2        ; c
003C  85C00000           [5] getglobal  2   3        ; print
0040  C5800000           [6] getglobal  3   2        ; c
0044  9C400001           [7] call       2   2   1  
0048  1E008000           [8] return     0   1      
                         * constants:
004C  04000000           sizek (4)
0050  03                 const type 3
0051  000000000000F03F   const [0]: (1)
0059  03                 const type 3
005A  0000000000000040   const [1]: (2)
0062  04                 const type 4
0063  0200000000000000   string size (2)
006B  6300               "c\0"
                         const [2]: "c"
006D  04                 const type 4
006E  0600000000000000   string size (6)
0076  7072696E7400       "print\0"
                         const [3]: "print"
                         * functions:
007C  00000000           sizep (0)
                         * lines:
0080  08000000           sizelineinfo (8)
                         [pc] (line)
0084  01000000           [1] (1)
0088  02000000           [2] (2)
008C  03000000           [3] (3)
0090  03000000           [4] (3)
0094  04000000           [5] (4)
0098  04000000           [6] (4)
009C  04000000           [7] (4)
00A0  04000000           [8] (4)
                         * locals:
00A4  02000000           sizelocvars (2)
00A8  0200000000000000   string size (2)
00B0  6100               "a\0"
                         local [0]: a
00B2  01000000             startpc (1)
00B6  07000000             endpc   (7)
00BA  0200000000000000   string size (2)
00C2  6200               "b\0"
                         local [1]: b
00C4  02000000             startpc (2)
00C8  07000000             endpc   (7)
                         * upvalues:
00CC  00000000           sizeupvalues (0)
                         ** end of function **

00D0                     ** end of chunk **
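
The instruction words in the listing can be pulled apart by hand. In Lua 5.1 an instruction is 32 bits: the opcode sits in the low 6 bits, followed by A (8 bits), C (9 bits) and B (9 bits), with Bx covering the same 18 bits as C and B together. Below is a minimal sketch of that decoding; `Fields`, `decode` and `decode_le` are hypothetical names for this note, and the little-endian read assumes a chunk like the one above.

```rust
/// Raw fields of a Lua 5.1 instruction word (hypothetical type).
/// Which of B/C or Bx is meaningful depends on the opcode.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Fields {
    pub opcode: u8, // bits 0..=5
    pub a: u8,      // bits 6..=13
    pub c: u16,     // bits 14..=22
    pub b: u16,     // bits 23..=31
    pub bx: u32,    // bits 14..=31 (B and C read as one field)
}

/// Split a 32-bit instruction word into its fields.
pub fn decode(word: u32) -> Fields {
    Fields {
        opcode: (word & 0x3f) as u8,
        a: ((word >> 6) & 0xff) as u8,
        c: ((word >> 14) & 0x1ff) as u16,
        b: ((word >> 23) & 0x1ff) as u16,
        bx: word >> 14,
    }
}

/// Decode four bytes as they appear in a little-endian chunk.
pub fn decode_le(bytes: [u8; 4]) -> Fields {
    decode(u32::from_le_bytes(bytes))
}
```

Fed the bytes 8C 40 00 00 from position 0034, this yields opcode 12 with A=2, B=0, C=1, which matches the `add 2 0 1` line. The bytes 85 C0 00 00 give opcode 5 with A=2 and Bx=3, matching `getglobal 2 3`.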