@muhmuhten
Last active December 9, 2018 05:47
bcstrict

Runtime static global checker for Lua 5.3.

This sounds a bit like an oxymoron, but to be specific:

  • bcstrict checks for accesses to unexpected globals within a chunk without executing it, by inspecting its bytecode.

  • bcstrict is intended to be executed by running Lua code on itself, at startup time, without explicit user(/author) intervention.

If called early, this looks kind of like perl's use strict 'vars'. More so than strict.lua, in any case.
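
As a rough illustration of "without executing it": in stock Lua 5.3, load compiles a chunk and string.dump serializes it, and neither step runs the code. The chunk text and the global name side_effect below are made up for the example.

```lua
-- Compile a chunk but never call it.
local chunk = assert(load("side_effect = true"))

-- string.dump returns the precompiled (binary) form as a string.
local dump = string.dump(chunk)

-- The chunk has not run, so the global it would set is still absent.
local untouched = rawget(_G, "side_effect") == nil

-- Binary chunks start with the signature "\x1bLua".
local is_binary = dump:sub(1, 4) == "\27Lua"

print(untouched, is_binary)
```

Everything bcstrict reports comes from reading a string like dump, which is why it can complain about code paths that never run.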

Usage

-- check this file
require "bcstrict"()

-- allow access via _G and nothing else
require "bcstrict"{_G=1}

-- no direct global access at all
require "bcstrict"{}
local _G = _ENV
--[[ .. do things ... ]]

-- opportunistic checking
do
        local ok, strict = pcall(require, "bcstrict")
        if ok then strict() end
end

-- check some other chunk
local bcstrict = require "bcstrict"
local chunk = assert(loadfile "other.lua")
bcstrict(_ENV, chunk)

-- prevent usage anywhere else
package.loaded.bcstrict = function () end

Compatibility

The techniques used by bcstrict are generally applicable and should not be hard to port to 5.4. Earlier versions would require a replacement for string.unpack and the bytecode-relevant platform details (endianness, integer sizes, &c.) it encodes. I've written something similar for LuaJIT before, though.

As far as I know, the representation of precompiled chunks is guaranteed not to change within a Lua version (x.y, e.g. 5.3) and always breaks between versions. As such, the inclusion of magic opcode and format string constants shouldn't lead to incompatibilities in any foreseeable future releases of 5.3.
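
A small sketch of what "breaks between versions" looks like from the outside, assuming a stock PUC-Rio Lua (LuaJIT uses a different dump signature). The fifth byte of a dump is the version byte, 0x53 for 5.3, and the code below compares it against _VERSION. The variable names are only for illustration.

```lua
-- Dump any function; only the header matters here.
local dump = string.dump(function () end)

local signature = dump:sub(1, 4)   -- "\x1bLua"
local version = dump:byte(5)       -- 0x53 on Lua 5.3

-- Derive the expected version byte from _VERSION ("Lua 5.3" -> 0x53).
local major, minor = _VERSION:match("(%d+)%.(%d+)")
local expected = tonumber(major) * 16 + tonumber(minor)

print(signature == "\27Lua", version == expected)
```

A checker that hardcodes opcode numbers can refuse to run when the version byte is not the one it was written for, which is what the module's header assertions do.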

In any case, almost all Lua 5.y code compiles under 5.3 (though semantics might differ), and 5.4 is not expected to add syntax-level incompatibilities, so bcstrict will still be fully usable as a static analyzer ... which almost totally misses the point, though.

Limitations

You must call the function returned by require "bcstrict"! Since require avoids loading a module more than once, but there may be multiple files which need to be checked, each user of bcstrict has to actually run it.
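
To see why the call matters, here is a minimal sketch using a stand-in module. The name hypothetical_checker and the two counters are invented for the example; package.preload and require behave as in stock Lua.

```lua
local loads, checks = 0, 0

-- Stand-in for a module that returns a checker function.
package.preload["hypothetical_checker"] = function ()
        loads = loads + 1          -- runs once, on first require
        return function ()
                checks = checks + 1  -- runs every time the result is called
        end
end

-- Two "files" each require and call the checker.
require "hypothetical_checker"()
require "hypothetical_checker"()

print(loads, checks)
```

The loader body runs once because require caches its result in package.loaded, so only the explicit call is guaranteed to happen per file.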

Due to the design constraint of being implemented by parsing dumped bytecode, bcstrict has a slightly interesting concept of a global access: a get or set to a field of an upvalue which is, or can be traced up to, the first (and only!) upvalue of a chunk is forbidden if the key used does not exist in the environment provided (or _ENV) when bcstrict is called.
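
At the bytecode level, "a get or set to a field of an upvalue" means the 5.3 instructions GETTABUP (opcode 6) and SETTABUP (opcode 8). A 5.3 instruction packs opcode:6, A:8, C:9, B:9 bits, lowest bits first. The sketch below round-trips one instruction through that layout; encode and decode are hypothetical helpers written for this example, and it needs Lua 5.3 or later for the integer bitwise operators.

```lua
local OP_GETTABUP = 6

-- Hypothetical helper: pack fields into one instruction (test data only).
local function encode (op, a, b, c)
        return op | a << 6 | c << 14 | b << 23
end

-- Hypothetical helper: split an instruction into op, A, B, C.
local function decode (ins)
        return ins & 63, ins >> 6 & 255, ins >> 23 & 511, ins >> 14 & 511
end

-- R(0) := UpValue[0][K(2)]. In an RK operand, bit 256 marks a constant.
local ins = encode(OP_GETTABUP, 0, 0, 256 + 2)
local op, a, b, c = decode(ins)

local is_constant = (c & 256) ~= 0
-- Constant 2 (0-based) sits at index 3 in a 1-based Lua table.
local constant_index = c - 255

print(op, a, b, c, is_constant, constant_index)
```

For GETTABUP the upvalue number is in B and the key in C. For SETTABUP the upvalue is in A and the key in B.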

However, it doesn't track any other variables. In particular, it won't catch "globals" that access a declared local _ENV, and it will complain when you use fields of _ENV explicitly, e.g.:

-- OK
require "bcstrict"()
local _ENV = _ENV
print(not_defined)

-- not OK
require "bcstrict"()
print(_ENV.not_defined)
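
The difference between the two cases comes from how free names resolve. A rough demonstration, assuming Lua 5.2 or later with debug info available; the names not_defined and answer are arbitrary:

```lua
-- A chunk's main function has exactly one upvalue, named _ENV.
local chunk = assert(load("return not_defined"))
local first_name = debug.getupvalue(chunk, 1)
local second_name = debug.getupvalue(chunk, 2)

-- With a local _ENV, free names resolve through that local instead,
-- so nothing here touches the chunk's upvalue.
local shadowed = assert(load([[
        local _ENV = { answer = 42 }
        return answer
]]))
local result = shadowed()

print(first_name, second_name, result)
```

Since bcstrict only follows the chunk's upvalue, lookups through a local _ENV are invisible to it, while an explicit _ENV.name is just another field access on that upvalue.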

In addition, bcstrict does nothing useful when called on functions which are not chunks. This is because it is fundamentally impossible to figure out which, if any, of a function's upvalues contains _ENV. For example, all of the inner functions returned by the following snippets have identical code, but close over different variables.

local a
function f(b)
        return function ()
                a.c = d + b.e
        end
end

local a
function g(b)
        return function ()
                c = a.d + b.e
        end
end

local a, b
function h(_ENV)
        return function ()
                a.c = b.d + e
        end
end

Debug information could be used to identify _ENV, if available; however, as the last example shows, it will also flag a locally redefined _ENV. Whether or not this is desirable is arguable: redefinitions of _ENV usually come with specific intentionality which makes general global checking pointless anyway.
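
A sketch of the debug-information approach, using debug.getupvalue on live functions rather than parsing a dump (bcstrict itself does not do this). find_env_upvalue, some_global and the sample functions are hypothetical, and upvalue names are only available when the code has not been stripped.

```lua
-- Return the index and value of the upvalue named _ENV, if any.
local function find_env_upvalue (f)
        local i = 1
        while true do
                local name, value = debug.getupvalue(f, i)
                if name == nil then return nil end
                if name == "_ENV" then return i, value end
                i = i + 1
        end
end

local a = {}
local function uses_global () return a, some_global end
local function no_global () return a end
local function redefine (_ENV)
        return function () return some_global end
end

local custom = {}
local _, real_env = find_env_upvalue(uses_global)
local missing = find_env_upvalue(no_global)
local _, fake_env = find_env_upvalue(redefine(custom))

print(real_env == _ENV, missing, fake_env == custom)
```

The last case is the one discussed above: matching by name cannot tell a parameter called _ENV from the real environment.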

why this

Lua's default behavior of silently accepting access to undefined (misspelled, out-of-scope, &c.) variables is hilariously error-prone and literally my #1 source of bugs while writing this damn module. There are three or so well-known ways of combatting this issue:

  • Careful testing. Look, if it works for you...

  • Set a metatable on the global environment table. Often good enough. Has overhead and side-effects which may make it unsuitable for libraries. Won't catch errors on code paths you didn't test with it on.

  • Some sort of static analyzer. Probably luacheck. This works pretty well ... if you run it.
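
For comparison, a minimal sketch of the metatable approach from the second bullet, applied to a private environment table rather than the real global table. The chunk sources and the misspelled name pritn are made up, and the error messages are this example's own wording.

```lua
-- An environment that raises on any undefined name.
local env = setmetatable({ print = print }, {
        __index = function (_, key)
                error("read of undefined global " .. tostring(key), 2)
        end,
        __newindex = function (_, key)
                error("write to undefined global " .. tostring(key), 2)
        end,
})

local good = assert(load("return print ~= nil", "=good", "t", env))
-- Loading succeeds even though the name is misspelled...
local bad = assert(load("return pritn", "=bad", "t", env))

local good_result = good()
-- ...and the mistake only surfaces when the code actually runs.
local ok, message = pcall(bad)

print(good_result, ok, message)
```

This is the "won't catch errors on code paths you didn't test" problem: the misspelling is found at run time, not at load time.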

I like static analysis. Like diet and exercise, I don't do nearly enough of it due to a confluence of minor nuisances.

This is an attempt to capture most of its benefits with a lot less overhead.

local unpack = string.unpack
-- ldump.c:73:DumpString. Three formats.
-- * (char)0. No string. Occurs when debug info is missing, e.g. stripped dump
-- or nameless locals.
-- * (char)255, (size_t)size, char[size-1]. Used for strings of at least 254
-- characters, where the size (including trailing 0) won't fit in a byte.
-- * (char)size, char[size-1]. Strings of at most 253 characters.
-- The encoded size includes space for a trailing 0 which isn't actually in the
-- dump, so none of these unpack cleanly with the 's' format either...
local function parse_string (s, y)
        local len, x = unpack("B", s, y)
        if len == 0 then
                return nil, x
        elseif len == 255 then
                len, x = unpack("T", s, x)
        end
        return unpack("c"..(len-1), s, x)
end
-- ldump.c:90:DumpCode. (int)sizecode, Instruction[sizecode].
-- Instruction is a typedef for an unsigned integer (int or long) with at least
-- 32 bits; this is almost certainly 4 bytes, but theoretically doesn't have to
-- be, so we pass the format in as an argument.
-- lopcodes.h:13. On the 5.3 VM, instructions are 32-bit integers packing
-- opcode:6, A:8, C:9, B:9 bits. (Yes, C is between A and B...)
-- lopcodes.h:178:OP_GETTABUP,/* A B C R(A) := UpValue[B][RK(C)]
-- lopcodes.h:181:OP_SETTABUP,/* A B C UpValue[A][RK(B)] := RK(C)
-- A global access compiles down to a table access to the upvalue holding the
-- closed-over value of _ENV. Unfortunately, at this point, we don't actually
-- know which upvalue (if any!) is _ENV, so we have to mark down every upvalue
-- table access as suspicious.
-- Returns a sequence of {upvalue, instruction index, is write, table index}
-- tuples; of these, only the upvalue is strictly necessary:
-- * instruction index is used to look line numbers up from debug info
-- * table index can be looked up in the constants table for the name accessed
local function parse_code (s, x, ins_fmt)
        local OP_GETTABUP, OP_SETTABUP = 6, 8
        local d, v = {}
        v, x = unpack("i", s, x)
        for j=1,v do
                v, x = unpack(ins_fmt, s, x)
                local o, b = v & 63, v>>23 & 511
                if o == OP_GETTABUP then
                        d[#d+1] = {b, j, false, v>>14 & 511}
                elseif o == OP_SETTABUP then
                        d[#d+1] = {v>>6 & 255, j, true, b}
                end
        end
        return d, x
end
-- ldump.c:98:DumpConstants. (int)sizek, Various[sizek].
-- This is a nasty format whose size can't be computed without parsing.
-- "Various" comprises five formats of note:
-- * (char)LUA_TNIL==0.
-- * (char)LUA_TBOOLEAN==1, char.
-- * (char)LUA_TNUMFLT==3, lua_Number.
-- * (char)LUA_TNUMINT==19, lua_Integer.
-- * (char)LUA_TSHRSTR==4 or LUA_TLNGSTR==20, DumpString.
-- Only string constants *really* matter, since those are generated by "real"
-- global accesses; the others only occur on false-positives generated by
-- directly indexing _ENV. Of course, those will generate misleading reports.
local function parse_constants (s, x)
        local k, v = {}
        v, x = unpack("i", s, x)
        for j=1,v do
                v, x = unpack("B", s, x)
                if v == 0 then -- nil
                        v = "nil"
                elseif v == 1 then -- boolean
                        v, x = unpack("B", s, x)
                        v = tostring(v ~= 0)
                elseif v == 3 then -- number (numflt)
                        v, x = unpack("n", s, x)
                elseif v == 19 then -- number (numint)
                        v, x = unpack("j", s, x)
                elseif v == 4 or v == 20 then -- string (shrstr/lngstr)
                        v, x = parse_string(s, x)
                else
                        assert(false, "bad ttype "..v.." at byte "..x)
                end
                k[j] = v
        end
        return k, x
end
-- ldump.c:137:DumpUpvalues.
-- (int)sizeupvalues, {(char)instack, (char)idx}[sizeupvalues].
-- Every upvalue corresponds to either (instack==1) a local variable or
-- (instack==0) an upvalue of the enclosing function. For the main function of
-- a chunk, _ENV is set to (1,0): the first local of the fictional enclosing
-- scope. _ENV is *never* on the stack in any other case, since all
-- function definitions lie under the chunk main.
-- Then, given the (instack, idx) tuple which identifies _ENV in the upvalues
-- of this function, we can find its upvalue index, s.t. (0, upvalue index)
-- identifies _ENV in the upvalues of this function's immediate children.
-- Not every function must have _ENV as an upvalue, but it must be present to
-- be passed down to descendants.
local function parse_upvalues (s, x, env_index)
        local v, z
        v, x = unpack("i", s, x)
        z = x + 2*v
        if not env_index then
                return nil, z
        end
        for j=1,v do
                -- Read (instack, idx) as (instack<<8)+idx.
                -- We're looking for either (1,0)=256 or some (0,idx)=idx.
                v, x = unpack(">i2", s, x)
                if v == env_index then
                        return j-1, z
                end
        end
        return nil, z
end
-- ldump.c:147:DumpDebug.
-- (int)sizelineinfo, (int[sizelineinfo])lineinfo,
-- (int)sizelocvars, locvars, (int)sizeupvalues, upvalues.
-- This section is totally zeroed out for stripped dumps.
-- Line numbers are useful to report if available.
local function parse_debug (s, x)
        local lineinfo, v = {}
        v, x = unpack("i", s, x)
        for j=1,v do
                lineinfo[j], x = unpack("i", s, x)
        end
        v, x = unpack("i", s, x)
        for _=1,v do
                _, x = parse_string(s, x)
                _, _, x = unpack("ii", s, x)
        end
        v, x = unpack("i", s, x)
        for _=1,v do
                _, x = parse_string(s, x)
        end
        return lineinfo, x
end
-- ldump.c:166:DumpFunction.
-- [DumpString]source, (int)linedefined, (int)lastlinedefined,
-- (char)numparams, (char)is_vararg, (char)maxstacksize,
-- DumpCode, DumpConstants, DumpUpvalues, DumpProtos, DumpDebug.
local function parse_function (cb, s, x, ins_fmt, env_index, parent)
        local source, linedefined, lastlinedefined
        source, x = parse_string(s, x)
        source = source or parent
        linedefined, lastlinedefined, x = unpack("iixxx", s, x)
        if linedefined == 0 then
                -- (char)instack==1, (char)idx==0. See parse_upvalues.
                env_index = 256
        end
        local candidates, constants
        candidates, x = parse_code(s, x, ins_fmt)
        constants, x = parse_constants(s, x)
        env_index, x = parse_upvalues(s, x, env_index)
        local nprotos
        nprotos, x = unpack("i", s, x)
        for _=1,nprotos do
                x = parse_function(cb, s, x, ins_fmt, env_index, source)
        end
        local lineinfo
        lineinfo, x = parse_debug(s, x)
        if env_index then
                for j=1,#candidates do
                        local a = candidates[j]
                        if a[1] == env_index then
                                local line = lineinfo[a[2]]
                                if line then
                                        -- leave it
                                elseif linedefined == 0 then
                                        line = "main"
                                else
                                        line = linedefined .. "-" .. lastlinedefined
                                end
                                local name = constants[a[4]-255] or "(not constant)"
                                cb(name, a[3], source or "=stripped", line)
                        end
                end
        end
        return x
end
-- ldump.c:184:DumpHeader.
-- "\x1bLua"[:4], (char)LUAC_VERSION==0x53, (char)LUAC_FORMAT==0,
-- LUAC_DATA=="\x19\x93\r\n\x1a\n"[:6],
-- (char)sizeof(int), (char)sizeof(size_t), (char)sizeof(Instruction),
-- (char)sizeof(lua_Integer), (char)sizeof(lua_Number),
-- (lua_Integer)LUAC_INT==0x5678, (lua_Number)LUAC_NUM==370.5.
-- Additionally, skip an extra byte: ldump.c:211. (char)sizeupvalues.
local function parse_header (s)
        local sig, ver, fmt, lit, isz, int, num, x = unpack("c4BBc6xxBxxjnx", s)
        assert(sig == "\x1bLua", "not a dump")
        assert(ver == 0x53 and fmt == 0, "not a standard 5.3 dump")
        assert(lit == "\x19\x93\r\n\x1a\n", "mangled dump (conversions?)")
        assert(int == 0x5678, "mangled dump (wrong-endian?)")
        assert(num == 370.5, "mangled dump (floats broken?)")
        return "I"..isz, x
end
local function check_dump (s, cb)
        local ins_fmt, x = parse_header(s)
        return parse_function(cb, s, x, ins_fmt)
end
local function strict_mode (env, fun)
        if not fun then
                fun = string.dump(debug.getinfo(2, "f").func)
        elseif type(fun) == "function" then
                fun = string.dump(fun)
        end
        env = env or _ENV
        local accum = {}
        check_dump(fun, function (key, is_write, source, line)
                if not env[key] then
                        source = source:sub(2)
                        local action = is_write and "write: " or "read: "
                        accum[#accum+1] = source..":"..line..": global "..action..key
                end
        end)
        if #accum > 0 then
                accum[0] = "unexpected globals"
                error(table.concat(accum, "\n\t", 0), 2)
        end
end
strict_mode()
return strict_mode
#!/usr/bin/env lua53
local bcstrict = require "bcstrict"
bcstrict()
for j=1,#arg do
        bcstrict(nil, assert(loadfile(arg[j])))
end