Background and Objectives
The NodeMCU firmware is currently based on the Lua 5.1.4 core with the
eLua patch and other NodeMCU specific enhancements and optimisations ("Lua51"). This paper discusses the rebaselining of NodeMCU to the latest production Lua version 5.3.5 ("Lua53"). Our goals in this upgrade were:
NodeMCU should offer a current Lua version, 5.3.5 that is as functionally complete as practical.
Lua53 will adopt a minimum change strategy against the standard Lua source code base, that is changes to the VM and runtime system will only be made where there is a compelling reasons for any change, for example Lua53 preserves some valuable NodeMCU enhancements, for example the addition of Lua VM support for constant Lua program and data being executable directly from Flash ROM in order to free up RAM for application use to mitigate the RAM limitations of ESP-class IoT devices
NodeMCU will provide a clear and stable migration path for both existing hardware libraries and ESP Lua applications being migrated from Lua51 to Lua53.
The Lua53 implementation will provide a common code base for ESP8266 and ESP32 architectures. (The current Lua51 implementation was historically been forked with variant code bases for the two architectures.)
- Background and Objectives
- Specific Design Decisions
- Detailed Implementation Notes
- Build variants and includes
- TString types and implementation
- Proto Structures
- Locale support
- Memory Optimisations
- Unaligned exception avoidance
- Modulus and division operation avoidance
- Key cache
- Flash image generation and loading
- Garbage collection
- Panic Handling
- API Compatibility for NodeMCU modules
- Lua Language and Libary Compatibility for NodeMCU Lua modules
- Other Implementation Notes
- Detailed changes from standard Lua 5.3 core for NodeMCU Lua53
- Standard C API and Interface Changes
Specific Design Decisions
The NodeMCU C module API is built on the standard Lua C API that is common across the Lua51 and Lua53 build environments but again with limited changes needs to reflect our IoT changes. Note that Lua 5.3 introduced some core functional and C API changes; however, the use the standard Lua 5.3 compatibility modes largely hides these changes, though modules may optionally make use of the
LUA_VERSION_NUMdefine should version-specific code variants be needed.
The historic NodeMCU C module API (following
eLuaprecedent and) added some extensions that somewhat compromised the orthogonal design principles of the standard Lua API; these are that modules should only access the Lua runtime via the
lua_XXXmacros and and calls exported through the
lua.hheader (or the wrapped helper
luaL_XXXXversions exported through the
lauxlib.hheader). Such inconsistencies will be removed from the existing NodeMCU API and modules, so that all modules can be compiled and executed within either the Lua51 or the Lua53 environment.
Lua publishes Reference Manuals (LRM) for the Language specification, core libraries and C APIs. The Lua53 implementation will include a supplemental Reference Manual to document the NodeMCU extensions to the core libraries and C APIs. As these API are unified across Lua51 and 53, this also provides a common reference that can also be used for developing both Lua51 and Lua53 modules.
The two Lua code bases will be maintained within a common Git branch (in parallel NodeMCU sub-directories
app/lua53), with an optional make parameter
LUA=53selecting a build with Lua based on
app/lua53, thus generating a Lua 5.3 firmware image. (At a later stage once Lua53 is proven and stable, we will swap the default to Lua53 and move the Lua51 tree into frozen support.)
Many of the important features of the
eLuachanges to Lua51 used by NodeMCU have now been incorporated into core Lua53 and can continue to be used 'out of the box'. Other NodeMCU LFS, ROTable and LCD functionality has been rewritten for NodeMCU, so the Lua53 code base will no longer uses the
Lua53 will ultimately support three build targets that correspond to the ESP8266, ESP32 and native host targets using a common Lua53 source directory. The ESP build targets generate a firmware image for the corresponding ESP chip class, and the host target generates a host based
luac.crossexecutable. This last can either be built standalone or as a sub-make of the ESP builds.
The Lua53 host build
luac.crossexecutable will continue to extend the standard functionality by adding support for LFS image compilation and include a Lua runtime execution environment that can be invoked with the
-eoption. An optional make target also adds the Lua Test Suite to this environment to enable use of this test suite.
Lua 5.3 introduces the concept of subtypes which are used for Numbers and Functions, and Lua53 follows this model adding a ROTables subtype for Tables.
Lua numbers have separate Integer and Floating point subtypes. There is therefore no advantage in having separate Integer and Floating point build variants. Lua53 therefore ignores the
LUA_NUMBER_INTEGRALbuild option. However it will provide option to use 32 or 64 bit numeric values with floating point numbers being stored as single or double precision respectively as well as the current hybrid where integers are 32-bit and
doublefor floating point. Hence 32-bit integer only applications will have similar memory use and runtime performance as existing Lua51 Integer builds.
Lua tables have separate RWTable and ROTable subtypes. The Lua53 implementation of this subtyping will be backported to Lua51 where these are separate types, so that the table access API is the same for both Lua51 and Lua53.
Lua Functions have separate Lua, C and lightweight C subtypes, with this last being a special case where the C function has no upvals and so it doesn't need an associated
Closurestructure. It is also essentially the same TValue type as our lightweight functions, but there is no need for and explicit API support for it.
Lua 5.3 also introduces another small number of other but significant Lua language and core library / API changes. Our Lua53 implementation does not limit these, though we have enabled the appropriate compatibility modes to limit the impact at a Lua developer level. These are discussed in further detail in the following compatibility sections.
Many standard OS services aren't available on a embedded IoT device, so Lua53 will follow the Lua51 precedent by omitting the
oslibrary for target builds as the relevant functionally is largely replaced by the
Flash based code execution can incur runtime performance impacts if not mitigated. Some small code changes are required as the current GCC toolchain for the Xtensa processors doesn't handle all of these mitigations during code generation. For example, as the hardware only supports aligned word access to flash memory ,'hot' byte constant accesses have been encapsulated to remove non-aligned exceptions at a cost of one extra Xtensa instruction per access; the remaining byte accesses to flash use a software exception handler.
NodeMCU employs a single threaded event loop model (somewhat akin to Node.js), and this is supported by some task extensions to the C API that facilitate use of a callback mechanism.
Detailed Implementation Notes
We follow two broad principles (1) everything other than the Lua source directory is common, and (2) only change the Lua core when there is a compelling reason. The following sub-sections describe these issues / changes in detail.
Build variants and includes
Lua53 supports three build targets which correspond to:
The ESP8266 (and derivative ESP8385) architectures use the Espressif non-OS SDK and its GCC Xtensa tool-chain. The macros
LUA_USE_ESPare defined for this target.
The ESP32 architecture using the Espressif IDF and its GCC Xtensa toolchain. The macros
LUA_USE_ESPare defined for this target.
A host architecture using the standard host C toolchain. The macro
LUA_USE_HOSTis defined for this target. We currently support any POSIX environment that supports the GCC toolchain and Windows builds using native MSVC, WSL, Cygwin and MinGW.
LUA_USE_ESP are in effect mutually exclusive:
LUA_USE_ESP is defined for a target firmware build, and
LUA_USE_HOST for a build of the host-based
luac.cross executable for the host environment. Note that
LUA_USE_HOST also defines
LUA_CROSS_COMPILER as used in Lua51.
Our Lua51 source has been recently migrated to use
newlib conformant headers and C runtime calls. An example of this is that the standard SDK headers supplied by Espressif include
"c_string.h" and use
c_strcmp() as the string comparison function. NodeMCU source files use
Caution: The NodeMCU source is compliant rather than fully conformant with the standard headers such as
<string.h>; that is the current subset of these APIs used by the code will successfully compile, link and execute as an image, but code additions which attempt to use extra functions defined in the APIs might not; this at least minimises the need to change standard source code to compile and run on the ESP8266.
As with Lua51, Lua53 heavily customises the
luac.c files because of the demands of an embedded runtime environment. Given the amount of change, these are stripped of the functionally dead code.
TString types and implementation
The Lua 5.3 makes a significant modification to the treatment of strings, dividing them into two separate subtypes based on the string length (at
LUAI_MAXSHORTLEN, 40 in the current implementation). This decision reflects two empirical observations based on a broad range of practical Lua applications: the long the string, the less likely the application is to recreate it independently; and the cost of ensuring uniqueness increases linearly with the length of the string. Hence Lua53 now treats the string type differently:
Short Strings are stored uniquely using the
ROstrtstring table as discussed below. Two short TStrings are identical if and only if their addresses are the same.
Long Strings are created and copied by reference, but are not guaranteed to be stored uniquely.
Since short strings are stored uniquely, identity comparison is based comparing their TString address. For long TStrings identify comparison is a little more complex:
- They are identical if their addresses are the same
- They are different if their lengths are different.
- Failing these short circuits, a full
memcmp()must be carried out.
Lua GC of both types is essentially the same, excepting that collection of long strings does not need to update the
Note that for real applications, identical long strings are rarely generated by other than by copy-reference and hence in general the runtime savings benefits exceed the small chance of storage duplication. Also note that this and other sub-typing is hidden at the Lua C API level and is handled privately inside the Lua VM implementation.
Whilst running Lua applications make heavy use of TStrings, the Lua VM itself makes little use of TStrings and typically pushes any string literals as CStrings. The Lua53 VM introduced a key cache to avoid the runtime cost of hashing string and doing the
strt lookup for this type of CString constant. The NodeMCU implementation shares this key cache is shared with ROTable field resolution.
Short strings in the standard Lua VM are bound at runtime into the the RAM-based
strt string table. Our LFS implementation adds a second LFS-based readonly
ROstrt string table that is is created during LFS image load and then referenced on subsequent CPU restarts. The Lua VM and C API resolves each new short string first against the
strt, then the
ROstrt string table, before adding any unresolved strings into the
strt. Hence short strings are interned across these two
ROstrt string tables. Thus any runtime reference to strings already in the LFS
ROstrt do not create additional entries in RAM. So applications are free include dummy resource functions (such as
lua_examples/lfs) to preload additional strings into
ROstrt and thus avoid needing RAM for these. Such functions don't need to called; simply inclusion in the LFS build is sufficient.
An LFS section below discusses further implementation details.
The ROTables concept was introduced in
eLua, with the ROTable format designed to be compiled by being declarable with C source and so statically included in the firmware at built-time, rather taking up RAM. This essential functionality has been preserved across both Lua51 and Lua53. At an API level ROTables are handled as a table subtype within the Lua VM except that:
- ROTables are declared statically in C code.
- Only a subset of key and value types is supported for ROTables.
- Attempting to write to a ROTable field will raise an error.
- The C API provides a method to push a ROTable reference direct to the Lua stack, but other than this, the Lua API to read ROTables and Tables is the same.
We have now completely replaced the
eLua implementation for Lua53, with this implementation backported to Lua51. Tables are now declared using
LROT macros with the
LROT_END() macro also generating a
ROTable structure, which is a variant of the standard
Table header and linking to the
luaR_entry vector declared using the various
LROT_XXXXENTRY() macros. This has new implementation has some major advantages:
RWTables and ROTables are separate subtypes of Table, and so only minor code changes are needed within
ltable.cto implement this, with the implementation now effectively hidden from the rest of the runtime and any library modules; this has enabled us to remove most of the ROTable code patches added by
luaR_entryvector is a linear list, so (unlike a standard RAM Table) any ROTable has no associate hash table for fast key lookup. However we have introduced a unified ROTable key cache to provide direct access into ROTable entries with a typical hit rate over 99% (key cache misses still require a linear key scan), and so average ROTable access is only slightly slower than RAM Table access, unlike the
eLuaimplementation which was extremely slow.
ROTablestructure variants are not GC collectable and so the
nextfield is set to the marker constant
((GCObject *) 1), and (since ROTables can only refer to other RO objects) this allows the Lua GC to short-circuit GC sweeps across such RO nodes. The
ROTablestructure variant also drops unused fields to save space, and again this is handled internally within
As all tables have a header record that includes a valid
fasttm()optimisations now work for both
The same Lua51 ROTables functionality and limitations also apply to Lua53 in order to minimise migration impact for C module libraries:
ROTables can only have string keys and a limited set of Lua value types (Numeric, Light CFunc, Light UserData, ROTable, string and Nil). In Lua 5.3
Floatare now separate numeric subtypes, so
LROT_INTENTRY()takes an integer value. The new
LROT_FLOATENTRY()is used for a non-integer values. This isn't a migration issue as none of the modules use floating point constants in ROTables declared in our modules; there is the only one currently used is in
- For 5.1 builds,
LROT_FLOATENTRY()is a synonym of
- For 5.3 builds,
LROT_NUMENTRY()is a synonym of
- For 5.1 builds,
Some ordering limitations apply:
luaR_entryvectors can be unordered except for any metafields: Any entry with a key name starting in "_" must must be ordered and placed at the start of the vector.
LROT_END()take the same three parameters. (These are ignored in the case of the
LROT_BEGIN()macro, but by convention these are the same to facilitate the begin / end pairing).
- The first field is the table name.
- The second field is used to reference the ROTable's metatable (or
NULLif it doesn't have one).
- The third field is 0 unless the table is a metatable, in which case it is a bit mask used to define the
flagsfield. This must match any metafield entries for metafield lookup to work correctly.
Standard Lua 5.3 contains a new peep hole optimisation relating to closures: the Proto structure now contains one RW field pointing to the last closure created, and the GC adopts a lazy approach to recovering these closures. When a new closure is created, if the old one exists and the upvals are the same then it is reused instead of creating a new one. This allows peephole optimisation of a usecase where a function closure is embedded in a do loop, so the higher cost closure creation is done once rather than
This reduces runtime at the cost of RAM overhead. However for RAM limited IoTs this change introduced two major issues: first, LFS relies on Protos being read-only and this RW
cache field breaks this assumption; second closures can now exist past their lifetime, and this delays their GC. Memory constrained NodeMCU applications rely on the fact that dead closed upvals can be GCed once the closure is complete. This optimisation changes this behaviour. Not good.
Lua53 removes this optimisation for all prototypes.
Standard Lua 5.3 introduces localisation support. NodeMCU Lua53 disables this because IoT implementation doesn't have the appropriate OS support.
Various Lua structures have
double fields which are align(8) by default. There is no reason or performance benefit for doing align(8) on ESPs so all Lua code is compiled with the
Lua53 also reimplements the Lua51 LCD (Lua Compact Debug) patch. This replaces the
ìnt vector giving line info with a packed byte array that is typically 15-30× smaller. See the LCD whitepaper for more information on this algo.
Unaligned exception avoidance
By default the GCC compiler emits a
l8ui instruction to access byte fields on the ESP8266 and ESP32 Xtensa processors. This instruction will generate an unaligned fetch exception when this byte field is in Flash memory (as will accessing short fields). These exceptions are handled by emulating the instruction in software using an unaligned access handler; this allows execution to continue albeit with the runtime cost of handling the exception in software. We wish to avoid the performance hit of executing this handler for such exceptions.
lobject.h defines a new
GET_BYTE_FN(name,t,wo,bo) macro. In the case of host targets this macro generates the normal field access, but in the case of Xtensa targets these macros define an
static inline access function for each field. Use of these functions at the default
-O2 optimisation level cause the code generator to emit a pair of
extui instructions replacing the single
l8ui instruction. This has the cost of an extra instruction execution for accessing RAM data, but also removes the 200+ clock overhead of the software exception handler in the case of flash memory accesses.
There are 9 byte fields in the
ROTable structures that can either be statically compiled as
const struct into libraries or generated by the lua cros compiler into the LFS region, and the
GET_BYTE_FN macro has been used to create access macros for these fields, and read references of the form
(o)->tt (for example) have been recoded using the access macro form
gettt(o). There are 44 such changed access references in the source which together represent perhaps 99% of potential sources of this software exception within the Lua VM.
The access macro hasn't been used where access is guarded by a conditional that implies the field in a RAM structure and therefore the
l8ui instruction is executed correctly in hardware. Another exclusion is in modules such as
lcode.c which are only used in compilation, and where the addition runtime penalty is acceptable.
A wider review of
const char initialisers and
-S asm output from the compiler confirms that there are few other cases of character loads of constant data, largely because inline character constants such as '@' are loaded into a register as an immediate parameter to a
movi.n instruction. Ditto use of short fields.
Modulus and division operation avoidance
The Lua runtime uses the modulus (
%) and divide (
/) operators in a number of computations. This isn't an issue for most uses where the divisor is an integer power of 2 since the gcc optimiser substitutes a fast machine code equivalent which typically executes 1-4 inline Xtensa instructions (ditto for many constant multiplies). The compiler will also fold any used in constant expressions to avoid runtime evaluation. However the ESP Xtensa CPU doesn't implement modulus and divide operations in hardware, so these generate a call to a subroutine such as
_udivsi3() which typically involves 500 instructions or so to evaluate. A couple of frequent uses have been replaced. (I have ensured that such uses are space delimited, so seaching for " % " will locate these.
grep -P " (%|/) (?!(2|4|8|16))" app/lua53/*.[hc] will list them off.)
Standard Lua 5.3 introduced a string key cache for constant Cstring to TString lookup. In parallel NodeMCU Lua51 also introduced a lookaside cache for ROTable fields access. In practice this provides single probe access for over 99% of key hit accesses to ROTable entries.
In Lua53 these two caching functions (for CString and ROTable key lookup) have been unified into a common Key cache to provide both caching functions with the runtime overhead of a single cache table in RAM. Folding these two lookups into a single Key cache isn't ideal, but given our limited RAM this allows the cache use to be rebalanced at runtime reflection the relative use of CString and ROTable key lookups.
Flash image generation and loading
The current Lua51
app/lua implementation has two variants for dumping and loading Lua bytecode: (1)
lflash.c. In Lua53, these have been unified into a single load / unload mechanism. However, this mechanism must facilitate sequential loading into flash storage, which is is straight forward if with some small changes to the standard internal ordering of the LC file format. The reason for this is that any
Proto can embed other Proto definitions internally, creating a Proto hierarchy. The standard Lua dump algorithm dumps some Proto header components, then recurses into any sub-Protos before completing the wrapping Proto dump. As a result each Proto's resources get interleaved with those of its subordinate Proto hierarchy. This means that resources get to written to RAM non-serially, which is bad news for writing serially to the LFS region.
The NodeMCU Lua53 dump reorders the proto hierarchy tree walk, so that resources of the lowest protos in the hierarchy are loaded first:
dump_proto(p) foreach subp in p dump_proto(subp) dump proto content end
This results in any proto references now being backwards references to protos that are already loaded, and this in turn enables the Proto resources to be allocated as a sequential contiguous allocation units, so the same code can be used for loading LCs into RAM and into LFS.
The standard Lua 5.3 dump format embeds string constants in each proto as a
byte string definition. , NodeMCU needs to separate the collection of strings into an
ROstrt for LFS loading, and this requires an extra processing pass either on dump or load. By doing a preliminary Proto scan to collect tracking the strings used then dumping these as a prologue makes the load process on the ESP a single pass and avoids any need for string resolution tables in the ESP's RAM. The extra memory resources needed for this two-pass dump aren't a material issue in a PC environment.
lundump.c facilitate the addition of LFS mode. Writing to flash uses a record oriented write-once API. Once the flash cache has been flushed when updating the LFS region, this data can be directly accesses using the memory-mapped RO flash window, the resources are written directly to Flash without any allocation in RAM.
lundump.c are compiled into both the ESP firmware and the host-based luac cross compiler. Both the host and ESP targets use the same integer and float formats (e.g. 32 bit, 32-bit IEEE) which simplifies loading and unloading. However one complication is that the host luac.cross application might be compiled on either a 32 or 64 bit environment and must therefore accommodate either 4 or 8 byte address constants. This is not an issue with the compiled Lua format since this uses the grammatical structure of the file format to derive resource relationships, rather than offsets or pointers.
We also have a requirement to generate binary compatible absolute LFS images for linking into firmware builds. The host mode is tweaked to achieve this. In this case the write buffer function returns the correct absolute ESP address which are 32-bit; this doesn't cause any execution issue in luac since these
addressed are never used for access within
luac.cross. On 64-bit execution environments, it also repacks the Proto and other record formats on copy by discarding the top 32-bits of any address reference.
Handling embedded integers in the dump format
A typical dump contains a lot of integer fields, not only for Integer constants, but also for repeat count and lengths. Most of these integers are small, so rather than using a fixed 4-byte field in the file stream all integers are unsigned and represented by a big-endian multi-byte encoding, 7 bits per byte, with the high-bit used as a continuation flag. This means that integers 0..127 encode in 1 byte, 128..32,767 in 2 etc. This mult-ibyte scheme has minimal overhead but reduces the size of typical
.img by 10% with minimal extra processing and less than the cost of reading that extra 10% of bytes from the file system.
A separate dump type is used for negative integer constants where the constant
-x is stored as
-(x+1). Note that endianness isn't an issue since the stream is processed byte-wise, but using big-endian simplifies the load algorithm.
Handling LFS-based strings
The dump function for an individual Proto hierarchy for loading as an
.lc file follows the standard convention of embedding strings inline as a
\<len><\byte sequence>. Any LFS image contains a dump of all of the strings used in the LFS image Protos as an "all-strings" prologue; the Protos are then dumped into the image with string references using an index into the all-strings header. This approach enables a fast one-pass algorithm for loading the LFS image; it is also a compact encoding strategy as string references typically use 1 or 2 byte integer offset in the image file.
One complication here is that in the standard Lua runtime start-up adds a set of special fixed strings to the
strt that are also tagged to prevent GC. This could cause problems with the LFS image if any of these constants is used in the code. To remove this conflict the LFS image loader always automatically includes these fixed strings in the
ROstrt. (This also moves an extra ~2Kb string constants from RAM to Flash as a side-effect.) These fixed strings are omitted from the "all-strings prologue", even though the code itself can still use them. The
ltm.c initialisers loop over internal
char * lists to register these fixed strings. NodeMCU adds a couple of access methods to
ltm.c to enable the dump and load functions to process these lists and resolve strings against them.
Handling LFS top level functions
Lua functions in standard Lua 5.1 are represented by two variant Closure headers (for C and Lua functions). In the case of Lua functions with upvals, the internal Protos can validly be bound to multiple function instances.
eLua and Lua 5.3 introduced the concept of lightweight C functions as a separate function subtype that doesn't require a Closure record. Note that a function variable in a Lua exists as a TValue referencing either the C funtion address or a Closure record; this Closure is not the same as the CallInfo records which are chained to track the current call chain and stack usage.
Whilst lightweight C functions can be declared statically as TValues in ROTables, There isn't a corresponding mechanism for declaring a ROTable containing LFS functions. This is because a Lua function TValue can only be created at runtime by executing a
CLOSURE opcode within the Lua VM. Our Lua51 implementation avoids this issue by generating a top level Lua dispatch function that does the equivalent of emitting
if name == "moduleN" then return moduleN end for each entry, and this takes 4 Lua opcodes per module entry. This lookup has an O(N) cost which becomes non-trivial as N grows large, and so Lua51 has a somewhat arbitrary limit of 50 for the maximum number modules in a LFS image.
In the Lua53 LFS implementation the undump loader appends a
ROTable to the LFS region which contains a set of entries
"module name"= Proto_address. These table values aren't directly accessible via Lua but the NodeMCU C function that does LFS lookup can still retrieve the required Proto address, execute the
CLOSURE and return the corresponding Tvalue. Since this approach uses the standard table access API, which is a lot more efficient that the 4×N opcode
if chain implementation.
Lua51 includes the eLua emergency GC, plus the various EGC tuning parameters that seem to be rarely used. The default setting (which most users use) is
node.egc.ALWAYS which triggers a full GC before every memory allocation so the VM spends maybe 90% of its time doing full GC sweeps.
Standard Lua 5.3 has adopted the eLua EGC but without the EGC tuning parameters. (I have raised a separate GitHub issue to discuss this.) We extend the EGC with the functional equivalent of the
ON_MEM_LIMIT setting with a negative parameter, that is only trigger the EGC with less than a preset free heap left. The runtime spends far less time in the GC and code will run perhaps 5× faster. Since we will only support one ECG mode, we don't need to track this setting in G(L).
Standard Lua includes a throw / catch framework for handling errors. (This has been slightly modified to enable yielding to work across C API calls, but this can be modification can ignored for the discussion of Panic handling.) All calls to Lua execution are handled by
ldo.c through one of two mechanisms:
All protected calls are handled via
luaD_rawrunprotected()which links its C stack frame into the
struct lua_longjmpchain updating the head pointer at
longjmpup to this entry in the C stack, hence as long as there is at least one protected call in the call chain, the C call stack can be properly unrolled to the correct frame.
If no protected calls are on the Lua call stack, then
L->errorJmpwill be null and there is no established C stack level to unroll to. In this case the
luaD_throw()will directly call the
at_panic()handler. Since there is no valid stack frame to unroll to and execution cannot safely continue, so the only safe next step is to abort, which in our case restarts the processor.
Any Lua calls directly initiated through
lua.c interpreter loop or through
luac.cross are protected. However NodeMCU applications can also establish C callbacks which are called directly by the SDK / event dispatcher. The current practice is that these invoke their associated Lua CB using an unprotected call and hence the only safe option is to restart the processor on error. The Lua53 changes have introduced a new
luaL_pcallx() call variant as a NodeMCU extension; This is new call is designed to be used within library CBs that execute Lua CB functions, and is argument compatible with
lua_call(), except that in the case of caught errors it will also return a negative call status. It establishes a default error handler which is invoked at the erroring stack level to provide a stack trace.
This handler posts a task to a panic error handler (with the error string as an upval) before returning control to the invoking routine. If the Lua registry entry
onerror exists and is set to a function, then the handler calls this with the error string as an argument otherwise it calls standard
false in which case the handler exits, otherwise it restarts the processor. The application can use
node.setonerror() to override the default "always restart" action if wanted (for example to write an error to a logfile or to a network
syslog before restarting). Note that
nil and this has the effect of printing the full error traceback before restarting the processor.
Currently all (bar 1) of the cases of such Lua callbacks within the NodeMCU C modules use a simple
lua_call(), with the result that any runtime error executes a panic on error and reboots the processor. By replacing these calls with the
luaL_pcallx(), control is always returned to the C routine, and a later post task can report the error. Note that substituting library uses of
luaL_pcallx() does changes processing paths in the case of thrown errors. If the library CB function immediately returns control to the SDK/event scheduler after the call, then this is the correct behaviour. However, in a few cases, the routine performs post-call clean-up and this adapt the logic depending on the return status.
luac.cross execution environment
As with Lua51, the Lua53 host-build
luac.cross executable will extend the standard functionality by adding support for LFS image compilation and also include a Lua runtime execution environment that can be invoked with the
-e option. This environment was added primarily to facilitate in host testing (albeit with some limitations) of the NodeMCU.
The make target
TEST=1 also adds the Lua Test Suite to the
luac.cross -e execution environment to enable this test support. Due to NodeMCU extensions some changes were required to the Test suite so
app/lua53/host/tests includes the version regressed against our current build. Note that I planning to add some variant capability to the ESP target firmware build in the future.
Enabling the test suite also disables some compiler optimisations and hence increases the size of compiled Lua files, so this test option is not enabled by default in the
The test configuration has some variations from the standard suite:
- NodeMCU lua and luac.cross do not support dynamic loading and the related dynamic loading tests are omitted.
- The tests adopt the Lua compatibility modes implemented in our builds.
- The standard Lua VM supports the initiation of multiple VM environments and this feature is used in some tests. Our firmware supports multiple Lua threads but only one
lua_newstate()instance. So the host
luac.crossmake with the
TEST=1option set also supports multiple VM environments.
This execution environment also emulates LFS loading and execution using the
-F option to load an LFS image before running the
-e script. On POSIX environments this allocates the LFS region using kernel extension, a page-aligned allocator and also uses the kernel API to turning off write access to this region except during the simulated write to flash operations. In this way unintended writes to the LFS region throw a H/W exception in a manner parallel to the ESP environment.
API Compatibility for NodeMCU modules
The Lua public API has largely been preserved across both Lua versions. Having done a difference analysis of the two API and in particular the
lauxlib.h headers which contain the public API as documented in the LRM 5.1 and 5.3, these differences can be grouped into the following categories:
NodeMCU features that we will be adding to Lua53 as part of this migration.
Differences (additions / removals / API changes) that we are not using in our modules and which can therefore be effectively ignored for the purposes of migration.
Source differences which can be encapsulated through common macros will be be removed by updating module code to use this common macro set. In a very small number of cases module functionality will be recoded to employ this common API base.
Both the RM and PiL make quite clear that the public API for C modules is as documented in
lua.h and all its definitions start with
lua_. This API strives for economy and orthogonality. The supplementary functions provided by the auxiliary library (auxlib) access Lua services and functions through the
lua.h interface and without other reference to the internals of Lua; this is exposed through
lauxlib.h and all its definitions start with
There are significant changes to internal APIs as exposed in the other "private" headers within the Lua source directory, and so any code using these APIs may fail to work across the two versions.
One thing that this analysis has underline is that we've been lax about how we allow our modules to be implemented. All existing modules have been modified to use only the public API. If any new or changed module required any of the 'internal' Lua headers to compile, then it is implemented incorrectly.
Lua Language and Libary Compatibility for NodeMCU Lua modules
For the immediate future we will be supporting both builds based on both language variants, so Lua module writers either:
- Avoid using Lua 5.3 language features and implement their module in the common subset (this is currently our preferred approach);
- Or explicitly state any language constraints and include a test for
_VERSION=='Lua.5.3'(or 5.1) in the module startup and explicitly error if incompatible.
Other Implementation Notes
Use of Linker Magic. Lua51 introduced a set of linker-aware macros to allow NodeMCU C library modules to be marshalled by the GNU linker for firmware builds; Lua53 target builds maintain these to ensuring cross-version compatibility. However, the lua53
luac.crossbuild does all library marshalling in
linit.cand this removes the need to try to emulate this strategy on the diverse host toolchains that we support for compiling
Host / ESP interoperability. Our strategy is to build the firmware for the ESP target and
luac.crossin the same make process. This requires the host to use a little endian ANSI floating point host architecture such as x68, AMD64 or ARM, so that the LC binary formats are compatible. This ain't a material constraint in practice.
Emergency GC. The Lua VM takes a more aggressive stance than the standard Lua version on triggering a GC sweep on heap exhaustion. This is because we run in a small RAM size environment. This means that any resource allocation within the Lua API can trigger a GC sweep which can call __GC metamethods which in turn can require to stack to be resized.
LUA_CORE. Some of the Lua header files (e.g.
lauxlib.h) provide a public C API for the Lua runtime. These provide a consistent cross-version API. The remaining headers (e.g.
lstring.h) are intended to be internal to the runtime implementation and these have significant differences between Lua 5.1 and 5.3; these should not be include in C library modules. Both Lua51 and Lua53 have a concept of Lua core files, and these set the
LUA_COREdefine. In order to enforce limited access to the 'private' internal APIs, #ifdef LUA_CORE` guards have been added to all such Lua headers effectively hiding them from application library access.
Detailed changes from standard Lua 5.3 core for NodeMCU Lua53
The Lua type
LUA_TTABLEnow has subtypes
LUA_TTBLROF, with handling of these subtypes following the model adopted for strings being split into short and long subtypes. In general the variant coding for table subtypes is managed as low as possible in the
ltable.croutine. The new
ROTableis a subset of
Tablethat only includes the fields used in ROTables.
The new string cache added in Lua 5.3 is replaced by a unified key cache used for both string and ROTable entry caching. The hash algorithm is now prime multiplier based to allow the use of a modulo of 2^n to avoid the need for an expensive software modulus calculation during cache lookup.
The byte-field access macros
(o)->XXXread accesses for
lu_bytefields in record types that could be in constant (flash-based) memory.
lua.cis a complete reimplementation that more more closely follows the current NodeMCU 5.1
lua.cimplementation. This is as a result of architectural drivers arising from its context and being initiated within the startup sequence of the IoT embedded runtime.
- Processing is based on a single threaded event loop model. The Lua interactive mode processes input lines from a stdin pipe. This must be handled on a line by line basis, and other Lua tasks can interleave any multiline processing, so the standard doREPL approach doesn't work.
- Most OS services and environment processing are supported so much of the standard functionality is irrelevant and is stripped out for simplicity.
stdoutredirection aren't offered as an SDK service, so this is handled in the baselib print function and errors are sent to print. General error reporting on XTENSA builds is not directed to
stderr, but instead the error string is posted as an upval to the C closure error reporter helper. This then calls the Lua error reporter as a separate task.
lauxlib.cimplement the API and interface changes listed below.
ldblib.cnow contains a complete
debugimplementation (Note that
debug.debugfor firmware builds is still TODO as this would require take-over of stdin).
lfunc.cdoes not implement caching of last closure in Proto records
lmath.chas less functions disabled so the
mathlibrary is now pretty complete.
lobject.cincludes a fast implementation of
asm("nsau %0, %1;")instruction which is a lot faster and avoids the need for a byte lookup table.
lstate.cseed algorithm is fixed rather than using randomisation (as this last would break LFS).
ltest.chas its Memcontrol structure extended to include a double linked list. This allows H/W watchpoints to be set on dangling blocks in host gdb to work out what blocks aren't being GCed. Note that the test suite has disabled tests which aren't appropriate (e.g. dynamic loading) and that build will compile in the Lua Test suite if
LUA_USE_HOSTare defined -- that is for test
Locales support has been removed from
luaconf.hhas been reworked to reflect the IoT implementation.
Dynamically loaded libraries aren't supported under the non-OS DSK or RTOS so this functionality has been removed from
LCDstyle compressed line info is implemented as standard through
On XTENSA builds all ROM modules and base functions are in the ROTable
ROM. The metatable of
ROMpoint to this ROTable. On host
ROMcontains only the ROM modules, but its meta
__indexpoints to a separate
baselibROTable. This means that both variants can resolve both the base Lua functions and ROM libraries through the global environment,
_G. However, the
luac.crossbuilds can be linked using standard linker defaults without any GCC, MSVC, etc. botches.
Standard C API and Interface Changes
The API follows the convention of the Lua architecture that subtypes are not in general exposed to the C API, so there for example is no concept of function vs lightweight function when testing a Lua TValue type; these are all of type
"function". Ditto for access to and testing of ROTables; for access these are simply tables that are read-only.
However because ROTables can be declared inline in a Library's C module, there are some extra API calls to bind these static structures at runtime:
void lua_pushrotable(lua_State *L, const ROTable *p)can be used to push a ROtable onto the Lua Stack.
void lua_createrotable(lua_State *L, ROTable *t, const ROTable_entry *e, ROTable *mt)is used the create a ROTable header for the specified
ROTable_entryvector. This is only required for linker-marshalled entry vectors as the ROTable is normally generated by the
int luaL_rometatable(lua_State *L, const char* tname, const ROTable *p)shorthand for
luaL_newmetatable(); used to associate a ROTable metatable with the entry
tnamein the Lua registry.
Other NodeMCU extensions are:
lua_State *lua_getstate(void). Returns the L0 state. Used in C module callbacks to call the Lua VM.
int lua_freeheap(void)returns the amount of free heap.
lua_gc has an extra parameter option
LUA_GCSETMEMLIMITthat is used to set the ECG memory limit.
int lua_getstrings(lua_State *L, int opt)Debug utility to return a table of strings in the specified string table (0 = RAM, 1 = LFS ROM)
int luaL_pcallx(lua_State *L, int narg, int nres). This is designed as a plug-in replacement for
lua_call(L, n, m)used to call Lua CBs in the module event routines. Unlike
lua_call(), this does a protected call and returns a call status. In the case of the called routine throwing an error, instead of the VM panicing and restarting, the error handler collects a full traceback and posts a separate task with this error string as an upval. The error reporter then calls the users reporter function, which can then print or log the error, and continue or restart as required.
int luaL_posttask(lua_State* L, int prio)Post the task popped from the stack at the specified priority.
In order to unify the coding for C Library files for execution in both the Lua51 and Lua53 environments, these changes have also been regressed back into the Lua 5.1 code base.