Updates since last release:
- Extra details on the multi-host make system
- Extra details on the legacy module support and how this can be used to compile and run modules written for the Lua 5.1 version with no or minimal source changes.
- Slight rework of ROTable implementation to support the same.
- Explanation of the LROR Lua module support and of on-module rebuilding of flash-based Lua modules.
This port builds upon two previous demonstrators:
- a core port of the additional NodeMCU functionality into a host-only demonstrator platform, and
- a back-port of this functionality into the NodeMCU 5.1.4 core to evaluate the issues of integration into the rest of the NodeMCU ecosystem.
Neither of these demonstrators was released for third-party evaluation as they had served their purpose for the author: to enable the development of the final 5.3 code-base. This white paper focuses solely on this final 5.3 port and discards any historic detail that is no longer relevant.
NodeMCU Lua is currently based on the eLua fork of Lua 5.1.4 ("NodeMCU 5.1"). This version discards many aspects of the added eLua functionality, and these compounded modifications make the Lua core code-base difficult to maintain. However, more recent Lua versions now include a back-port of two of the most important eLua features, as well as some very desirable performance features. We therefore plan to rebaseline NodeMCU on the current stable Lua version (5.3.4). My focus here is to document this Lua 5.3 port ("NodeMCU 5.3") in the form of a white paper. This set of changes to the standard Lua sources is known as LROR (Lua Read-Only Resources) and implements the following objectives:
- It makes the new Lua 5.2 and 5.3 features available to NodeMCU applications;
- It integrates the standard NodeMCU platform support;
- It provides an easy migration path for existing modules to move from the 5.1 to the 5.3 code-base.
- It adds C API support for a subset of native Lua types, enabling a range of constant resources to be declared and initialised in the `.text` or other read-only segments.
This LROR approach has three major benefits:
- It further decreases the RAM usage of the Lua runtime environment and Lua scripts.
- Since these read-only resources are truly static and available to the executing scripts, this avoids the allocation and garbage collection overheads of creating such resources in RAM and subsequently removing them.
- The GCC toolchain for the ESP generates byte-aligned string and byte constants, but the ESP's xtensa architecture can only access 32-bit word-aligned data resources from the flash-based firmware. NodeMCU uses a software-based exception handler to handle and process such unaligned accesses, albeit at a runtime execution hit. Hence accessing C strings in the flash-based `.text` segment invariably incurs the per-byte overhead of the software exception handler, which is a material runtime cost. The new code for LROR allocates strings on word-aligned boundaries, only accesses the string values when necessary and, since the lengths are known, uses `memcmp` / `memcpy` functions which correctly handle non-aligned comparison.
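
The cost of byte access to flash can be pictured with a small host-side sketch. It is not taken from the NodeMCU sources: `flash_byte()` and `flash_equal()` are hypothetical helpers, and the byte extraction assumes a little-endian layout (as on the ESP8266). The point is only that a word-aligned buffer of known length can be read and compared using aligned 32-bit loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return byte i of a buffer that may only be read
 * through aligned 32-bit loads. Assumes little-endian byte order. */
static uint8_t flash_byte(const uint32_t *words, size_t i)
{
    uint32_t w;
    assert(words != NULL);
    w = words[i / 4];                          /* one aligned word load */
    return (uint8_t)(w >> (8u * (i % 4u)));    /* pick the byte out of the word */
}

/* Hypothetical helper: compare two word-aligned buffers of known length,
 * word by word for the whole words and byte by byte for the tail.
 * Returns 1 when the first len bytes are equal, 0 otherwise. */
static int flash_equal(const uint32_t *a, const uint32_t *b, size_t len)
{
    size_t i;
    for (i = 0; i < len / 4; i++)
        if (a[i] != b[i])
            return 0;
    for (i = (len / 4) * 4; i < len; i++)
        if (flash_byte(a, i) != flash_byte(b, i))
            return 0;
    return 1;
}
```

Because the length is known up front, no terminator scan is needed and every load stays on a word boundary.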
The current ESP8266 NodeMCU version was heavily modified by the eLua changes (specifically including Bogdan Marinescu's LTR patch), by the need to support the non-OS SDK, and by further performance optimisation, to the extent that it is becoming very difficult to maintain.
Lua 5.3.4 contains a back-port of two of the key eLua enhancements, lightweight C functions and the Emergency Garbage Collector (EGC), both of which are required by NodeMCU. It also includes full support for a default 32-bit data type, together with separate integer and floating point data types.
NodeMCU 5.3 also closes the remaining functional gap with NodeMCU 5.1 by reimplementing additional
support for ROM-based resources, whilst providing a smooth upgrade path for existing NodeMCU 5.1
modules, and mitigating (effectively removing) the performance impacts of the LTR rotables
implementation and the unaligned string exception handling overhead. Some effort has been made to
achieve these goals with the minimum changes to the core Lua code base.
As NodeMCU 5.3 will effectively unify the previous integer and floating point build variants, separate integer and floating point build variants are no longer supported. In the case of the ESP32, which has H/W support for 32-bit floating point, this gives full H/W support for all Lua numeric data types.
_Note that NodeMCU macros require C99 language support, so the LROR patch changes do not attempt to preserve compatibility with C89 or other C standard variants. An example here is that LROR changes sometimes split variable declarations with inserted executable statements to minimise line changes._
Lua 5.3 already introduces the concept of variant data types, and in particular splitting:
- Numbers into separate integer and floating point sub-types.
- Strings into short and long variants, with only short strings being interned.
- Functions into current and lightweight variants.
LROR builds on this variant type functionality by allowing the declaration of read-only (RO) variants of:
- Short (interned) strings. In standard Lua 5.3, interned strings are held in a RAM hash table that can be accessed in C using the Lua API. LROR adds a parallel set of RO structures which implement a second string hash table in RO flash memory. The internal Lua VM lookup code for any interned string first resolves against the `ROstrt` table before using the RAM-based `strt`. Hence those strings stored in the RO tables do not need to be allocated (or freed) in RAM. This saves on both RAM resources and the malloc/free overheads. As interned strings are unique, string comparison can be achieved by address comparison of the interned resource.
- Lua Values and a new Key Value type. This implementation follows the eLua pattern, except for extra types. The previous eLua `STRING` type was used to refer to null-terminated `const char *` byte strings, and is now referred to as `CSTRING`, with `STRING` now being used for the native Lua string format.
- New RO and RW subtypes for Tables. The RW subtype is essentially the same implementation as existing Lua tables. The RO subtype is new, and is different to the eLua approach. In standard Lua each table has a table definition structure which then points to a hash table which stores the individual entries. RW tables preserve this approach. The RO table header only contains a subset of the fields relevant to RO tables, and instead of a hash, it points to a vector of key/value pairs.
  - Keeping a common table header for both variants means that the RO vs RW table handling is fully encapsulated from the bulk of the table handling code, and is only exposed to the low level access functions.
  - Using the vector form for RO table entries is largely the same as for the eLua LTR patch and simplifies inline declaration of RO table resources. However, the encapsulation of RO tables means that a look-aside cache (a feature which has been implemented elsewhere in Lua 5.3) can be used to replace the O(n) entry search by a direct access. On the Lua test environment this achieves over a 95% cache-hit rate, and this effectively eliminates the current runtime overheads of using ROM tables.
- Ability to store and reload Lua modules in flash. A (configurable) fixed flash area, known as the LROR Partition (or LRORP), is set aside for the support of flash-based Lua modules, RO strings and the RO string table. Whilst the Lua build process can prepopulate this, an on-module Lua API is also provided to enable Lua developers to rebuild the LRORP on module, that is without needing a firmware build environment. If Lua application developers move the bulk of their Lua application code into LRORP modules, then this effectively more than doubles the ESP RAM available to Lua developers.
- The Lua C API has also been extended with common API functions which ease the task of writing new NodeMCU library modules, including API functions which would sensibly benefit from (ROM-based) `TString *` arguments as an alternative to C string variants. Note that this is an API extension rather than a substitution, so that existing C extension libraries / modules can still be compiled and work with no (or in a few cases minor) changes at a source level.
Whilst standard Lua is designed to compile against a rich set of operating system and applications environments, the previous NodeMCU implementation was only designed to support the ESP8266, though later updates have added beta support for the ESP32. This patch supports the Lua 5.3 environment on three target platforms using a common code base:
- The ESP8266 (and derivative ESP8285) architectures using the Espressif non-OS SDK and its GCC xtensa toolchain.
- The ESP32 architecture using the Espressif IDF and its GCC xtensa toolchain.
- Linux host architecture using the standard Linux GCC toolchain. Whilst this will work for many Linux variants, we are specifically testing on our Travis-CI hosting platform.
The first two versions are supported by their respective Espressif / NodeMCU environments. The last
environment is for three specific purposes: (i) supporting Lua core development; (ii) testing
against the standard and NodeMCU-extended Lua test suites; (iii) building a host-runnable `luac`
image which can be used to cross-compile Lua source, and to run limited Lua executables using the
additional `-X` execute option. (The LROR preprocessor is bootstrapped and uses this feature so that
NodeMCU build environments don't need to depend on other host Lua installations.) The host Lua
versions are always built as part of an ESP build and hence the `luac` executable is always
available on the host for initialising LROR partitions and SPIFFS file systems.
A corollary to this is that the LROR changes only support these platforms. Unlike the Lua and eLua code bases, which had to handle complex use cases (e.g. to support WinX and OS X, and big- and little-endian variants), the only conditional LROR code is that needed to support these three variants. The LROR changes are all or nothing, and are tested as a bundle. There is no out-of-the-box option to cherry-pick bits of LROR functionality and omit others.
We are taking a conscious step away from the rapid development approach adopted in our first NodeMCU implementation, and so our basic guideline is that the source will only be changed when there is a material benefit or unavoidable need for doing so, and either such changes will follow a standard pattern or fulfil a specific functional need. All such changes are documented herein. Example patterns include:
- Replacing `"some_word"` C strings by their equivalent `LROR_WORD(some_word)` declarations. Such C string replacement is optional and is only done where there might be performance and RAM usage benefits for doing so. Quoted words are candidates for replacing if:
  - The word is likely to be used as a constant in a Lua application. Including it in the string table will avoid creating the equivalent RAM `G(L)->strt` entry.
  - The word is already used as a TString elsewhere so is already a RO resource.
  - It will be referenced repeatedly during execution. Examples include C string constants used in common API calls. C strings used in initialisation routines or error paths aren't usually converted.
- A limited number of common multi-word strings are replaced by their equivalent `LROR_STRING(name,"some string")`.
- Changes to `#include` statements in header preambles. In some cases the three variant targets (POSIX, newlib and ESP non-OS SDK) require different header files to compile and build. On the ESP8266 non-OS builds, the Espressif-supplied C header must be used instead because the ESP platform only supports an extremely limited subset of the standard library functions or has a slightly non-standard API, and we would prefer to pick up misuse during compile / build. Our general approach is to avoid `#ifdef` conditional logic by using a bit of header magic, so that where such variants are required a simple global substitution of the macro `C_HEADER_XXX` for `<xxx.h>` is used instead.
- Use of additional TS variants of Lua API functions. An example here is complementing the standard `lua_pushstring()` API function with an equivalent `lua_pushTString()` which takes a TString object as a parameter. Note that these are extensions to the API and not substitutions. This is to ensure that modules using the standard documented Lua API continue to build and work.
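
The header substitution pattern can be sketched with a computed `#include`. This is only an illustration: the macro name `C_HEADER_STRING` follows the `C_HEADER_XXX` pattern described above, but where and how the real build defines such macros is an assumption, and `word_length()` is a hypothetical function used to show that the header was picked up.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed mapping for a hosted build: the macro simply names the
 * standard header. A non-OS SDK build would be expected to define it
 * to the Espressif-supplied header before this point. */
#ifndef C_HEADER_STRING
#define C_HEADER_STRING <string.h>
#endif

#include C_HEADER_STRING   /* computed include: the macro expands to the header name */

/* Hypothetical user of the header: length of a NUL-terminated word. */
static size_t word_length(const char *word)
{
    assert(word != NULL);
    return strlen(word);
}
```

The source file itself then carries no `#ifdef` logic; the choice of header lives in a single definition per target.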
Other modules have been added or modified when sensible (for example the additional TS variants of API calls and some internal static routines that have been converted from C string to TString parameters where there is a runtime benefit in doing this).
The other major changes are the implementation of RO Tables and the use of the NodeMCU 5.1 model of "linker magic" to simplify the declaration of ROM global modules, tables and global functions.
Interned strings are maintained in the RAM hash table `G(L)->strt`. LROR adds a parallel statically
allocated set of RO structures which are accessed through a second `G(L)->ROstrt`. The interning
algorithm ensures that such strings are unique and only stored once, and so the address of an
interned string is a unique descriptor for the string for the purposes of copying, assignment and
comparison. The lookup code for any new interned strings first resolves against the `ROstrt` table
before the RAM-based `strt`. Hence those RO strings do not need to be allocated (or freed) in RAM,
saving on both RAM resources and the malloc/free overheads.
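
A minimal sketch of this two-level resolution follows. It is a toy model rather than the LROR code: `TStr`, `ro_strings`, `ram_strings`, `ram_allocations` and `intern()` are all hypothetical names, and linear searches stand in for the real hash tables. It resolves against the RO table first and only allocates in RAM on a miss in both tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy interned string: a list link plus the text. */
typedef struct TStr { struct TStr *next; const char *text; } TStr;

/* Stand-in for the flash-resident table: fixed at compile time, never freed. */
static const TStr ro_strings[] = {
    { NULL, "on" }, { NULL, "pairs" }, { NULL, "print" }
};

static TStr *ram_strings = NULL;       /* stand-in for the RAM table */
static unsigned ram_allocations = 0;   /* counts RAM entries created */

/* Return the unique TStr for s: RO table first, then RAM, else allocate. */
static const TStr *intern(const char *s)
{
    size_t i;
    TStr *p;
    char *copy;

    for (i = 0; i < sizeof ro_strings / sizeof ro_strings[0]; i++)
        if (strcmp(ro_strings[i].text, s) == 0)
            return &ro_strings[i];          /* RO hit: no RAM used */

    for (p = ram_strings; p != NULL; p = p->next)
        if (strcmp(p->text, s) == 0)
            return p;                       /* already interned in RAM */

    p = malloc(sizeof *p + strlen(s) + 1);  /* miss in both: create a RAM entry */
    assert(p != NULL);
    copy = (char *)(p + 1);
    strcpy(copy, s);
    p->text = copy;
    p->next = ram_strings;
    ram_strings = p;
    ram_allocations++;
    return p;
}
```

Since each string has exactly one home, two results of `intern()` can be compared by address alone.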
Unfortunately, current C compiler technology does not offer the necessary compile-time functionality
to implement Lua string types fully. A new LROR C source preprocessor has been added into the build
process and this serves a dual purpose:
- The LROR macros are mined and parsed at a source level to generate a C source file which initialises these data structures.
- The macros are also used during source compilation as C preprocessor define macros to declare the mirror `extern` statements that enable all the source to be compiled validly whilst resolving these RO references through the linker step of the build.
So for example the inline usage of the RO string resource for the word `on`:
- This is declared by including the `LROR_USE_STRING(on)` macro in the source. This generates the `extern TString * _LROR_on` declaration. (This macro doesn't support varargs, but shorthand versions `LROR_USE_STRINGn(...)` exist for `n` = 2 to 5 to simplify declaring multiple word strings.)
- References to the string are created by including `LROR_WORD(on)` in the source.
- As these are standard C preprocessor macros, these are subject to normal conditional compilation rules. The generated `extern` statements can then subsequently be compiled and linked into the target image.
- However, the source is also scanned by the Lua preprocessor and in turn this generates the C file to declare and initialise the necessary `TString` for `on` and any other used `TString` variable, together with any hash table data structures. Note that this preprocessor currently does not use the standard C preprocessor output, so cannot exploit C macros and conditional compile statements.
- Any Lua references to the string `on` will also be resolved to the same ROM TString constant, thus avoiding the need to create a new RAM-based TString or its subsequent garbage collection.
- Whilst this preprocessor approach isn't ideal, it hasn't proved an issue in practice.
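
The two halves of this scheme can be sketched in a single file. The macro expansions below are guesses that merely reproduce the `extern TString * _LROR_on` declaration quoted above; the toy `TString` layout, `ts_on` and `event_name()` are hypothetical, and the "generated" definition is written by hand here where the real build would emit it from the preprocessor scan.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the Lua TString; the real layout differs. */
typedef struct TString { unsigned hash; const char *text; } TString;

/* Assumed expansions, chosen to match the declaration quoted in the text. */
#define LROR_USE_STRING(w)  extern TString *_LROR_##w;
#define LROR_WORD(w)        (_LROR_##w)

/* What the generated C file might contain for the word "on". */
static TString ts_on = { 0u, "on" };
TString *_LROR_on = &ts_on;

/* Module source: declare the word once, then reference it inline. */
LROR_USE_STRING(on)

static const char *event_name(void)
{
    assert(LROR_WORD(on) != NULL);
    return LROR_WORD(on)->text;    /* resolved at link time, no lookup at runtime */
}
```

In a real build the definition and the reference would sit in different object files, with the linker joining them.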
As ROM-based constants are truly immutable, some adjustments have been made to exclude these from the scope of the garbage collector, so that it only marks and scans RAM-based resources.
The LROR tables comprise two components:
- a ROTable header record which is a cut-down variant of the current Table structure. The bulk of the Lua runtime code-base treats `Table` records as an encapsulated resource (the main exceptions are in the low level handling in `ltable.c` and `lgc.c`). By unifying these RW and RO forms, the RO handling is therefore largely hidden at a code level, minimising the changes needed to support RO tables.
- an RO entry (`luaR_entry`) vector that is backwards compatible with the `luaR_entry` encoding used in NodeMCU 5.1; however, TString keys and values are now supported, as is the ability to reference secondary tables using `ROTable` as well as `luaR_entry` references. Using a simple vector format also simplifies the declaration of LROR tables as these can now be simply declared inline using the standard C supporting LROR macros.
Whereas RW tables are hashed with an access time of O(1), accessing RO table entry lists is
O(N), and this is doubly bad news with flash access times. I therefore extended the lookaside
cache that Lua 5.3 has added (to accelerate C string to interned string conversion) to accelerate RO
table entry accesses. See Lookaside Cache below for details. This means that
repeated RO-based table entries such as `ROM_G.pairs` are typically accessed in a single ROM probe.
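
The effect of the cache on the linear scan can be sketched as follows. This is a toy model with hypothetical names (`Entry`, `rom_g`, `ro_lookup()`, `ro_lookup_cached()`, `probes`): keys are compared by address, as interned strings would be, and a single remembered hit stands in for the real multi-slot lookaside cache.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for luaR_entry: an interned key and an integer value. */
typedef struct { const char *key; int value; } Entry;

/* Each key has one address, so keys can be compared by pointer. */
static const char K_PAIRS[] = "pairs", K_PRINT[] = "print", K_TYPE[] = "type";

static const Entry rom_g[] = {
    { K_PAIRS, 1 }, { K_PRINT, 2 }, { K_TYPE, 3 }, { NULL, 0 }
};

static unsigned probes;   /* counts entries examined, i.e. ROM reads */

/* Plain O(N) scan of a NULL-terminated entry vector. */
static const Entry *ro_lookup(const Entry *v, const char *key)
{
    assert(v != NULL);
    for (; v->key != NULL; v++) {
        probes++;
        if (v->key == key)
            return v;
    }
    return NULL;
}

/* Single-slot cache: remembers the last hit for a given table. */
static const Entry *cache_table, *cache_hit;

static const Entry *ro_lookup_cached(const Entry *v, const char *key)
{
    if (cache_table == v && cache_hit != NULL && cache_hit->key == key) {
        probes++;                 /* one probe to confirm the cached entry */
        return cache_hit;
    }
    cache_table = v;
    cache_hit = ro_lookup(v, key);
    return cache_hit;
}
```

A repeated lookup of the last entry costs three probes uncached but one probe once the cache is warm.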
All RO tables must follow the Lua model of requiring a Table header record (and in the case of RO tables the smaller 16 byte variant). However, ROTable headers can either
- be located in RO flash address space and declared with the LROR_TABLE macros, or
- be created at runtime and stored in the Lua registry, together with a referencing `Lvalue`.
In this second case, rather than using the registry reference scheme as described in [PiL 28.3],
we follow the alternative convention used for single instance resources, which is to use the address
of the `luaR_entry` vector as the registry key.
We need this approach to enable backwards compatibility for NodeMCU modules which still declare
tables using the deprecated eLua declaration system. (It also enables the declaration of RO tables
with Lua-updatable metatables.) Details TBC. An extra LValue table attribute is used by the
`luaH_getshortstr()` and `luaH_getint()` access routines to indirect from any LROVAL entries in
`luaR_entry` to the corresponding Lvalue in the registry, effectively hiding this deprecated use.
This enables the Lua runtime to support unmodified NodeMCU 5.1 modules, albeit with some small RAM
and runtime overhead.
The main functional difference between RO and RW tables is that all write access methods to RO tables will throw a Lua error, as RO tables have to be declared statically at compile time in the source code.
Excepting those with Lua registered headers, ROTables can only have a RO metatable, and attempting
to do a `setmetatable()` in this case will also throw a Lua error. Of course RW tables and userdata
can still use an RO metatable, and the eLua-added API call `luaL_rometatable()` is still
available so that userdata types can be bound to by a string name.
This is a small change but important enough to detail separately. A variant of the current NodeMCU
5.1 `NODEMCU_MODULE()` macro technique is used to allow individual `luaR_entry` declarations to be
used for inline declaration of RO globals.
These are allocated in a dedicated linker section, so `lbaselib.c:luaopen_base()` allocates a
registry RO header for this global vector, assigns it to the global `ROM_G` and creates a
single-entry `__index = ROM_G` metatable for `_G`. Hence all entries in `ROM_G` are resolved as
globals using standard Lua inheritance rules.
Note that the `ROM_G` table is visible as a Lua global and is enumerable by the `pairs`
function: `for k,v in pairs(ROM_G) do --...`.
The entries for core functions such as `print` and `pairs` are created by section and linker magic
in this table, as are any global table definitions such as `string` and any user modules. And since
`ROM_G` is just a standard ROTable, lookaside caching also works for these entries.
User modules can also statically declare global functions and values in the same way, so for
example `ltablib.c` contains the following conditional static declaration to add the table `unpack`
function as the global `unpack` (NLF is an acronym for Named Lightweight Function):
#if defined(LUA_COMPAT_UNPACK)
LROR_GLOBAL_NLF(unpack, unpack)
#endif
The recommended method of module initialisation is by static declaration. During startup,
`luaL_openlibs()` scans `ROM_G` and each table in it (including bound metatables) for entries with
a name of the format `tablename__init` which point to a function. The table name must be a valid
table name and in the case of table and metatable entries must refer to the corresponding container
table. If such an entry exists then the startup initialisation code will call the function to
perform any module initialisation needed. Using `ROM_G` itself is deprecated for performance / bloat
reasons, but it is currently used by the new version of the `NODEMCU_MODULE()` macro to allow
existing modules to compile and run without modification.
Note that `_G` isn't scanned as this is built dynamically at runtime and any dynamic initialisation
can be done programmatically.
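
The `tablename__init` convention can be sketched as a scan over one entry vector. This is a toy model, not the `luaL_openlibs()` code: `Entry`, `mod_entries`, `run_table_init()` and the 64-byte name buffer are all assumptions, and plain C strings stand in for interned keys.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef void (*InitFn)(void);
typedef struct { const char *name; InitFn fn; } Entry;   /* toy entry */

static int init_calls;                 /* records that the hook ran */
static void mod_init(void) { init_calls++; }
static void mod_f(void) { }

/* Toy entry vector for a table called "mod". */
static const Entry mod_entries[] = {
    { "f", mod_f },
    { "mod__init", mod_init },
    { NULL, NULL }
};

/* Call the entry named "<table>__init" if the vector has one.
 * Returns 1 if an initialiser was called, 0 otherwise. */
static int run_table_init(const char *table, const Entry *v)
{
    char want[64];
    int n = snprintf(want, sizeof want, "%s__init", table);
    assert(n > 0 && (size_t)n < sizeof want);
    for (; v->name != NULL; v++) {
        if (strcmp(v->name, want) == 0 && v->fn != NULL) {
            v->fn();
            return 1;
        }
    }
    return 0;
}
```

A startup routine would call such a function once per table found in the global vector, passing the table's own name.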
Needs updating to include ESP32 IDF builds.
The NodeMCU build system uses "recursion magic", that is you do a top level make which has
- a set of call-back variable assigns
- a list of dependent submodules
- and some local module action rules, which typically execute each subordinate make
- and each subordinate make calls back its parent for its context.
So in the case of the lua core: the top level nodemcu-firmware make invokes the app make which invokes the lua make which references back the app make which references back the nodemcu-firmware make. Hence all of the rules for $(GCC) etc. are defined only in the top level NodeMCU make.
This system has been slightly modified with NodeMCU 5.3 in that the app make calls the lua make
with a `host` target and this target runs a more conventional makefile to build the host variants of
`lua` and `luac`. This also uses the subdirectory `app/lua/host` for host-only modules and resources
such as `lrostring.c`, `loslib.c`, `generate_LROR.lua`, etc. These host versions do not include the
rest of the NodeMCU ecosystem.
Hence a host-executable version of `luac` with the extra `-X` option is available for later scripts
either to execute Lua utilities as part of the build process or to use `luac` to convert Lua
sources to target-compatible `lc` files.
The `host/lrostring.c` is both generated by `generate_LROR.lua` and maintained in git, which might
be seen as a catch-22, but in fact this version contains the minimal subset of RO string resources
needed to compile and build these host executables, and therefore is only rarely updated to reflect
changes to the Lua core.
The NODEMCU macros also use predefined defines to generate flag variables in objects in the modules
directory of the form `XXXX_module_selected`, and the build only links those modules containing an
externally linkable symbol named with this pattern.
The target RO string resources system is piggy-backed onto this approach, so a build-time scan of
this list of modules and the lua core is used to create and compile a build-specific `rostring.o`, so
- this version is generated during the build and not stored in git.
- it only includes LROR resources in the lua core and selected modules, so LROR resources in other (non-selected) modules are not included.
- (under review) this also means that only the main `<module>.c` for selected modules is scanned. Where modules pull in other files in the modules folder, these won't be scanned and should not contain LROR resources unless the `<module>.c` contains the corresponding `LROR_USE_STRING(name)` declarations to create the resources.
Details still being finalised.
In target builds, the `rostring.c` maps into two linker sections. The first contains the RO
`TString` resource definitions, and the second is a fixed section which is at the end of the flash
image. This fixed section, which we call the LROR Partition (LRORP), is flash sector-aligned and of
a configurable size set through the ld definitions.
- This segment is directly addressable in the ESP ICODE address space, enabling these resources to be accessed directly by modules and the Lua runtime system (RTS).
- However, because it is a fixed flash-sector-aligned area, it can also be rewritten on an occasional basis to update its contents, either by using the esptool.py through the UART and ROM-based firmware loader or under program control as part of a rebuild.
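
The sector alignment constraint amounts to simple address arithmetic, sketched below. The 4 KiB sector size, the helper names `sector_align_up()` and `lrorp_start()`, and the idea of placing the partition immediately below a region end address are all assumptions made for illustration; the real values come from the ld definitions.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 0x1000u   /* assumed 4 KiB erase sector */

/* Round a byte count up to a whole number of flash sectors. */
static uint32_t sector_align_up(uint32_t n)
{
    return (n + FLASH_SECTOR_SIZE - 1u) & ~(FLASH_SECTOR_SIZE - 1u);
}

/* Start address of a partition of at least 'size' bytes that ends at
 * 'region_end', with both ends on sector boundaries. */
static uint32_t lrorp_start(uint32_t region_end, uint32_t size)
{
    assert(region_end % FLASH_SECTOR_SIZE == 0u);
    return region_end - sector_align_up(size);
}
```

Keeping both ends on sector boundaries is what allows the partition to be erased and rewritten without touching neighbouring firmware.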
The LRORP contains:
- A fixed header to reference contained resources
- Optional additional RO TString declarations
- Other constants such as integer and float LValues.
- The current `ROstrt` hash table
- 0 or more Lua module hierarchies in loaded format.
This last item requires further explanation. In Lua, a compiled module is loaded into a hierarchy
of resources in structures such as `Proto` definitions which are largely read-only, but
collectable resources, that is they can be scanned, marked, collected and discarded by the GC. However,
LROR has already modified the GC to bypass RO resources.
So there is nothing in principle to prevent us loading such resources into a read-only partition such as the LRORP. However, update is complicated by coherency and integrity issues both at a hardware level (the ESP's ICODE hardware cache), and in terms of any overwrite of referenced resources corrupting the GC referencing system (overwriting a module that is currently referenced within the application could corrupt and crash the RTS in an indeterminate manner).
We therefore support a simple update model for the LRORP: a small NodeMCU-specific API provides the ability to use a RAM-based Lua script to rebuild and reload a new LRORP. The LRORP reload is effectively an atomic operation which restarts the processor on completion. Nonetheless, this feature enables Lua application programmers to store their own or standard NodeMCU Lua modules in the LRORP with a simple loader script.
Once loaded, such modules can be referenced by using the standard `require "module"` syntax. This
has a minimal and almost zero load overhead compared to loading from SPIFFS, and only the RW
resources such as any globals and locals created by the executing modules take up RAM. The code
and constants are executed directly from flash address space.
This feature requires the `ROstrt` to be moved to the LRORP, since loaded modules can add their own
strings which must be referenced through the `ROstrt`.
The LROR functionality is first tested on a debug build using the host `lua` executable with an
extended version of the Lua 5.3.4 test suite to hammer out most
errors within a benign development environment. We are also investigating using a large subset of
this suite on a non-OS SDK environment with a minimal module set for testing on the ESP8266
architecture.
TBC, but our objective is to introduce the Lua 5.3 functionality with a reduced RAM footprint and increased runtime performance.
Our objective is that existing modules will work with minimal changes. (An example of such a minimal
change would be the addition of an extra `#include` statement, and equivalent changes which
could be done "en masse" without an intimate knowledge of the module.)
Nonetheless, the legacy eLua `rotables` method of declaration is deprecated and using this comes
with some small RAM and performance overheads, so we would encourage all module maintainers to
migrate to the new NodeMCU 5.3 module interface as soon as practical.
Existing modules should only require API changes in exceptional circumstances.
The aspect that does need further thought here is the impact of the split number types (integer and float), which is really a pure Lua 5.1 -> 5.3 migration issue. More research is needed.
In essence, high-use string literals can be replaced by the corresponding `LROR_WORD()` or
`LROR_STRING()` declaration. This replaces a compiled `const char *` reference by the corresponding
`const TString *` one. The second `LROR_STRING()` macro requires you to provide a symbolic name for
the string as these ultimately generate external references that are resolved during link. The first
is simply syntactic sugar which uses the automatically generated name `_TS_<word>`. The common
Lua API functions which accept a string argument now have `TString` equivalents, so for example
lua_pushstring(L, "normal");
becomes
lua_pushTString(L, LROR_WORD(normal));
Note that the Cstring version is still supported and often used (for example, all error path string
constants have been left in their Cstring form). However, the second form is usually adopted for
main path code, and this pushes a `(TString *)` address directly onto the stack (whereas the first
calls a new string API call, which even in the case of an existing string involves recomputing the
hash of the string and doing a `strcmp()`).
The RO string table is external to modules so any LROR word string reference will need a
corresponding `extern` declaration to compile. The `LROR_USE_STRING(name)` and
`LROR_USE_STRINGn(name, ...)` (where n=2..5) macros can be added to the source code to wrap these.
Internally, as well as the global (RAM) string table, a second (RO) string table is generated during the
build process and stored in addressable flash. This RO table is used as a second level for
resolution when any new string is resolved in the case of a miss against the RAM string table. In
the case of a hit against the RO table, the address of the RO TString is returned, and therefore
new entries are only created in the `G(L)->strt` for strings which aren't already in ROM.
The LROR patch introduces a specific method for writing modules in such a way that they fully utilise read-only resources. Note that this does not preclude the use of the standard Lua API to declare modules and to expose them at runtime; however, modules using the standard API will use RAM for all module resources.
Note: The LROR declaratives are different from the LTR API and therefore C modules written for LTR will require modification to use LROR.
Another limitation is that clearly RO tables can only refer to RO resources.
Consider a simple example where you want to register a simple module called "mod" that has a single function named "f". For standard Lua, you would code this as follows:
static const luaL_reg mod_map[] =
{
  { "f", f_implementation },
  { NULL, NULL }
};
LUALIB_API int luaopen_mod( lua_State *L )
{
  luaL_register( L, "mod", mod_map );
  other_initialisation();
  return 1;
}
For the LROR implementation, however, you'd define the same thing like this:
LROR_ENTRIES(mod) = {
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(f, f_implementation),
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(mod__init, other_initialisation)
};
LROR_GLOBAL_TABLE(mod, NULL)
A few points about the RO tables above:
- The RO table entries are declared by a LROR_ENTRIES(mod) initialiser which includes a number of LROR_TABLE_ENTRY macros to initialise the entries.
- The table entry macros have short and long forms, so LROR_TABLE_ENTRY_NAME_LIGHTFUNC(name, func) declares a lightweight C function for the given name. The macros also have a short acronym form, in this example LROR_TENLF(name,func).
- The Table structure itself is then declared using a LROR_TABLE(mod, metatable) macro. Note that this must follow the LROR_ENTRIES macro since it uses a sizeof() computation to calculate the number of table entries (which must be less than 255). Note that the metatable is set to NULL if the table does not have a metatable. LROR_GLOBAL_TABLE(mod, metatable) is a variant of LROR_DEFINE_TABLE(mod, metatable) that also makes the table accessible from the global table ROM_G within Lua as discussed above.
- The tables are declared with the static const attribute so can be referenced within the C module by their name using normal C scoping rules. This avoids the risk of name clashes.
- If you need to export the name to other C files (e.g. for a lua_pushrotable() call) then you will need to export a get wrapper function.
- In general the C API for table read access works as normal on RO tables.
- Any C API call for table access which attempts to update the table will result in an error being thrown.
- At a Lua API level similar restrictions apply: the read-only functions work as expected, but any attempt to write to a table will throw an error, including use of the functions insert, remove and sort.
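The points above about the sizeof() based entry count, the one-byte limit and the get wrapper can be illustrated with a small host-only sketch. It does not use the real LROR macros or the Lua headers: demo_entry, demo_table, DEMO_COUNT, DEMO_TABLE, get_mod_table and demo_lookup are hypothetical names invented for this illustration, and the real structures will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for luaR_entry and the RO Table header. */
typedef int (*demo_func)(void);

typedef struct {
    const char *key;   /* entry name */
    demo_func   func;  /* lightweight C function bound to the name */
} demo_entry;

typedef struct demo_table {
    const struct demo_table *metatable;  /* NULL when there is no metatable */
    unsigned char            nentries;   /* one byte, hence the entry limit */
    const demo_entry        *entries;
} demo_table;

/* The count comes from sizeof(), so the entries vector must already be
   defined (a complete array type) at the point where the macro is used. */
#define DEMO_COUNT(v) (sizeof(v) / sizeof((v)[0]))
#define DEMO_TABLE(name, meta) \
    static const demo_table name##_table = \
        { (meta), (unsigned char)DEMO_COUNT(name##_entries), name##_entries }

static int f_implementation(void)     { return 42; }
static int other_initialisation(void) { return 0; }

static const demo_entry mod_entries[] = {
    { "f",         f_implementation },
    { "mod__init", other_initialisation }
};
/* Compile-time guard for the one-byte count (C11). */
_Static_assert(DEMO_COUNT(mod_entries) < 255, "too many RO table entries");
DEMO_TABLE(mod, NULL);

/* The table is static const, so other C files reach it through a getter. */
const demo_table *get_mod_table(void) { return &mod_table; }

/* Read access: a linear scan bounded by the stored count, no stop entry. */
demo_func demo_lookup(const demo_table *t, const char *key) {
    for (size_t i = 0; i < t->nentries; i++)
        if (strcmp(t->entries[i].key, key) == 0)
            return t->entries[i].func;
    return NULL;
}
```

Because everything is const and sized at compile time, the whole table can be placed in read-only storage, and there is simply no code path in this sketch that writes to it.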
Like any other table, RO tables can have an associated metatable, and metatables can be RO tables, but unlike the LTR patch, the metamethod __metatable is not overloaded with a different semantic.
Lua 5.3 adds a lookaside cache to avoid interning of repeated string requests. It is a simple 53 × 2 slot cache. (These dimensions are configurable defined constants.) Each new string request is hashed and the two referenced TString entries are matched against the putative C string value. A hit short-circuits the relatively expensive hashing and string lookup process.
Accessing LROR string resources is by direct use of their TString references, and therefore doesn't use this string request path. On the other hand the O(n) ROTable access is now an issue, so a similar approach is used for ROTables, using this same cache table. This denormalisation is slightly a hack, but this is considered acceptable because of the need to avoid any additional RAM overheads.
The denormalisation exploits the fact that all TString and luaR_entry resources are word aligned. So the bottom bit of the address is overloaded: 0 = TString entry; 1 = luaR_entry. In this second case the 8 MSB are the luaR_entry index and the remaining LS bits (less bit 0) are matched against the Table pointer reference, allowing the index into the luaR_entry vector to be recovered in the case of a cache hit. A bit nasty perhaps, but a material performance boost.
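The bit packing described above can be sketched as follows. This is a minimal illustration that assumes 32-bit addresses (as on the target) and models them as uint32_t values so that it also runs on a 64-bit host; the names slot_encode, slot_is_rotable and slot_match and the exact mask values are assumptions made for the example, not taken from the LROR sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a cache slot holding a ROTable hit (32-bit target):
     bits 31..24  index into the luaR_entry vector
     bits 23..1   low-order bits of the Table address, used for matching
     bit  0       1 = luaR_entry reference, 0 = TString pointer          */
#define SLOT_TAG     UINT32_C(0x00000001)
#define SLOT_PTRMASK UINT32_C(0x00FFFFFE)

/* Pack an entry index and a (word-aligned) table address into one slot. */
static uint32_t slot_encode(uint32_t table_addr, uint8_t index) {
    return ((uint32_t)index << 24) | (table_addr & SLOT_PTRMASK) | SLOT_TAG;
}

/* Word alignment keeps bit 0 of a genuine TString pointer clear. */
static int slot_is_rotable(uint32_t slot) {
    return (slot & SLOT_TAG) != 0;
}

/* Return the cached entry index on a hit, or -1 on a miss. */
static int slot_match(uint32_t slot, uint32_t table_addr) {
    if (!slot_is_rotable(slot))
        return -1;                              /* slot holds a TString */
    if ((slot ^ table_addr) & SLOT_PTRMASK)
        return -1;                              /* different table */
    return (int)(slot >> 24);
}
```

The 8-bit index field is consistent with the limit of fewer than 255 entries per RO table. Only part of the table address is compared in this sketch, so it relies on RO tables not sharing the same low-order address bits.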
Lua 5.3 now subdivides strings into short and long variants, depending on a configurable threshold (currently 40 characters). Short strings are interned as with previous Lua versions, and are therefore unique, so two short TString * pointers refer to the same string if and only if the pointers refer to the same location. Hence no strcmp() comparisons are required for equality comparison.
Lua 5.3 also introduces a new two-way look-aside cache based on the address of the C string parameter in the function luaS_new(). This associates a TString * with the address of the C string variable used to create it. In the case of a hit, the existing TString string and the new string are compared and, if they match, the cached TString is used, short-circuiting the hash calculation. In the case of a miss, the resolved TString * is used to bump the cache entry. (In LROR, this cache is also used for ROM table entries.)
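The look-aside behaviour described above can be modelled in a few lines. This is a simplified host-only sketch, not the Lua source: demo_str stands in for TString, slow_new stands in for the normal hash-and-intern route, and slow_path_calls exists only so that the effect of a hit can be observed. Evicted entries are simply dropped here, whereas in Lua they remain owned by the garbage collector.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_N 53   /* rows, selected by the address of the C string */
#define CACHE_M 2    /* entries per row ("two-way") */

/* Hypothetical stand-in for TString: just an owned copy of the text. */
typedef struct { char *text; } demo_str;

static demo_str *cache[CACHE_N][CACHE_M];
static unsigned  slow_path_calls;   /* counts misses, for the demo only */

/* Stand-in for the expensive hash-and-intern route. */
static demo_str *slow_new(const char *s) {
    size_t n = strlen(s) + 1;
    demo_str *ts = malloc(sizeof *ts);
    if (ts == NULL) abort();
    ts->text = malloc(n);
    if (ts->text == NULL) abort();
    memcpy(ts->text, s, n);
    slow_path_calls++;
    return ts;
}

demo_str *cached_new(const char *s) {
    /* The row is chosen from the address of s, not from its contents. */
    demo_str **row = cache[(uintptr_t)s % CACHE_N];
    for (int j = 0; j < CACHE_M; j++)
        if (row[j] != NULL && strcmp(s, row[j]->text) == 0)
            return row[j];              /* hit: skip hashing altogether */
    for (int j = CACHE_M - 1; j > 0; j--)
        row[j] = row[j - 1];            /* miss: age out the oldest entry */
    row[0] = slow_new(s);               /* newest entry goes first */
    return row[0];
}
```

Calling cached_new() twice with the same C string constant takes the slow route only once, which is the case that this cache is meant to speed up.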
For short strings, the new string functions check for an existing copy, matching on (i) the hash, (ii) the length and (iii) a direct comparison. Long strings are no longer automatically interned, on the assumption that they are unlikely to be replicated. This avoids the size-dependent hashing cost for longer strings, but in certain use cases it can result in a major memory increase, though in practice for IoT use the upside is a good performance boost and the risk of RAM growth is minimal.
Also note that the new luaS_newliteral() is designed for C string constants and this calls luaS_newlstr() with a length derived from the sizeof() of the string; using this also bypasses the cache lookup.
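The sizeof() technique mentioned above can be shown in isolation. In this sketch LITERAL_LEN, demo_newlstr and DEMO_NEWLITERAL are hypothetical names; demo_newlstr merely reports the length it was given, standing in for a length-taking constructor such as luaS_newlstr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a string literal, fixed at compile time. The "" s
   concatenation only compiles when s is a literal, which rejects
   pointer arguments (for which sizeof would give the pointer size). */
#define LITERAL_LEN(s) (sizeof("" s) / sizeof(char) - 1)

/* Hypothetical stand-in for a length-taking constructor: it only
   reports the length it was handed. */
static size_t demo_newlstr(const char *s, size_t len) {
    (void)s;
    return len;
}

/* No strlen() call and no cache lookup: the length is a constant. */
#define DEMO_NEWLITERAL(s) demo_newlstr("" s, LITERAL_LEN(s))
```

One consequence of using sizeof() rather than strlen() is that embedded NUL characters in the literal are counted as part of the string.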
Any strings declared with the LROR initialisers will be generated at build time and bypass all this complexity. It is unclear how this will affect the efficacy of the string cache, so I have added some internal instrumentation diagnostics to examine this.
Strings and tables are collectable objects. In standard Lua, all collectable objects are linked into one of three lists with heads in G(L) (fields fixedgc, finobj, allgc) using the common next field to link them. The GC ignores the first category during normal collection, and only uses it during shutdown. RO objects can neither be GCed nor collected at shutdown, so we've added an extra G(L) category, ROobj, and all RO objects are linked into this (to facilitate diagnostic inspection of RO resources).
Standard RW tables have a complex structure that has unnecessary storage overhead for small keyed tables. I considered the pros and cons of also using this structure for ROM tables and decided that my criteria should be to adopt a variant implementation:
- if the overall savings in flash data space exceeded the extra code overhead of the variant, and
- there were performance benefits in doing the ROM variant.
On this basis, this initial implementation includes a variant for handling simplified RO tables. However, some efforts have been made to minimise the code variation. Since much of the table handling code is to do with write access, storage allocation, resizing and garbage collection, none of which apply to RO tables, the amount of new code needed to support RO tables is quite modest.
The updated table structure retains a set of common fields:
GCObject *next;
lu_byte tt;
lu_byte marked;
lu_byte flags;
lu_byte lsizenode;
struct Table *metatable;
and maintains two separate variants of the access fields for the RW and RO cases. The RW variant retains the existing fields in an anonymous union to minimise code changes:
GCObject *gclist;
unsigned int sizearray;
TValue *array;
Node *node;
Node *lastfree;
The RO variant replaces these fields with
luaR_entry *array;
It also overloads the lsizenode field with a sizeentries and uses a sizeof() calculation to generate this field, so scanning the entries is based on this size field rather than on adding a dummy {LROR_NILKEY, LROR_NILKEY} stop entry.
The gclist field is only used in a variant-aware part of lgc.c which is not called for ROM objects, and the tt field decodes which variant is used. I've hoisted the metatable reference into the common part because this metatable field is referenced a lot and keeping it in the common area removes a class of variant coding.
Hence all allocated (that is RW) Table records are the size of the larger (RW) variant, but this isn't an issue since the RO forms are only created through the precompile and build process.
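The layout described above can be sketched as a single struct with a common header and a union of the two variants. This is an illustration under stated assumptions rather than the actual declaration: the placeholder typedefs replace the real Lua types, the RO member is given the hypothetical name ro.entries to keep it distinct from the RW array field, and ro_sizeentries is an invented accessor for the overloaded lsizenode byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder types; the real ones live in the Lua headers. */
typedef unsigned char lu_byte;
typedef struct GCObject GCObject;
typedef struct TValue TValue;
typedef struct Node Node;
typedef struct luaR_entry luaR_entry;

typedef struct Table {
    /* common part, valid for both variants */
    GCObject     *next;
    lu_byte       tt;         /* type tag: selects the RW or RO variant */
    lu_byte       marked;
    lu_byte       flags;
    lu_byte       lsizenode;  /* RO variant: reused as the entry count */
    struct Table *metatable;
    union {
        struct {              /* RW variant: the existing fields (C11) */
            GCObject    *gclist;
            unsigned int sizearray;
            TValue      *array;
            Node        *node;
            Node        *lastfree;
        };
        struct {              /* RO variant: just the entry vector */
            const luaR_entry *entries;
        } ro;
    };
} Table;

/* Assumed accessor for the overloaded size byte of an RO table. */
#define ro_sizeentries(t) ((t)->lsizenode)
```

Since the RW fields sit in an anonymous struct, existing code that refers to t->gclist or t->array compiles unchanged, while the metatable stays at the same offset for both variants.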
Note that whilst ROtable entries can take any RO'able value, only short C string and short TString keys are currently supported. (Need to add more details on implementation.)
If we do later decide to allow settable metatables, then a reserved value for metatable, for example (void *)-1, would denote that the metatable is stored in the registry. In this case, the metatable association would be the one data element of a ROtable that is writeable. In order to achieve this the ROtables would maintain this field within the Lua registry, and the metatable API would need to contain variant code to access and update this, but from a C metatable API viewpoint there would be no functional difference between a RO and a RW table. However there would be a performance hit as all RO table accesses would need to query the registry.
The linker magic used in our target builds depends on replacing the default linker script with a NodeMCU-customised one. At the moment the host implementation is a little bit of a botch, since I don't want to support the subtle variants of host linker scripts out there. This works on a Linux / gcc build environment by exploiting a reserved section .rodata1 (which isn't otherwise used in Lua builds) and a known link order.
This works fine for development and testing, but it is still a bit tacky. Need to think of a robust method of implementing this.
The standard package table contains some standard subtables, and LROR moves some of these into ROM Tables to minimize the RAM footprint:
- package.preload. This lists packages that are preloaded but not initialised and therefore must still be imported into a Lua application by a require. This is an empty ROM table in LROR.
- package.loaded. This is used by the standard require function to avoid duplicating reloads of dynamic modules. It must correctly resolve references such as package.loaded.string to avoid errors on valid Lua statements such as str=require "string". This is initialised by loadlib.c to an empty RAM table, but with a metatable whose __index points to a search function which scans ROM_G for the corresponding table entry. Note that this means that iterating over this table in Lua using pairs() will not enumerate the loaded modules. You have to scan ROM_G to do this.
- package.searchers (previously loaders in Lua 5.1). This is a ROM table with a single entry for the standard Lua searcher. LROR drops the other three searcher types for preload, C and C root. However, since package is itself a RAM table there is nothing to stop Lua application programmers adding their own searchers, for example I use the following in my init.lua to support autoloading of modules over wifi:
package.searchers = {loadfile("net_autoloader.lua"), package.searchers[1]}
Caveat: This implementation is an interim one (actually the third version) to demonstrate the feasibility of the LROR concept. Once we have a working framework to evaluate, it is anticipated that the language interface might be reworked.
Traditional Lua binds Lua resources at runtime through the Lua API which accepts standard C string, integer and other type arguments. This patch enables read-only variants of such resources to be declared inline in the source code.
- In the case of string declarations, these C macros are preprocessed in the source as normal to generate the compiled inline code to refer to the external resources, so LROR_WORD(on) becomes &_TS_on where _TS_on is the external TString for the literal "on". There is also a Lua preprocessor which can scan the source base to regenerate rostring.c, the C module which statically declares all ROM-based TString tables.
- Table resources map directly onto standard C macros, so are just compiled normally inline, and no additional Lua post-processing is needed.
Whilst this approach might seem a kludge, something very similar was used by database products such as Oracle to generate and embed SQL statements in C and other code.
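The string macro expansion described above amounts to token pasting. The following sketch shows the idea with invented names: demo_tstring is a stand-in for a ROM-resident TString, and the demo_TS_ prefix is used instead of _TS_ only because identifiers that begin with an underscore followed by a capital letter are reserved in portable C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a ROM-resident TString. */
typedef struct {
    unsigned char len;
    const char   *text;
} demo_tstring;

/* What a generated string module might contain: one constant per word. */
const demo_tstring demo_TS_on  = { 2, "on" };
const demo_tstring demo_TS_off = { 3, "off" };

/* DEMO_WORD(on) expands to (&demo_TS_on): an address fixed at link time,
   with no run-time hashing or interning. */
#define DEMO_WORD(w) (&demo_TS_##w)

/* Every use of the same word yields the same address, so equality is a
   plain pointer comparison. */
static int is_on(const demo_tstring *s) { return s == DEMO_WORD(on); }
```

A word that has not been generated into the string module fails at compile time with an undeclared identifier, which is presumably why the source base needs to be rescanned when new words are introduced.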
A table definition first declares the luaR_entry vector using a LROR_ENTRIES() initialiser which includes a number of LROR_TABLE_ENTRY macros to initialise the entries.
- LROR_ENTRIES(table) is a macro which generates the static const luaR_entry table[] statement, which must be followed by a static initialiser which includes one or more table entries.
- LROR_TABLE_ENTRY(key, value) is a macro to initialise each key, value pair. There is a set of wrapper macros which hide the hassle of the luaR_key and Value declarations in both long and acronym form, for example LROR_TENLF(name,func) is the equivalent of name = func in a standard Lua initialiser.
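How such wrapper macros might expand can be sketched with a simplified tagged value. Everything prefixed DEMO_ or demo_ below is hypothetical, including the number variant DEMO_TENN, which is not named in the text; the real luaR_key and Value declarations are more involved.

```c
#include <assert.h>
#include <string.h>

typedef int (*demo_cfunc)(void);

/* Hypothetical tagged value: enough for numbers and light functions. */
typedef enum { DEMO_TNUMBER, DEMO_TLIGHTFUNC } demo_type;
typedef struct {
    const char *key;
    demo_type   type;
    union { double n; demo_cfunc f; } value;
} demo_entry;

/* Opens the declaration; the caller supplies "= { ... };" */
#define DEMO_ENTRIES(t)  static const demo_entry t##_entries[]

/* Long-form wrappers hide the key and value plumbing ... */
#define DEMO_TABLE_ENTRY_NAME_LIGHTFUNC(name, func) \
    { #name, DEMO_TLIGHTFUNC, { .f = (func) } }
#define DEMO_TABLE_ENTRY_NAME_NUMBER(name, num) \
    { #name, DEMO_TNUMBER, { .n = (num) } }
/* ... and the acronym forms are plain aliases. */
#define DEMO_TENLF(name, func) DEMO_TABLE_ENTRY_NAME_LIGHTFUNC(name, func)
#define DEMO_TENN(name, num)   DEMO_TABLE_ENTRY_NAME_NUMBER(name, num)

static int f_implementation(void) { return 1; }

DEMO_ENTRIES(mod) = {
    DEMO_TENLF(f, f_implementation),   /* like  f = f_implementation */
    DEMO_TENN(version, 53)             /* like  version = 53 */
};

#define DEMO_NENTRIES(t) (sizeof(t##_entries) / sizeof(t##_entries[0]))
```

The key is given unquoted because the macro stringifies it, which is also why entry names have to follow C identifier syntax in this scheme.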
The Table is then itself declared using one of the three LROR_TABLE() variants:
- LROR_TABLE(table_name, metatable_name). Defines a LROR table. Note that the table names are symbolic references used internally within the C build and not exposed by default within the Lua application name space. The table name is unquoted and must conform to the normal C name syntax. The second argument is NULL for tables without a metatable.
- LROR_GLOBAL_TABLE(table_name, metatable_name). Variant of the above which includes an entry for this table in the ROM_G table as discussed above, and hence the table name is exposed to the Lua application in the global name space.
- LROR_METATABLE(table_name, flags). Defines a LROR metatable. This is different from the normal tables in that Lua 5.3 maintains a method event flags bitmask in the header which enables the optimisation of the __index, __newindex, __gc, __mode, __len and __eq metamethods. So if your metatable contains the index and newindex entries then set the flags field to 1u<<TM_INDEX | 1u<<TM_NEWINDEX.
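The flags value quoted above can be checked with a small sketch. The enum lists the first six events in the order used by ltm.h in Lua 5.3; TM_BIT, demo_meta_flags and has_fast_tm are hypothetical helpers, and the polarity (a set bit means the metamethod is present) follows the description in the text and is an assumption about LROR.

```c
#include <assert.h>

/* First six events, in the order used by Lua 5.3's ltm.h. */
typedef enum {
    TM_INDEX, TM_NEWINDEX, TM_GC, TM_MODE, TM_LEN, TM_EQ
} TMS;

#define TM_BIT(e) (1u << (e))

/* Flags for a metatable that defines __index and __newindex only. */
static const unsigned char demo_meta_flags =
    (unsigned char)(TM_BIT(TM_INDEX) | TM_BIT(TM_NEWINDEX));

/* Fast check following the convention described in the text, where a
   set bit means the metamethod is present (assumed polarity). */
static int has_fast_tm(unsigned char flags, TMS e) {
    return (flags & TM_BIT(e)) != 0;
}
```

Only these six events fit the fast path, so one byte is enough. Note that stock Lua 5.3 uses this byte as a cache with the opposite sense (a set bit records that the metamethod is absent), so the polarity shown here should be checked against the LROR sources.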
Normal C scoping and declaration rules apply, so if a table has a meta table, it is easier to declare the metatable first.
- Macro variants of tt and _tt reference that generate L32 load instructions on the xtensa builds
- Review loadlib model
- Design walk-through of LROR resources and LGC to make sure that they are managed correctly.
- String processor implementation doesn't support preprocessor macro expansion in LROR strings
- Port Lua compact debug patch
- Optional cut down version of math. Also check code for float constants.
- Optional cut down version of debug
- Add performance stats for string / table cache.
- Consider how to implement LROR closures.