@TerryE
Last active July 1, 2017 11:20
LROR Paper.md

LROR (Lua Read-Only Resources) in NodeMCU Lua 5.3

Drafting Caveat

Updates since last release:

  • Extra details on the multi-host make system
  • Extra details on the legacy module support and how this can be used to compile and run modules written for the Lua 5.1 version with no or minimal source changes.
  • Slight rework of ROTable implementation to support the same.
  • Explanation of the LROR Lua module support and of on-module rebuilding of Flash-based Lua modules.

This port builds upon two previous demonstrators:

  • a core port of the additional NodeMCU functionality into a host-only demonstrator platform, and
  • a back-port of this functionality into the NodeMCU 5.1.4 core to evaluate the issues of integration into the rest of the NodeMCU ecosystem.

Neither of these demonstrators was released for third-party evaluation as they had served their purpose for the author: to enable the development of the final 5.3 code-base. This white paper focuses solely on this final 5.3 port and discards any historic detail that is no longer relevant.

Background

NodeMCU Lua is currently based on the eLua fork of Lua 5.1.4 ("NodeMCU 5.1"). This version discards many aspects of the added eLua functionality, and these compounded modifications make the Lua core code-base difficult to maintain. However, more recent Lua versions now include a back-port of two of the most important eLua features, as well as some very desirable performance features. We therefore plan to rebaseline NodeMCU on the current stable Lua version (5.3.4). My focus here is to document this Lua 5.3 port ("NodeMCU 5.3") in the form of a white paper. This set of changes to the standard Lua sources is known as LROR (Lua Read-Only Resources) and implements the following objectives:

  • It makes the new Lua 5.2 and 5.3 features available to NodeMCU applications;
  • It integrates the standard NodeMCU platform support;
  • It provides an easy migration path for existing modules to move from the 5.1 to the 5.3 code-base;
  • It adds C API support for a subset of native Lua types, enabling a range of constant resources to be declared and initialised in the .text or other read-only segments.

This LROR approach has three major benefits:

  • It further decreases the RAM usage of the Lua runtime environment and Lua scripts.

  • Since these read-only resources are truly static and available to the executing scripts, this avoids the allocation and garbage collection overheads of creating such resources in RAM and subsequently removing them.

  • The GCC toolchain for the ESP generates byte-aligned string and byte constants, but the ESP's xtensa architecture can only access 32-bit word-aligned data resources from the flash-based firmware. NodeMCU uses a software-based exception handler to process such unaligned accesses, albeit at a runtime cost. Hence accessing C strings in the flash-based .text segment invariably incurs the per-byte overhead of the software exception handler, which is a material runtime overhead. The new LROR code allocates strings on word-aligned boundaries, only accesses the string values when necessary, and, since the lengths are known, uses memcmp / memcpy functions which correctly handle non-aligned comparison.
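The alignment point in the last bullet can be illustrated with a small host-runnable sketch. It is not the NodeMCU code: the function names are hypothetical, and it simply assumes that string data sits in a word-aligned buffer so that every load is an aligned 32-bit read, with individual bytes extracted arithmetically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return byte i of a word-aligned buffer using only
   an aligned 32-bit load. Byte 0 is taken to be the least significant byte
   of the first word, which matches a little-endian target. */
static uint8_t aligned_read_byte(const uint32_t *words, size_t i) {
  uint32_t w = words[i >> 2];             /* one aligned 32-bit load */
  return (uint8_t)(w >> ((i & 3u) * 8u)); /* pick the byte out of the word */
}

/* Hypothetical helper: compare the first n bytes of two word-aligned
   buffers, a whole word at a time, then byte-by-byte for the tail.
   Returns 1 when equal, 0 otherwise. */
static int aligned_equal(const uint32_t *a, const uint32_t *b, size_t n) {
  size_t i, nwords = n >> 2;
  assert(((uintptr_t)a & 3u) == 0 && ((uintptr_t)b & 3u) == 0);
  for (i = 0; i < nwords; i++)
    if (a[i] != b[i]) return 0;
  for (i = nwords << 2; i < n; i++)
    if (aligned_read_byte(a, i) != aligned_read_byte(b, i)) return 0;
  return 1;
}
```

Because the length is known up front, the comparison never needs to probe for a terminating NUL one byte at a time, which is where the per-byte exception cost would otherwise arise.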

The current ESP8266 NodeMCU version was heavily modified by the eLua changes (specifically including Bogdan Marinescu's LTR patch), by the need to support the non-OS SDK, and by further performance optimisation, to the extent that it is becoming very difficult to maintain.

Lua 5.3.4 contains a back-port of two of the key eLua enhancements, lightweight C functions and the Emergency Garbage Collector (EGC), both of which are required by NodeMCU. It also includes full support for a default 32-bit data type, together with separate integer and floating point data types.

NodeMCU 5.3 also closes the remaining functional gap with NodeMCU 5.1 by reimplementing additional support for ROM-based resources, whilst providing a smooth upgrade path for existing NodeMCU 5.1 modules, and mitigating (effectively removing) the performance impacts of the LTR rotables implementation and the unaligned string exception handling overhead. Some effort has been made to achieve these goals with the minimum changes to the core Lua code base.

As NodeMCU 5.3 will effectively unify the previous integer and floating point build variants, separate Integer and Floating point build variants are no longer supported. In the case of the ESP32, which has H/W support for 32-bit floating point, this gives full H/W support for all Lua numeric data types.

Main changes implemented by the LROR patch

Note that NodeMCU macros require C99 language support, so the LROR patch changes do not attempt to preserve compatibility with C89 or other C standard variants. An example here is that LROR changes sometimes split variable declarations with inserted executable statements to minimise line changes.

Lua 5.3 already introduces the concept of variant data types, and in particular splitting:

  • Numbers into separate integer and floating point sub-types.
  • Strings into short and long variants, with only short strings being interned.
  • Functions into current and lightweight variants.

LROR builds on this variant type functionality by allowing the declaration of read-only (RO) variants of:

  • Short (interned) strings. In standard Lua 5.3, interned strings are held in a RAM hash table that can be accessed in C using the Lua API. LROR adds a parallel set of RO structures which implement a second string hash table in RO Flash memory. The internal Lua VM lookup code for any interned string first resolves against the ROstrt table before using the RAM-based strt. Hence those strings stored in the RO tables do not need to be allocated (or freed) in RAM. This saves on both RAM resources and the malloc/free overheads. As interned strings are unique, string comparison can be achieved by address comparison of the interned resource.

  • Lua Values and new Key Value type. This implementation follows the eLua pattern, except for extra types. The previous eLua STRING type was used to refer to null-terminated const char * byte strings, and is now referred to as CSTRING, with STRING now being used for the native Lua string format.

  • New RO and RW subtypes for Tables. The RW subtype is essentially the same implementation as existing Lua tables. The RO subtype is new, and is different to the eLua approach. In standard Lua each table has a table definition structure which then points to a hash table which stores the individual entries. RW tables preserve this approach. The RO table header only contains a subset of the fields relevant to RO tables, and instead of a hash, it points to a vector of key/value pairs.

    • Keeping a common table header for both variants means that the RO vs RW table handling is fully encapsulated from the bulk of the table handling code, and is only exposed to the low level access functions.
    • Using the vector form for RO table entries is largely the same as for the eLua LTR patch and simplifies inline declaration of RO table resources. However, the encapsulation of RO tables means that a look-aside cache (a feature which has been implemented elsewhere in Lua 5.3) can be used to replace the O(n) entry search by a direct access. On the Lua test environment this achieves over a 95% cache-hit rate, and this effectively eliminates the current runtime overheads of using ROM tables.
  • Ability to store and reload Lua modules in Flash. A (configurable) fixed flash area, known as the LROR Partition (or LRORP), is set aside for the support of Flash-based Lua modules, RO strings and the RO string table. Whilst the Lua build process can prepopulate this, an on-module Lua API is also provided to enable Lua developers to rebuild the LRORP on module, that is, without needing a firmware build environment. If Lua application developers move the bulk of their Lua application code into LRORP modules, then this effectively more than doubles the ESP RAM available to Lua developers.

  • The Lua C API has also been extended with common API functions which ease the task of writing new NodeMCU library modules, including API functions which would sensibly benefit from (ROM-based) TString * arguments as an alternative to C string variants. Note that this is an API extension rather than a substitution, so that existing C extension libraries / modules can still be compiled and work with no (or in a few cases minor) changes at a source level.

Supplementary design goals and details

Multiple platform support

Whilst standard Lua is designed to compile against a rich set of operating system and application environments, the previous NodeMCU implementation was only designed to support the ESP8266, though later updates have added beta support for the ESP32. This patch supports the Lua 5.3 environment on three target platforms using a common code base:

  • The ESP8266 (and derivative ESP8285) architectures using the Espressif non-OS SDK and its GCC xtensa toolchain.
  • The ESP32 architecture using the Espressif IDF and its GCC xtensa toolchain.
  • Linux host architecture using the standard Linux GCC toolchain. Whilst this will work for many Linux variants, we are specifically testing on our Travis-CI hosting platform.

The first two versions are supported by their respective Espressif / NodeMCU environments. The last environment is for three specific purposes: (i) supporting Lua core development; (ii) testing against the standard and NodeMCU-extended Lua test suites; (iii) building a host-runnable luac image which can be used to cross-compile Lua source, and to run limited Lua executables using the additional -X execute option. (The LROR preprocessor is bootstrapped and uses this feature so that NodeMCU build environments don't need to depend on other host Lua installations.) The host Lua versions are always built as part of an ESP build and hence the luac executable is always available on the host for initialising LROR partitions and SPIFFS file systems.

A corollary to this is that the LROR changes only support these platforms. Unlike the Lua and eLua code bases, which had complex use cases (e.g. to support WinX and OS X, and big and little endian variants), the only conditional LROR code is that needed to support these three variants. The LROR changes are all or nothing, and are tested as a bundle. There is no out-of-the-box option to cherry-pick bits of LROR functionality and omit others.

Minimise changes to the core Lua source code

We are taking a conscious step away from the rapid development approach adopted in our first NodeMCU implementation, and so our basic guideline is that the source will only be changed when there is a material benefit or unavoidable need for doing so, and such changes will either follow a standard pattern or fulfil a specific functional need. All such changes are documented herein. Example patterns include:

  • Replacing "some_word" C strings by their equivalent LROR_WORD(some_word) declarations. Such C string replacement is optional and is only done where there might be performance and RAM usage benefits for doing so. Quoted words are candidates for replacement if:

    • The word is likely to be used as a constant in a Lua application. Including it in the string table will avoid creating the equivalent RAM G(L)->strt entry.
    • The word is already used as a TString elsewhere so is already a RO resource.
    • It will be referenced repeatedly during execution. Examples include C string constants used in common API calls. C strings used in initialisation routines or error paths aren't usually converted.
  • A limited number of common multi-word strings are replaced by their equivalent LROR_STRING(name,"some string").

  • Changes to #include statements in header preambles. In some cases the three variant targets (POSIX, newlib and ESP non-OS SDK) require different header files to compile and build. On the ESP8266 non-OS builds, the Espressif-supplied C header must be used instead because the ESP platform only supports an extremely limited subset of the standard library functions or has a slightly non-standard API, and we would prefer to pick up misuse during compile / build. Our general approach is to avoid #ifdef conditional logic by using a bit of header magic, so that where such variants are required a simple global substitution of the macro C_HEADER_XXX for <xxx.h> is used instead.

  • Use of additional TS variants of Lua API functions. An example here is complementing the standard lua_pushstring() API function with an equivalent lua_pushTString() which takes a TString object as a parameter. Note that these are extensions to the API and not substitutions. This is to ensure that modules using the standard documented Lua API continue to build and work.
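The C_HEADER_XXX substitution described above relies on the C preprocessor's computed-include form, where the argument of #include is a macro that expands to a header name. The following is a minimal sketch of that idea, assuming C_HEADER_STRING and C_HEADER_STDLIB as instances of the C_HEADER_XXX pattern. The mapping shown is the host one, and the helper demo_keylen() is purely illustrative.

```c
#include <assert.h>

/* Assumed mapping for a host (POSIX) build: each macro expands to the
   standard header. An ESP non-OS build would define the same macros to
   name an SDK-supplied header instead (not shown here). */
#ifndef C_HEADER_STRING
#define C_HEADER_STRING <string.h>
#endif
#ifndef C_HEADER_STDLIB
#define C_HEADER_STDLIB <stdlib.h>
#endif

/* Source files then use a computed include rather than #ifdef blocks. */
#include C_HEADER_STRING
#include C_HEADER_STDLIB

/* Illustrative user of the included header. */
static size_t demo_keylen(const char *key) {
  assert(key != NULL);
  return strlen(key);
}
```

The per-platform choice is then made once, where the macros are defined, and the individual source files stay free of conditional compilation.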

Other modules have been added or modified when sensible (for example the additional TS variants of API calls, and some internal static routines that have been converted from C string to TString parameters where there is a runtime benefit in doing this).

The other major changes are the implementation of RO Tables and the use of the NodeMCU 5.1 model of "linker magic" to simplify the declaration of ROM global modules, tables and global functions.

RO Strings

Interned strings are maintained in the RAM hash table G(L)->strt. LROR adds a parallel, statically allocated set of RO structures which are accessed through a second table, G(L)->ROstrt. The interning algorithm ensures that such strings are unique and only stored once, so the address of an interned string is a unique descriptor for the string for the purposes of copying, assignment and comparison. The lookup code for any new interned string first resolves against the ROstrt table before the RAM-based strt. Hence those RO strings do not need to be allocated (or freed) in RAM, saving on both RAM resources and the malloc/free overheads.
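The two-level resolution can be sketched as follows. This is a simplified model rather than the LROR implementation: DemoStr stands in for TString, both tables are searched linearly instead of by hash, and all names are hypothetical. What it does show is the lookup order described above and that RAM is only allocated on a miss in both tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for TString: a length plus the character data. */
typedef struct DemoStr { size_t len; const char *data; } DemoStr;

/* Stand-in for the flash-resident ROstrt, fixed at build time. */
static const DemoStr ro_strings[] = {
  {2, "on"}, {5, "pairs"}, {5, "print"}
};
#define RO_COUNT (sizeof(ro_strings) / sizeof(ro_strings[0]))

/* Stand-in for the RAM strt: a small pool filled on demand. */
#define RAM_MAX 16
static DemoStr ram_strings[RAM_MAX];
static size_t ram_count = 0;

static const DemoStr *demo_find(const DemoStr *tab, size_t n,
                                const char *s, size_t len) {
  size_t i;
  for (i = 0; i < n; i++)
    if (tab[i].len == len && memcmp(tab[i].data, s, len) == 0)
      return &tab[i];
  return NULL;
}

/* Intern s: try the RO table first, then the RAM table, and allocate a
   new RAM entry only when both miss. */
static const DemoStr *demo_intern(const char *s) {
  size_t len = strlen(s);
  const DemoStr *hit = demo_find(ro_strings, RO_COUNT, s, len);
  if (hit) return hit;                       /* no RAM used at all */
  hit = demo_find(ram_strings, ram_count, s, len);
  if (hit) return hit;
  assert(ram_count < RAM_MAX);
  char *copy = malloc(len + 1);
  assert(copy != NULL);
  memcpy(copy, s, len + 1);
  ram_strings[ram_count].len = len;
  ram_strings[ram_count].data = copy;
  return &ram_strings[ram_count++];
}
```

Since every distinct string resolves to exactly one record, two interned strings are equal exactly when their addresses are equal.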

Unfortunately, current C compiler technology does not offer the necessary compile-time functionality to implement Lua string types fully. A new LROR C source preprocessor has been added into the build process and this serves a dual purpose:

  • The LROR macros are mined and parsed at a source level to generate a C source file which initialises these data structures.
  • The macros are also used during source compilation as C preprocessor define macros to declare the mirror extern statements that enable all the source to be compiled validly whilst resolving these RO references through the linker step of the build.

So for example the inline usage of the RO string resource for the word on:

  • This is declared by including the LROR_USE_STRING(on) macro in the source. This generates the extern TString * _LROR_on declaration. (This macro doesn't support varargs, but shorthand versions LROR_USE_STRINGn(...) exist for n = 2 to 5 to simplify declaring multiple word strings.)
  • References to the string are created by including LROR_WORD(on) in the source.
  • As these are standard C preprocessor macros, these are subject to normal conditional compilation rules. The generated extern statements can then subsequently be compiled and linked into the target image.
  • However, the source is also scanned by the Lua preprocessor and in turn this generates the C file to declare and initialise the necessary TString for on and any other used TString variable, together with any hash table data structures. Note that this preprocessor currently does not use the standard C preprocessor output, so cannot exploit C macros and conditional compile statements.
  • Any Lua references to the string on will also be resolved to the same ROM TString constant, thus avoiding the need to create a new RAM-based TString or its subsequent garbage collection.
  • Whilst this preprocessor approach isn't ideal, it hasn't proved an issue in practice.

As ROM-based constants are truly immutable, some adjustments have been made to exclude these from the scope of the garbage collector, so that it only marks and scans RAM-based resources.

RO Tables

The LROR tables comprise two components:

  • a ROtable header record which is a cut-down variant of the current Table structure. The bulk of the Lua runtime code-base treats Table records as an encapsulated resource (the main exceptions are in the low level handling in ltable.c and lgc.c). By unifying these RW and RO forms, the RO handling is therefore largely hidden at a code level, minimising the changes needed to support RO tables.
  • an RO entry (luaR_entry) vector that is backwards compatible with the luaR_entry encoding used in NodeMCU 5.1; however, TString keys and values are now supported, as well as the ability to reference secondary tables using ROTable as well as luaR_entry references. Using a simple vector format also simplifies the declaration of LROR tables, as these can now be declared inline using the standard C supporting LROR macros.

Whereas RW tables are hashed with an access time of O(1), accessing an RO table entry list is O(N), and this is doubly bad news with flash access times. I therefore extended the lookaside cache that Lua 5.3 has added (to accelerate C string to interned string conversion) to accelerate RO table entry accesses. See Lookaside Cache below for details. This means that repeatedly accessed RO table entries such as ROM_G.pairs are typically resolved in a single ROM probe.
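To make the O(N) cost concrete, here is a minimal model of a vector-based RO table searched by key address. The struct layouts and names (DemoEntry, DemoROTable, demo_ro_find) are assumptions for illustration and do not reproduce the real ROTable or luaR_entry definitions. Keys are compared by address, as interned strings can be.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for luaR_entry and the ROTable header. */
typedef struct DemoEntry { const char *key; int value; } DemoEntry;
typedef struct DemoROTable { const DemoEntry *entries; unsigned char n; } DemoROTable;

/* "Interned" keys: each distinct key has exactly one address. */
static const char K_PRINT[]   = "print";
static const char K_TYPE[]    = "type";
static const char K_PAIRS[]   = "pairs";
static const char K_MISSING[] = "missing";

static const DemoEntry demo_entries[] = {
  {K_PRINT, 1}, {K_TYPE, 2}, {K_PAIRS, 3}
};
static const DemoROTable demo_rom_g = { demo_entries, 3 };

/* Linear scan of the entry vector. Returns the entry index or -1, and
   counts how many entries had to be probed along the way. */
static int demo_ro_find(const DemoROTable *t, const char *key, unsigned *probes) {
  unsigned i;
  assert(t != NULL && probes != NULL);
  for (i = 0; i < t->n; i++) {
    ++*probes;
    if (t->entries[i].key == key) return (int)i;   /* address comparison */
  }
  return -1;
}
```

In this model, finding the last entry or confirming a miss costs one probe per entry, which is the cost the lookaside cache is meant to avoid on repeated accesses.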

All RO tables must follow the Lua model of requiring a Table header record (and in the case of RO tables the smaller 16 byte variant). However, ROTable headers can either

  • be located in RO flash address space and declared with the LROR_TABLE macros, or
  • be created at runtime and stored in the Lua registry, together with a referencing Lvalue.

In this second case, rather than using the registry reference scheme as described in [PiL 28.3], we follow the alternative convention used for single instance resources, which is to use the address of the luaR_entry vector as the registry key.

We need this approach to enable backwards compatibility for NodeMCU modules which still declare tables using the deprecated eLua declaration system. (It also enables the declaration of RO tables with Lua updatable metatables.) Details TBC. An extra LValue table attribute is used by the luaH_getshortstr() and luaH_getint() access routines to indirect from any LROVAL entries in luaR_entry to the corresponding Lvalue in the registry, effectively hiding this deprecated use. This enables the Lua runtime to support unmodified NodeMCU 5.1 modules, albeit with some small RAM and runtime overhead.

The main functional difference between RO and RW tables is that all write access methods to RO tables will throw a Lua error, as RO tables have to be declared statically at compile time in the source code.

Excepting those with Lua registered headers, ROTables can only have an RO metatable, and attempting to do a setmetatable() in this case will also throw a Lua error. Of course RW tables and userdata can still use an RO metatable, and the eLua-added API call luaL_rometatable() is still available so that userdata types can be bound to by a string name.

ROM_G Table

This is a small change but important enough to detail separately. A variant of the current NodeMCU 5.1 NODEMCU_MODULE() macro technique is used to allow individual luaR_entry declarations used for inline declaration of RO globals.

These are allocated in a dedicated linker section, so lbaselib.c:luaopen_base() allocates a registry RO header for this global vector, assigns it to the global ROM_G, and creates a single-entry __index = ROM_G metatable for _G. Hence all entries in ROM_G are resolved as globals using standard Lua inheritance rules.

Note that the ROM_G table is visible as a Lua global and is enumerable by the pairs function: for k,v in pairs(ROM_G) do --....

The entries for core functions such as print and pairs are created by section and linker magic in this table, as are any global table definitions such as string and any user modules. And since ROM_G is just a standard ROTable, lookaside caching also works for these entries.

User modules can also statically declare global functions and values in the same way, so for example ltablib.c contains the following conditional static declaration to add the table unpack function as the global unpack (NLF is an acronym, Named Lightweight Function):

#if defined(LUA_COMPAT_UNPACK)
  LROR_GLOBAL_NLF(unpack, unpack)
#endif

The recommended method of module initialisation is by static declaration. During startup, luaL_openlibs() scans ROM_G and each table in it (including bound metatables) for entries with a name of the format tablename__init which point to a function. The table name must be a valid table name and in the case of table and metatable entries must refer to the corresponding container table. If such an entry exists then the startup initialisation code will call the function to perform any module initialisation needed. Using ROM_G itself is deprecated for performance / bloat reasons, but it is currently used by the new version of the NODEMCU_MODULE() macro to allow existing modules to compile and run without modification.
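The tablename__init convention can be sketched as a scan over an entry vector. This is an illustrative model only: InitEntry, run_table_init() and the sample entries are hypothetical, and the real scan works on luaR_entry vectors inside luaL_openlibs() rather than on this simplified struct.

```c
#include <assert.h>
#include <string.h>

typedef void (*InitFn)(void);
typedef struct InitEntry { const char *name; InitFn fn; } InitEntry;

static int init_calls = 0;
static void mod_init(void) { init_calls++; }
static void f_impl(void) {}

/* Sample entry vector for a table called "mod". */
static const InitEntry mod_entries[] = {
  {"f", f_impl},
  {"mod__init", mod_init},
  {"other__init", mod_init}   /* names another table, so it is skipped */
};

/* Call every entry named "<tablename>__init" that points to a function.
   Returns the number of initialisers called. */
static int run_table_init(const char *tablename, const InitEntry *e, size_t n) {
  size_t i, tlen;
  int called = 0;
  assert(tablename != NULL && e != NULL);
  tlen = strlen(tablename);
  for (i = 0; i < n; i++) {
    const char *nm = e[i].name;
    if (strncmp(nm, tablename, tlen) == 0 &&
        strcmp(nm + tlen, "__init") == 0 && e[i].fn != NULL) {
      e[i].fn();
      called++;
    }
  }
  return called;
}
```

Requiring the prefix to match the container table's own name is what lets the initialiser live in the same static vector as the module's ordinary entries.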

Note that _G isn't scanned, as this is built dynamically at runtime and any dynamic initialisation can be done programmatically.

The make / build process

Needs updating to include ESP32 IDF builds.

The NodeMCU build system uses "recursion magic", that is you do a top level make which has

  • a set of call-back variable assigns
  • a list of dependant submodules
  • and some local module action rules, which typically execute each subordinate make
  • and each subordinate make calls back its parent for its context.

So in the case of the lua core: the top level nodemcu-firmware make invokes the app make, which invokes the lua make, which references back the app make, which references back the nodemcu-firmware make. Hence all of the rules for $(GCC) etc. are defined only in the top level NodeMCU make.

This system has been slightly modified with NodeMCU 5.3 in that the app make calls the lua make with a host target and this target runs a more conventional makefile to build the host variants of lua and luac. This also uses the subdirectory app/lua/host for host-only modules and resources such as lrostring.c, loslib.c, generate_LROR.lua , etc. These host versions do not include the rest of the NodeMCU ecosystem.

Hence a host-executable version of luac with the extra -X option is available for later scripts, either to execute Lua utilities as part of the build process or to use luac to convert Lua sources to target-compatible lc files.

The host/lrostring.c is both generated by generate_LROR.lua and maintained in git, which might be seen as a catch-22, but in fact this version contains the minimal subset of RO string resources needed to compile and build these host executables, and therefore is only rarely updated to reflect changes to the Lua core.

The NODEMCU macros also use predefined defines to generate flag variables in objects in the modules directory of the form XXXX_module_selected, and the build only links those modules containing an externally linkable symbol named with this pattern.

The target RO string resources system is piggy-backed onto this approach, so a build-time scan of this list of modules and the lua core is used to create and compile a build-specific rostring.o, so

  • this version is generated during the build and not stored in git.
  • it only includes LROR resources in the lua core and selected modules, so LROR resources in other (non-selected) modules are not included.
  • (under review) this also means that only the main <module>.c for selected modules is scanned. Where modules pull in other files in the modules folder, these won't be scanned and should not contain LROR resources unless the <module>.c contains the corresponding LROR_USE_STRING(name) declarations to create the resources.

Lua Modules and closures

Details still being finalised.

In target builds, the rostring.c maps into two linker sections. The first contains the RO TString resource definitions, and the second is a fixed section at the end of the flash image. This fixed section, which we call the LROR Partition (LRORP), is both flash sector-aligned and of a configurable size set through the ld definitions.

  • This segment is directly addressable in the ESP ICODE address space, enabling these resources to be accessed directly by modules and the Lua runtime system (RTS).
  • However, because it is a fixed flash-sector-aligned area, it can also be rewritten on an occasional basis to update its contents, either by using esptool.py through the UART and ROM-based firmware loader, or under program control as part of a rebuild.
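Sector alignment matters because SPI flash on the ESP8266 is erased in 4096-byte sectors, so a rewritable region must start and end on sector boundaries. The helpers below are a hypothetical sketch of the arithmetic involved. The names are mine and nothing here reflects the actual ld definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 4096u  /* ESP8266 SPI flash erase unit, in bytes */

/* Round an offset up to the next sector boundary. */
static uint32_t sector_align_up(uint32_t n) {
  return (n + DEMO_SECTOR_SIZE - 1u) & ~(DEMO_SECTOR_SIZE - 1u);
}

static int is_sector_aligned(uint32_t addr) {
  return (addr & (DEMO_SECTOR_SIZE - 1u)) == 0;
}

/* Number of erase operations needed to rewrite a partition of `size`
   bytes. The size must itself be a whole number of sectors. */
static uint32_t demo_sector_count(uint32_t size) {
  assert(is_sector_aligned(size));
  return size / DEMO_SECTOR_SIZE;
}
```

For example, a 64 KiB partition (0x10000 bytes) spans 16 sectors, each of which must be erased before it can be rewritten.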

The LRORP contains:

  • A fixed header to reference contained resources
  • Optional additional RO TString declarations
  • Other constants such as integer and float LValues.
  • The current ROstrt hash table
  • 0 or more Lua module hierarchies in loaded format.

This last item requires further explanation. In Lua, a compiled module is loaded into a hierarchy of resources in structures such as Proto definitions, which are largely read-only but collectable resources, that is, they can be scanned, marked, collected and discarded by the GC. However, LROR has already modified the GC to bypass RO resources.

So there is nothing in principle to prevent us loading such resources into a read-only partition such as the LRORP. However, update is complicated by coherency and integrity issues, both at a hardware level (the ESP's ICODE hardware cache) and in terms of any overwrite of referenced resources corrupting the GC referencing system (overwriting a module that is currently referenced within the application could corrupt and crash the RTS in an indeterminate manner).

We therefore support a simple update model for the LRORP: a small NodeMCU-specific API provides the ability to use a RAM-based Lua script to rebuild and reload a new LRORP. The LRORP reload is effectively an atomic operation which restarts the processor on completion. Nonetheless, this feature enables Lua application programmers to store their own or standard NodeMCU Lua modules in the LRORP with a simple loader script.

Once loaded, such modules can be referenced by using the standard require "module" syntax. This has a minimal, almost zero, load overhead compared to loading from SPIFFS, and only the RW resources such as any globals and locals created by the executing modules take up RAM. The code and constants are executed directly from flash address space.

This feature requires the ROstrt to be moved to the LRORP, since loaded modules can add their own strings which must be referenced through the ROstrt.

Testing Strategy

The LROR functionality is first tested on a debug build using the host lua executable with an extended version of the Lua 5.3.4 test suite, to hammer out most errors within a benign development environment. We are also investigating using a large subset of this suite on a non-OS SDK environment with a minimal module set for testing on the ESP8266 architecture.

Performance

TBC, but our objective is to introduce the Lua 5.3 functionality with a reduced RAM footprint and increased runtime performance.

Writing LROR-compatible modules

Legacy Modules

Our objective is that existing modules will work with minimal changes. (An example of such a minimal change would be the addition of an extra #include statement, and equivalent changes which could be done "en masse" without an intimate knowledge of the module.)

Nonetheless, the legacy eLua rotables method of declaration is deprecated and using this comes with some small RAM and performance overheads, so we would encourage all module maintainers to migrate to the new NodeMCU 5.3 module interface as soon as practical.

Existing modules should only require API changes in exceptional circumstances.

The aspect that does need further thought here is the impact of the split number types (integer and float), which is really a pure Lua 5.1 -> 5.3 migration issue. More research is needed.

Using Strings

In essence, high-use string literals can be replaced by the corresponding LROR_WORD() or LROR_STRING() declaration. This replaces a compiled const char * reference by the corresponding const TString * one. The second macro, LROR_STRING(), requires you to provide a symbolic name for the string, as these ultimately generate external references that are resolved during link. The first is simply syntactic sugar which uses the automatically generated name _TS_<word>. The common Lua API functions which accept a string argument now have TString equivalents, so for example

         lua_pushstring(L, "normal");

becomes

         lua_pushTString(L, LROR_WORD(normal));

Note that the Cstring version is still supported and often used (for example, all error path string constants have been left in their Cstring form). However, the second form is usually adopted for main path code, and this pushes a (TString *) address directly onto the stack, whereas the first calls a new string API call, which even in the case of an existing string involves recomputing the hash of the string and doing a strcmp().

The RO string table is external to modules, so any LROR word string reference will need a corresponding extern declaration to compile. The LROR_USE_STRING(name) and LROR_USE_STRINGn(name, ...) macros (where n=2..5) can be added to the source code to wrap these.

Internally, as well as the global (RAM) string table, a second (RO) string table is generated during the build process and stored in addressable flash. This RO table is used as a second level for resolution when any new string is resolved in the case of a miss against the RAM string table. In the case of a hit against the RO table, the address of the RO TString is returned, and therefore new entries are only created in the G(L)->strt for strings which aren't already in ROM.

Using Tables

The LROR patch introduces a specific method for writing modules in such a way that they fully utilise read-only resources. Note that this does not preclude the use of the standard Lua API to declare modules and to expose them at runtime; however, modules using the standard API will use RAM for all module resources.

Note: The LROR declaratives are different from the LTR API and therefore C modules written for LTR will require modification to use LROR.

Another limitation is that RO tables can only refer to RO resources.

Consider a simple example where you want to register a simple module called "mod" that has a single function named "f". For standard Lua, you would code this as follows:

static const luaL_reg mod_map[] =
{
  { "f", f_implementation },
  { NULL, NULL }
};

LUALIB_API int luaopen_mod( lua_State *L )
{
  luaL_register( L, "mod", mod_map );
  other_initialisation();
  return 1;
}

For the LROR implementation, however, you would define the same thing like this:

LROR_ENTRIES(mod) = {
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(f, f_implementation),
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(mod__init, other_initialisation)
};
LROR_GLOBAL_TABLE(mod, NULL)

A few points about the RO tables above:

  • The RO table entries are declared by a LROR_ENTRIES(mod) initialiser which includes a number of LROR_TABLE_ENTRY macros to initialise the entries.
  • The table entry macros have short and long form, so LROR_TABLE_ENTRY_NAME_LIGHTFUNC(name, func) declares a lightweight C function for the given name. The macros also have a short acronym form, in this example LROR_TENLF(name,func).
  • The Table structure itself is then declare using a LROR_TABLE(mod, metatable) macro. Note that this must follow the LROR_ENTRIES macro since it uses a sizeof() computation to calculate the number of table entries (which must be less than 255). Note that the metatable is set to NULL if the table does not have a metatable.
  • LROR_GLOBAL_TABLE(mod, metatable) is a variant of LROR_TABLE(mod, metatable) that also makes the table accessible from the global table ROM_G within Lua as discussed above.
  • The tables are declared with the static const attribute, so they can be referenced within the C module by their name using normal C scoping rules. This avoids the risk of name clash.
  • If you need to export the name to other C files (e.g. for a lua_pushrotable() call) then you will need to export a getter wrapper function.
  • In general the C API for table read access works as normal on RO tables.
  • Any of the C API calls for table access which attempts to update the table will result in an error being thrown.
  • At a Lua API level similar restrictions apply: the read only functions work as expected, but any attempt to write to a table will throw an error, including use of the functions insert, remove and sort.

Like any other table, RO tables can have an associated metatable, and metatables can be RO tables. Unlike the LTR patch, however, the metamethod __metatable is not overloaded with a different semantic.

Notes on Internals

Lookaside Cache

Lua 5.3 adds a lookaside cache to avoid interning of repeated string requests. It is a simple 53 × 2 slot cache. (These dimensions are configurable defined constants.) Each new string request is hashed, and the two referenced TString entries are matched against the putative Cstring value. A hit short-circuits the relatively expensive hashing and string lookup process.

Accessing LROR string resources is by direct use of their TString references, and therefore doesn't use this string request path. On the other hand, the O(n) ROTable access is now an issue, so a similar approach is used for ROTables, using this same cache table. This denormalisation is something of a hack, but it is considered acceptable because of the need to avoid any additional RAM overheads.

The denormalisation exploits the fact that all TString and luaR_entry resources are word aligned.
So the bottom bit of the address is overloaded: 0 = TString entry; 1 = luaR_entry. In this second case the 8 MSBs hold the luaR_entry index and the remaining less significant bits (excluding bit 0) are matched against the Table pointer reference, allowing the index into the luaR_entry vector to be recovered in the case of a cache hit. A bit nasty perhaps, but a material performance boost.
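The slot encoding described above might look something like the following. This is a sketch only, assuming a 32-bit slot and using hypothetical helper names (slot_pack() and friends); the exact bit layout in the real patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a 32-bit cache slot holding a ROTable entry reference:
 *   bit  0      : 1 = luaR_entry reference (0 would mean a TString pointer)
 *   bits 1..23  : low bits of the owning table's address, used for matching
 *   bits 24..31 : index into the table's luaR_entry vector
 */
#define SLOT_ADDR_MASK 0x00FFFFFEu

/* Build a slot value from a word-aligned table address and an entry index. */
uint32_t slot_pack(uint32_t table_addr, uint8_t index) {
  assert((table_addr & 3u) == 0);   /* relies on word alignment */
  return ((uint32_t)index << 24) | (table_addr & SLOT_ADDR_MASK) | 1u;
}

/* Non-zero if the slot holds a table-entry reference rather than a TString. */
int slot_is_entry(uint32_t slot) { return (int)(slot & 1u); }

/* Non-zero if the slot was built for this table (compares bits 1..23 only). */
int slot_matches(uint32_t slot, uint32_t table_addr) {
  return slot_is_entry(slot) && ((slot ^ table_addr) & SLOT_ADDR_MASK) == 0;
}

/* Recover the entry index after a hit. */
uint8_t slot_index(uint32_t slot) { return (uint8_t)(slot >> 24); }
```

Because only part of the table address fits in the slot, a match in this sketch is a strong hint rather than a proof, so a caller would still confirm the key at the recovered index before using the entry.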

String Handling and Comparison Avoidance

Lua 5.3 now subdivides strings into short and long variants, depending on a configurable threshold (currently 40 characters). Short strings are interned as with previous Lua versions, and are therefore unique, so two short TString * pointers refer to the same string if and only if the pointers refer to the same location. Hence no strcmp() comparisons are required for equality comparison.

Lua 5.3 also introduces a new two-way look-aside cache based on the address of the Cstring parameter in the function luaS_new(). This associates a TString * with the address of the C string variable used to create it. On a lookup, the cached TString contents and the new string are compared, and if they match the cached TString * is returned, which short-circuits the hash calculation. In the case of a miss, the TString * resolved by the normal route is used to bump the cache entry. (In LROR, this cache is also used for ROM table entries.)
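A simplified model of this address-keyed, two-way cache is sketched below. It follows the general shape of luaS_new() in Lua 5.3 but is not the real code: DemoStr, slow_new() and cached_new() are hypothetical names, and slow_new() merely copies the string where Lua would hash and intern it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_N 53   /* number of cache rows */
#define CACHE_M 2    /* ways per row */

/* Hypothetical stand-in for a TString. */
typedef struct DemoStr { char *text; } DemoStr;

static DemoStr *cache[CACHE_N][CACHE_M];
static unsigned slow_calls = 0;

/* Stand-in for the slow path (hashing and interning in real Lua). */
static DemoStr *slow_new(const char *s) {
  size_t n = strlen(s) + 1;
  DemoStr *ts = malloc(sizeof *ts);
  if (ts == NULL) return NULL;
  ts->text = malloc(n);
  if (ts->text == NULL) { free(ts); return NULL; }
  memcpy(ts->text, s, n);
  slow_calls++;
  return ts;
}

/* Look up s by the address of the C string, falling back to slow_new(). */
DemoStr *cached_new(const char *s) {
  assert(s != NULL);
  unsigned row = (unsigned)((uintptr_t)s % CACHE_N);  /* key is the address */
  DemoStr **p = cache[row];
  for (int j = 0; j < CACHE_M; j++)
    if (p[j] != NULL && strcmp(s, p[j]->text) == 0)
      return p[j];                        /* hit: contents still match */
  for (int j = CACHE_M - 1; j > 0; j--)
    p[j] = p[j - 1];                      /* miss: push the older entry down */
  p[0] = slow_new(s);                     /* newest entry goes first */
  return p[0];
}

/* How many times the slow path was taken. */
unsigned slow_path_calls(void) { return slow_calls; }
```

Calling cached_new() twice with the same C string constant takes the slow path once; the second call is served from the cache row selected by the string's address.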

The new string functions check for an existing copy only in the case of a short string, and this match is on (i) the hash, (ii) the length and (iii) a direct comparison. Long strings are no longer automatically interned, on the assumption that they are unlikely to be replicated. This avoids the size-dependent hashing cost for longer strings. In certain use cases it can result in a major memory increase, though in practice for IoT use the upside is a good performance boost and the risk of RAM growth is minimal.

Also note that the new luaS_newliteral() is designed for C string constants and this calls luaS_newlstr() based on the sizeof the string; using this also bypasses the cache lookup.

Any strings declared with the LROR initialisers will be generated at build time and bypass all this complexity. It is unclear how this will affect the efficacy of the string cache, so I have added some internal instrumentation diagnostics to examine this.

Garbage Collection

Strings and tables are collectable objects. In standard Lua, all collectable objects are linked into one of three lists with heads in G(L) (fields fixedgc, finobj, allgc) using the common next field to link them. The GC ignores the first category during normal collection, and only uses it during shutdown. RO objects can neither be GCed nor collected at shutdown, so we've added an extra G(L) category, ROobj, and all RO objects are linked into this (to facilitate diagnostic inspection of RO resources).

Format of RO Tables

Standard RW tables have a complex structure that has unnecessary storage overhead for small keyed tables. I considered the pros and cons of also using this structure for ROM tables and decided that my criteria should be to adopt a variant implementation:

  • if the overall savings in flash data space exceeded the extra code overhead of the variant, and
  • there were performance benefits in doing the ROM variant.

On this basis, this initial implementation includes a variant for handling simplified RO tables. However, some efforts have been made to minimise the code variation. Since much of the table handling code is to do with write access, storage allocation, resizing and garbage collection, none of which apply to RO tables, the amount of new code needed to support RO tables is quite modest.

The updated table structure retains a set of common fields:

  GCObject *next; 
  lu_byte tt; 
  lu_byte marked;
  lu_byte flags;
  lu_byte lsizenode;
  struct Table *metatable; 

and maintains two separate variants of the access fields for the RW and RO cases. The RW variant retains the existing fields in an anonymous union to minimise code changes:

  GCObject *gclist;
  unsigned int sizearray; 
  TValue *array;
  Node *node;
  Node *lastfree;

The RO variant replaces these fields with

  luaR_entry *array;

It also overloads the lsizenode field with a sizeentries and uses a sizeof() calculation to generate this field, so scanning the entries is based on this size field rather than on a dummy {LROR_NILKEY, LROR_NILKEY} stop entry.

The gclist field is only used in a variant-aware part of lgc.c which is not called for ROM objects, and the tt field decodes which variant is used. I've hoisted the metatable reference into the common part because this metatable field is referenced a lot and keeping it in the common area removes a class of variant coding.

Hence all allocated (that is RW) Table records are the size of the larger (RW) variant, but this isn't an issue since the RO forms are only created through the precompile and build process.
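Putting the field lists above together, the combined record might be declared roughly as follows. This is a sketch under assumptions rather than the actual lobject.h change: the pointed-to types are left opaque, C11 anonymous members are used for the RW fields, and the RO pointer is placed in a named member `ro` (an invented name) because two anonymous members could not both be called `array`.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char lu_byte;
typedef struct GCObject GCObject;       /* opaque in this sketch */
typedef struct TValue TValue;           /* opaque in this sketch */
typedef struct Node Node;               /* opaque in this sketch */
typedef struct luaR_entry luaR_entry;   /* opaque in this sketch */

typedef struct Table {
  /* Fields common to both variants. */
  GCObject *next;
  lu_byte tt;               /* type tag: also tells RW and RO apart */
  lu_byte marked;
  lu_byte flags;
  lu_byte lsizenode;        /* RO variant: reused as the entry count */
  struct Table *metatable;  /* hoisted into the common part */
  union {
    struct {                /* RW variant: field names stay unqualified */
      GCObject *gclist;
      unsigned int sizearray;
      TValue *array;
      Node *node;
      Node *lastfree;
    };
    struct {                /* RO variant: a single entry vector */
      const luaR_entry *array;
    } ro;                   /* assumed member name */
  };
} Table;

/* Both variants start at the same offset, after the common header. */
static_assert(offsetof(Table, gclist) == offsetof(Table, ro),
              "RW and RO variants overlay each other");
```

With this layout existing RW code can keep writing t->array and t->node unchanged, while RO-aware code reads t->ro.array.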

Note that whilst ROtable entries can take any RO'able value, only short Cstring and short TString keys are currently supported. (Need to add more details on implementation.)

If we do later decide to allow settable metatables, then a reserved value for the metatable field, for example (void *)-1, would denote that the metatable is stored in the registry. In this case, the metatable association would be the one data element of a ROtable that is writeable. To achieve this, the ROtables would maintain this field within the Lua registry, and the metatable API would need to contain variant code to access and update it, but from a C metatable API viewpoint there would be no functional difference between a RO and a RW table. However, there would be a performance hit, as all RO table accesses would need to query the registry.

LROR globals on the host builds

The linker magic used in our target builds depends on replacing the default linker script with a NodeMCU-customised one. At the moment the host implementation is a little bit of a botch, since I don't want to support the subtle variants of host linker scripts out there. It works on a Linux / gcc build environment by exploiting a reserved section .rodata1 (which isn't otherwise used in Lua builds) and a known link order.

This works fine for development and testing, but it is still a bit tacky. Need to think of a robust method of implementing this.

Loadlib implementation

The standard package table contains some standard subtables, and LROR moves some of these into ROM Tables to minimize the RAM footprint:

  • package.preload. This lists packages that are preloaded but not initialised and therefore must still be imported into a Lua application by a require. This is an empty ROM table in LROR.
  • package.loaded. This is used by the standard require function to avoid duplicating reloads of dynamic modules. It must correctly resolve references such as package.loaded.string to avoid errors on valid Lua statements such as str=require "string". This is initialised by loadlib.c to an empty RAM table, but with a metatable whose __index points to a search function which scans ROM_G for the corresponding table entry. Note that this means that iterating over this table in Lua using pairs() will not enumerate the loaded modules. You have to scan ROM_G to do this.
  • package.searchers (previously loaders in Lua 5.1). This is a ROM table with a single entry for the standard Lua searcher. LROR drops the other three searcher types for preload, C and C root. However, since package is itself a RAM table, there is nothing to stop Lua application programmers adding their own searchers. For example, I use the following in my init.lua to support autoloading of modules over wifi:
  package.searchers = {load("net_autoloader.lua"), package.searchers[1]}

Syntax of LROR commands

Caveat: This implementation is an interim one (actually the third version), intended to demonstrate the feasibility of the LROR concept. Once we have a working framework to evaluate, it is anticipated that the language interface might be reworked.

Traditional Lua binds Lua resources at runtime through the Lua API which accepts standard C string, integer and other type arguments. This patch enables read-only variants of such resources to be declared inline in the source code.

  • In the case of string declarations, these C macros are preprocessed in the source as normal to generate the compiled inline code to refer to the external resources, so LROR_WORD(on) becomes &_TS_on where _TS_on is the external TString for the literal "on". There is also a Lua preprocessor which can scan the source base to regenerate rostring.c, the C module which statically declares all ROM-based TString tables.
  • Table resources map directly onto standard C macros, so are just compiled normally inline, and no additional Lua post-processing is needed. Whilst this approach might seem a kludge, something very similar was used by database products such as Oracle to generate and embed SQL statements in C and other code.

A table definition first declares the luaR_entry vector using a LROR_ENTRIES() initialiser which includes a number of LROR_TABLE_ENTRY macros to initialise the entries.

  • LROR_ENTRIES(table) is a macro which generates the static const luaR_entry table[] statement, which must be followed by a static initialiser that includes one or more table entries.
  • LROR_TABLE_ENTRY(key, value) is a macro to initialise each key, value pair. There is a set of wrapper macros which hide the hassle of the luaR_key and Value declarations in both long and acronym form, for example LROR_TENLF(name,func) is the equivalent of name = func in a standard Lua initialiser.
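To make the mechanics concrete, here is a cut-down imitation of this macro pattern. The DEMO_ macros and demo_ types are invented for illustration and are not the LROR definitions: keys are plain C strings rather than TStrings, and values are restricted to one function-pointer type. The sketch shows why the table macro has to follow the entries macro (its sizeof() needs the completed array) and what an exported getter wrapper might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_cfunc)(void);

/* Hypothetical simplified entry and table records. */
typedef struct demo_entry { const char *key; demo_cfunc func; } demo_entry;
typedef struct demo_table {
  const demo_entry *entries;
  unsigned char nentries;   /* one byte, hence the small entry limit */
} demo_table;

/* Opens the static const entry vector; an initialiser must follow. */
#define DEMO_ENTRIES(t) static const demo_entry t##_entries[]
/* One key/value pair, where the value is a C function. */
#define DEMO_ENTRY_LIGHTFUNC(name, fn) { #name, fn }
/* Declares the table; sizeof() requires the completed vector above it. */
#define DEMO_TABLE(t) \
  static const demo_table t = { t##_entries, \
    (unsigned char)(sizeof(t##_entries) / sizeof(t##_entries[0])) }

static int f_implementation(void) { return 42; }
static int other_initialisation(void) { return 0; }

DEMO_ENTRIES(mod) = {
  DEMO_ENTRY_LIGHTFUNC(f, f_implementation),
  DEMO_ENTRY_LIGHTFUNC(mod__init, other_initialisation)
};
DEMO_TABLE(mod);

/* Getter wrapper: the only way other C files can reach the static table. */
const demo_table *demo_get_mod(void) { return &mod; }

/* Size-bounded scan: no sentinel stop entry is needed. */
demo_cfunc demo_lookup(const demo_table *t, const char *key) {
  assert(t != NULL && key != NULL);
  for (unsigned i = 0; i < t->nentries; i++)
    if (strcmp(t->entries[i].key, key) == 0) return t->entries[i].func;
  return NULL;
}
```

Everything here is a static const initialiser, so a compiler is free to place both the vector and the table record in read-only storage.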

The Table itself is then declared using one of the three LROR_TABLE() variants:

  • LROR_TABLE(table_name, metatable_name). Defines a LROR table. Note that the table names are symbolic references used internally within the C build and are not exposed by default within the Lua application name space. The table name is unquoted and must conform to the normal C name syntax. The second argument is NULL for tables without a metatable.
  • LROR_GLOBAL_TABLE(table_name, metatable_name). Variant of above which includes an entry for this table in the ROM_G table as discussed above, and hence the table name is exposed to the Lua application in the global name space.
  • LROR_METATABLE(table_name,flags). Defines a LROR metatable. This is different from the normal tables in that Lua 5.3 maintains a methods event flags bitmask in the header which enables the optimisation of __index, __newindex, __gc, __mode, __len and __eq meta methods. So if your metatable contains the index and newindex entries then set the flags field to 1u<<TM_INDEX | 1u<<TM_NEWINDEX.
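As a small worked example of the flags expression, the sketch below redeclares the first six events in the order used by the TMS enum in Lua 5.3's ltm.h, so that it compiles without the Lua headers. DEMO_TM_BIT, DEMO_FLAGS_INDEXING and demo_has_event() are hypothetical helpers, and the bit meaning follows the description in this section.

```c
#include <assert.h>

/* First six tag-method events, in Lua 5.3 TMS order (redeclared locally). */
enum { TM_INDEX, TM_NEWINDEX, TM_GC, TM_MODE, TM_LEN, TM_EQ };

/* Hypothetical helper: the flag bit for one event. */
#define DEMO_TM_BIT(ev) (1u << (ev))

/* Flags for a metatable that contains only __index and __newindex. */
#define DEMO_FLAGS_INDEXING (DEMO_TM_BIT(TM_INDEX) | DEMO_TM_BIT(TM_NEWINDEX))

/* TM_INDEX is bit 0 and TM_NEWINDEX is bit 1, so the mask is binary 11. */
static_assert(DEMO_FLAGS_INDEXING == 3u, "unexpected event numbering");

/* Non-zero when the bit for event ev is set in flags. */
int demo_has_event(unsigned flags, int ev) {
  return (flags & DEMO_TM_BIT(ev)) != 0;
}
```

Since all six events fit in the low six bits, the mask fits comfortably in the single-byte flags field of the table header.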

Normal C scoping and declaration rules apply, so if a table has a metatable, it is easier to declare the metatable first.

TODO List

  • Macro variants of tt and _tt reference that generate L32 load instructions on the xtensa builds

  • Review loadlib model

  • Design walk-through of LROR resources and LGC to make sure that they are managed correctly.

  • String processor implementation doesn't support preprocessor macro expansion in LROR strings

  • Port Lua compact debug patch

  • Optional cut down version of math. Also check code for float constants.

  • Optional cut down version of debug

  • Add performance stats for string / table cache.

  • Consider how to implement LROR closures.
