TerryE/LROR Paper.md

## LROR Paper.md

      
            
          
    LROR (Lua Read-Only Resources) in NodeMCU Lua 5.3

Drafting Caveat

Updates since last release:

Extra details on the multi-host make system
Extra details on the legacy module support and how this can be used to compile and run modules
written for the Lua 5.1 version with no or minimal source changes.
Slight rework of ROTable implementation to support the same.
Explanation of the LROR Lua module support and on module rebuilding of Flash based Lua modules.

This port builds upon two previous demonstrators:

a core port of the additional NodeMCU functionality into a host-only demonstrator platform, and
a back-port of this functionality into the NodeMCU 5.1.4 core to evaluate the issues of
integration into the rest of the NodeMCU ecosystem.

Neither of these demonstrators was released for third-party evaluation as they had served their
purpose for the author: to enable the development of the final 5.3 code-base.  This white paper
focuses solely on this final 5.3 port and discards any historic detail that is no longer relevant.
Background

NodeMCU Lua is currently based on the eLua fork of Lua 5.1.4 ("NodeMCU 5.1"). This version
discards many aspects of the added eLua functionality, and these compounded modifications make
the Lua core code-base difficult to maintain. However, more recent Lua versions now include a
back-port of two of the most important eLua features, as well as some very desirable performance
features. We therefore plan to rebaseline NodeMCU on the current stable Lua version (5.3.4). My focus
here is to document this Lua 5.3 port ("NodeMCU 5.3") in the form of a white paper. This set of
changes to the standard Lua sources is known as LROR (Lua Read-Only Resources) and implements the
following objectives:

It makes the new Lua 5.2 and 5.3 features available to NodeMCU applications;
It integrates the standard NodeMCU platform support;
It provides an easy migration path for existing modules to move from the 5.1 to the 5.3 code-base.
It adds a C API support for a subset of native Lua types enabling a range of constant resources
to be declared and initialised in the .text or other read-only segments.

This LROR approach has three major benefits:


It further decreases the RAM usage of the Lua runtime environment and Lua scripts.


Since these read-only resources are truly static and available to the executing scripts, this
avoids the allocation and garbage collection overheads of creating such resources in RAM and
subsequently removing them.


The GCC toolchain for the ESP generates byte-aligned string and byte constants, but the ESP's
Xtensa architecture can only access 32-bit word-aligned data resources from the flash-based
firmware. NodeMCU uses a software-based exception handler to process such unaligned
accesses, so accessing C strings in the flash-based .text segment invokes this handler on a
per-byte basis, and this in turn incurs a material runtime overhead. The new code for LROR
allocates strings on word-aligned boundaries, only accesses the string values when necessary and,
since the lengths are known, uses memcmp / memcpy functions which correctly handle non-aligned
comparison.
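
The alignment constraint above can be illustrated in portable C. This is a minimal sketch only, not the NodeMCU exception handler or the LROR code: the helper names `demo_flash_read_byte` and `demo_flash_equal` are hypothetical, and bytes are assumed to be packed into words in little-endian order. It shows how a byte can be fetched with an aligned 32-bit load, and how a comparison of known length can work mostly on whole words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch byte i of a word-aligned buffer using only an
 * aligned 32-bit load. Assumes little-endian packing of bytes into words. */
static uint8_t demo_flash_read_byte(const uint32_t *words, size_t i)
{
    uint32_t w = words[i / 4];             /* one aligned 32-bit load */
    return (uint8_t)(w >> (8 * (i % 4)));  /* select the byte within the word */
}

/* Hypothetical helper: compare the first n bytes of two word-aligned buffers.
 * Whole words are compared first; only the tail is compared bytewise. */
static int demo_flash_equal(const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i, nwords = n / 4;
    for (i = 0; i < nwords; i++)
        if (a[i] != b[i]) return 0;
    for (i = nwords * 4; i < n; i++)
        if (demo_flash_read_byte(a, i) != demo_flash_read_byte(b, i)) return 0;
    return 1;
}
```

Because the length is known in advance, no per-byte scan for a terminating NUL is needed, which is the property the LROR string handling relies on.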


The current ESP8266 NodeMCU version was heavily modified by the eLua changes (specifically
including Bogdan Marinescu's LTR patch), by the need to support the non-OS SDK, and by further
performance optimisation, to the extent that it is becoming very difficult to maintain.
Lua 5.3.4 contains a back-port of two of the key eLua enhancements, lightweight C functions, and the
Emergency Garbage Collector (EGC), both of which are required by NodeMCU. It also includes full
support for a default 32-bit data type, together with separate integer and floating point data
types.
NodeMCU 5.3 also closes the remaining functional gap with NodeMCU 5.1 by reimplementing additional
support for ROM-based resources, whilst providing a smooth upgrade path for existing NodeMCU 5.1
modules, and mitigating (effectively removing) the performance impacts of the LTR rotables
implementation and the unaligned string exception handling overhead.  Some effort has been made to
achieve these goals with the minimum changes to the core Lua code base.
As NodeMCU 5.3 effectively unifies the previous integer and floating point build variants,
separate integer and floating point build variants are no longer supported. In the case
of the ESP32, which has H/W support for 32-bit floating point, this gives full H/W support for all
Lua numeric data types.
Main changes implemented by the LROR patch

Note that NodeMCU macros require C99 language support, so the LROR patch changes do not attempt to
preserve compatibility with C89 or other C standard variants.  An example here is that LROR changes
sometimes split variable declarations with inserted executable statements to minimise line changes.
Lua 5.3 already introduces the concept of variant data types, and in particular splitting:

Numbers into separate integer and floating point sub-types.
Strings into short and long variants, with only short strings being interned.
Functions into current and lightweight variants.

LROR builds on this variant type functionality by allowing the declaration of read-only (RO)
variants of


Short (interned) strings.  In standard Lua 5.3, interned strings are currently in a RAM hash
table, that can be accessed in C using the Lua API. LROR adds a parallel set of RO structures which
implement a second string hash table in RO Flash memory. The internal Lua VM lookup code for any
interned strings first resolves against the ROstrt table before using the RAM-based strt. Hence
those strings stored in the RO tables do not need to be allocated (or freed) in RAM. This saves on
both RAM resources and the malloc/free overheads. As interned strings are unique, string comparison
can be achieved by address comparison of the interned resource.


Lua Values and new Key Value type.  This implementation follows the eLua pattern, except for
extra types. The previous eLua STRING type was used to refer to null-terminated const char *
byte strings, and is now referred to as CSTRING, with STRING now being used for the native Lua
string format.


New RO and RW subtypes for Tables.  The RW subtype is essentially the same implementation as
existing Lua tables.  The RO subtype is new, and is different to the eLua approach.  In standard Lua
each table has a table definition structure which then points to a hash table which stores the
individual entries. RW tables preserve this approach. The RO table header only contains a subset of
the fields relevant to RO tables, and instead of a hash, it points to a vector of key/value pairs.

Keeping a common table header for both variants means that the RO vs RW table handling is
fully encapsulated from the bulk of the table handling code, and is only exposed to the low level
access functions.
Using the vector form for RO table entries is largely the same as for the eLua LTR patch and
simplifies inline declaration of RO table resources. However, the encapsulation of RO tables means
that a look-aside cache (a feature which has been implemented elsewhere in Lua 5.3) can be used to
replace the O(n) entry search by a direct access.  On the Lua test environment this achieves
over a 95% cache-hit rate, and this effectively eliminates the current runtime overheads of using
ROM tables.


Ability to store and reload Lua modules into Flash.  A (configurable) fixed flash area,
known as the LROR Partition (or LRORP), is set aside for the support of Flash-based Lua modules,
RO strings and the RO string table.  Whilst the Lua build process can prepopulate this, an on-module
Lua API is also provided to enable Lua developers to rebuild the LRORP on module, that is without
needing a firmware build environment.  If Lua application developers move the bulk of their Lua
application code into LRORP modules, then this effectively more than doubles the ESP RAM available
to Lua developers.


The Lua C API has also been extended with common API functions which ease the task of writing
new NodeMCU library modules, including API functions which would sensibly benefit from (ROM-based)
TString * arguments as an alternative to C string variants.  Note that this is an API extension
rather than a substitution, so that existing C extension libraries / modules can still be compiled
and work with no (or in a few cases minor) changes at a source level.


Supplementary design goals and details

Multiple platform support

Whilst standard Lua is designed to compile against a rich set of operating system and applications
environments, the previous NodeMCU implementation was only designed to support the ESP8266, though
later updates have added beta support for the ESP32. This patch supports the Lua 5.3 environment
on three target platforms using a common code base:

The ESP8266 (and derivative ESP8285) architectures using the Espressif non-OS SDK and its GCC
Xtensa toolchain.
The ESP32 architecture using the Espressif IDF and its GCC Xtensa toolchain.
Linux host architecture using the standard Linux GCC toolchain.  Whilst this will work for many
Linux variants, we are specifically testing on our Travis-CI hosting platform.

The first two versions are supported by their respective Espressif / NodeMCU environments. The last
environment is for three specific purposes: (i) supporting Lua core development; (ii) testing
against the standard and NodeMCU-extended Lua test suites; (iii) building a host-runnable luac
image which can be used to cross-compile Lua source, and to run limited Lua executables using the
additional -X execute option. (The LROR preprocessor is bootstrapped and uses this feature so that
NodeMCU build environments don't need to depend on other host Lua installations.)  The host Lua
versions are always built as part of an ESP build and hence the luac executable is always
available on the host for initialising LROR partitions and SPIFFS file systems.
A corollary to this is that the LROR changes only support these platforms. Unlike the Lua and eLua
code bases, which had complex use cases (e.g. to support WinX and OS X, and big and little endian
variants), the only conditional LROR code is that needed to support these platform variants. The LROR
changes are all or nothing, and are tested as a bundle. There is no out of the box option to
cherry-pick bits of LROR functionality and omit others.
Minimise changes to the core Lua source code

We are taking a conscious step away from the rapid development approach adopted in our first NodeMCU
implementation, and so our basic guideline is that the source will only be changed when there is a
material benefit or unavoidable need for doing so, and either such changes will follow a standard
pattern or fulfil a specific functional need. All such changes are documented herein. Example patterns
include:


Replacing "some_word"  C strings by their equivalent LROR_WORD(some_word) declarations.
Such C string replacement is optional and is only done where there might be performance and RAM
usage benefits from doing so. Quoted words are candidates for replacement if:

The word is likely to be used as a constant in a Lua application.  Including it in the string
table will avoid creating the equivalent RAM G(L)->strt entry.
The word is already used as a TString elsewhere so is already a RO resource.
It will be referenced repeatedly during execution.  Examples include C string constants used
in common API calls.  C strings used in initialisation routines or error paths aren't usually
converted.


A limited number of common multi-word strings are replaced by their equivalent
LROR_STRING(name,"some string").


Changes to #include statements in header preambles. In some cases the three variant targets
(POSIX, newlib and ESP non-OS SDK) require different header files to compile and build. On the
ESP8266 non-OS builds, the Espressif-supplied C headers must be used instead because the ESP platform
only supports an extremely limited subset of the standard library functions or has a slightly
non-standard API, and we would prefer to pick up misuse during compile / build. Our general approach
is to avoid #ifdef conditional logic by using a bit of header magic, so that where such variants
are required a simple global substitution of the macro C_HEADER_XXX for <xxx.h> is used instead.


Use of additional TS variants of Lua API functions.  An example here is complementing the
standard lua_pushstring() API function with an equivalent lua_pushTString() which takes a
TString object as a parameter. Note that these are extensions to the API and not substitutions.
This is to ensure that modules using the standard documented Lua API continue to build and work.
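
The header substitution pattern in the list above relies on the standard C "computed include" feature, where the operand of #include is a macro. The following is a minimal sketch of how such a mapping might look. The macro name follows the C_HEADER_XXX pattern from the text, but the selector `DEMO_ESP_NON_OS_SDK` and the SDK-side header name `c_string.h` are assumptions made for illustration, not the definitions used by the patch.

```c
/* Hypothetical central mapping, defined once for the whole build.
 * DEMO_ESP_NON_OS_SDK and "c_string.h" are illustrative assumptions. */
#if defined(DEMO_ESP_NON_OS_SDK)
#  define C_HEADER_STRING "c_string.h"  /* assumed SDK-specific wrapper */
#else
#  define C_HEADER_STRING <string.h>    /* POSIX and newlib targets */
#endif

/* In each source file, <string.h> is replaced by the macro. */
#include C_HEADER_STRING

/* Ordinary code then needs no conditional logic of its own. */
static size_t demo_name_length(const char *name)
{
    return strlen(name);
}
```

The conditional logic lives in one place, so each Lua source file differs from the upstream version only in its #include lines.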


Other modules have been added or modified when sensible (for example the additional TS variants of
API calls, and some internal static routines that have been converted from C string to TString
parameters where there is a runtime benefit in doing this).
The other major changes are the implementation of RO Tables and the use of the NodeMCU 5.1 model of
"linker magic" to simplify the declaration of ROM global modules, tables and global functions.
RO Strings

Interned strings are maintained in the RAM hash table G(L)->strt. LROR adds a parallel statically
allocated set of RO structures which are accessed through a second table, G(L)->ROstrt. The interning
algorithm ensures that such strings are unique and only stored once, and so the address of an
interned string is a unique descriptor for the string for the purposes of copying, assignment and
comparison. The lookup code for any new interned string first resolves against the ROstrt table
before the RAM-based strt. Hence those RO strings do not need to be allocated (or freed) in RAM,
saving on both RAM resources and the malloc/free overheads.
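
The two-level resolution order described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the LROR implementation: the `DemoTString` type, the table names and `demo_intern` are hypothetical, and both tables are searched linearly for brevity, whereas the real strt and ROstrt are hash tables.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for Lua's TString. */
typedef struct DemoTString {
    size_t len;
    const char *data;
    struct DemoTString *next;   /* chain link, used by RAM entries only */
} DemoTString;

/* Stand-in for ROstrt: const data that would live in flash. */
static const DemoTString demo_rostrt[] = {
    { 2, "on", NULL }, { 5, "pairs", NULL }, { 5, "print", NULL },
};

static DemoTString *demo_strt = NULL;  /* stand-in for the RAM strt */
static int demo_ram_allocs = 0;        /* counts RAM entries created */

static const DemoTString *demo_intern(const char *s, size_t len)
{
    size_t i;
    DemoTString *p;

    /* 1. RO table first: a hit needs no RAM and no later collection. */
    for (i = 0; i < sizeof demo_rostrt / sizeof demo_rostrt[0]; i++)
        if (demo_rostrt[i].len == len &&
            memcmp(demo_rostrt[i].data, s, len) == 0)
            return &demo_rostrt[i];

    /* 2. Then the RAM table. */
    for (p = demo_strt; p != NULL; p = p->next)
        if (p->len == len && memcmp(p->data, s, len) == 0)
            return p;

    /* 3. Miss in both: only now is a RAM entry allocated. */
    p = malloc(sizeof *p + len + 1);
    if (p == NULL) return NULL;
    memcpy((char *)(p + 1), s, len);
    ((char *)(p + 1))[len] = '\0';
    p->len = len;
    p->data = (const char *)(p + 1);
    p->next = demo_strt;
    demo_strt = p;
    demo_ram_allocs++;
    return p;
}
```

Since each string has exactly one home, two lookups of the same text return the same address, which is what allows comparison by pointer.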
Unfortunately, current C compiler technology does not offer the necessary compile-time functionality
to implement Lua string types fully. A new LROR C source preprocessor has been added into the build
process and this serves a dual purpose:
-  The LROR macros are mined and parsed at a source level to generate a C source file which
initialises these data structures.
-  The macros are also used during source compilation as C preprocessor define macros to declare the
mirror extern statements that enable all the source to be compiled validly whilst resolving these
RO references through the linker step of the build.
So for example the inline usage of the RO string resource for the word on:

This is declared by including the LROR_USE_STRING(on) macro in the source.  This generates the
extern TString * _LROR_on declaration. (This macro doesn't support varargs, but shorthand versions
LROR_USE_STRINGn(...) exist for n = 2 to 5 to simplify declaring multiple word strings.)
References to the string are created by including LROR_WORD(on) in the source.
As these are standard C preprocessor macros, these are subject to normal conditional compilation
rules. The generated extern statements can then subsequently be compiled and linked into the
target image.
However, the source is also scanned by the Lua preprocessor and in turn this generates the C file
to declare and initialise the necessary TString for on and any other used TString variable,
together with any hash table data structures. Note that this preprocessor currently does not use the
standard C preprocessor output, so cannot exploit C macros and conditional compile statements.
Any Lua references to the string on will also be resolved to the same ROM TString constant,
thus avoiding the need to create a new RAM-based TString or its subsequent garbage collection.
Whilst this preprocessor approach isn't ideal, it hasn't proved an issue in practice.
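
The split between "declare via macro" and "define in a generated file" can be sketched with ordinary token pasting. The macro bodies below are assumptions made for illustration (hence the `DEMO_` prefix and the `demo_ts_` symbol naming); the real LROR_USE_STRING and LROR_WORD definitions and the generated symbol names may differ.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for Lua's TString. */
typedef struct DemoTString { size_t len; const char *data; } DemoTString;

/* Assumed macro shapes: one emits the extern declaration, the other
 * expands to a reference to the same symbol. */
#define DEMO_USE_STRING(w)  extern const DemoTString demo_ts_##w;
#define DEMO_WORD(w)        (&demo_ts_##w)

/* --- In a module source file: declare, then reference. --- */
DEMO_USE_STRING(on)

static const DemoTString *demo_event_name(void)
{
    return DEMO_WORD(on);   /* resolved by the linker, no runtime lookup */
}

/* --- Stand-in for the generated C file, which defines the resource.
 * In a real build this would be a separate translation unit. --- */
const DemoTString demo_ts_on = { 2, "on" };
```

The module compiles against the extern declaration alone, and the reference is only bound to the generated definition at link time.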

As ROM-based constants are truly immutable, some adjustments have been made to exclude these from
the scope of the garbage collector, so that it only marks and scans RAM-based resources.
RO Tables

The LROR tables comprise two components:

a ROtable header record which is a cut-down variant of the current Table structure. The bulk of
the Lua runtime code-base treats Table records as an encapsulated resource (the main
exceptions are in the low level handling in ltable.c and lgc.c). By unifying these RW and RO
forms, the RO handling is therefore largely hidden at a code level, minimising the changes needed to
support RO tables.
an RO entry (luaR_entry) vector that is backwards compatible with the luaR_entry encoding
used in NodeMCU 5.1; however, TString keys and values are now supported, as well as the ability to
reference secondary tables using ROTable as well as luaR_entry references. Using a simple
vector format also simplifies the declaration of LROR tables as these can now be simply declared
inline using the standard C supporting LROR macros.

Whereas RW tables are hashed with an access time of O(1), accessing RO table entry lists is
O(N), and this is doubly bad news with flash access times. I therefore extended the lookaside
cache that Lua 5.3 has added (to accelerate C string to interned string conversion) to accelerate RO
table entry accesses. See Lookaside Cache below for details. This means that
repeated RO based table entries such as ROM_G.pairs are typically accessed in a single ROM probe.
All RO tables must follow the Lua model of requiring a Table header record (and in the case
of RO tables the smaller 16 byte variant). However, ROTable headers can either
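
The effect of caching on an entry vector can be sketched as follows. This is an illustration under stated assumptions rather than the LROR code: all `Demo`/`demo_` names are hypothetical, keys are plain C string addresses standing in for interned TString pointers, values are plain integers, and the cache is a small separate array rather than the shared Lua string cache that the patch reuses.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DemoEntry {
    const char *key;   /* stands in for an interned TString key */
    int value;         /* stands in for a Lua value */
} DemoEntry;

typedef struct DemoROTable {
    const struct DemoROTable *metatable;
    const DemoEntry *entries;          /* key/value vector, not a hash */
    unsigned char nentries;
} DemoROTable;

static const char demo_k_abs[]   = "abs";
static const char demo_k_floor[] = "floor";
static const char demo_k_max[]   = "max";

static const DemoEntry demo_math_entries[] = {
    { demo_k_abs, 1 }, { demo_k_floor, 2 }, { demo_k_max, 3 },
};
static const DemoROTable demo_math = {
    NULL, demo_math_entries,
    (unsigned char)(sizeof demo_math_entries / sizeof demo_math_entries[0])
};

#define DEMO_CACHE_SLOTS 8
static struct {
    const DemoROTable *table;
    const char *key;
    unsigned char index;
} demo_cache[DEMO_CACHE_SLOTS];

static int demo_probes = 0;   /* counts entry accesses, for illustration */

static const DemoEntry *demo_ro_get(const DemoROTable *t, const char *key)
{
    size_t slot = (size_t)((((uintptr_t)t) ^ ((uintptr_t)key)) >> 2)
                  % DEMO_CACHE_SLOTS;
    unsigned i;

    /* Cache hit: go straight to the remembered index. */
    if (demo_cache[slot].table == t && demo_cache[slot].key == key) {
        demo_probes++;
        return &t->entries[demo_cache[slot].index];
    }
    /* Miss: O(n) scan. Comparing keys by address is valid only because
     * interned strings are unique. */
    for (i = 0; i < t->nentries; i++) {
        demo_probes++;
        if (t->entries[i].key == key) {
            demo_cache[slot].table = t;
            demo_cache[slot].key = key;
            demo_cache[slot].index = (unsigned char)i;
            return &t->entries[i];
        }
    }
    return NULL;
}
```

In this sketch the first lookup of `demo_k_max` scans three entries, while a repeated lookup touches only one.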
be located in RO flash address space and declared with the LROR_TABLE macros, or
be created at runtime and stored in the Lua registry, together with a referencing Lvalue.

In this second case, rather than using the registry reference scheme as described in [PiL 28.3],
we follow the alternative convention used for single instance resources, which is to use the address
of the luaR_entry vector as the registry key.
We need this approach to enable backwards compatibility for NodeMCU modules which still declare
tables using the deprecated eLua declaration system. (It also enables the declaration of RO table
with Lua updatable metatables.) Details TBC. An extra LValue table attribute is used by the
luaH_getshortstr() and luaH_getint() access routines to indirect from any LROVAL entries in
luaR_entry to the corresponding Lvalue in the registry, effectively hiding this deprecated use.
This enables the Lua runtime to support unmodified NodeMCU 5.1 modules, albeit with some small RAM
and runtime overhead.
The main functional difference between RO and RW tables is that all write access methods to RO
tables will throw a Lua error, as RO tables have to be declared statically at compile time in the
source code.
Excepting those with Lua registered headers, ROTables can only have a RO metatable, and attempting
to do a setmetatable() in this case will also throw a Lua error.  Of course RW tables and userdata
can still use an RO metatable, and the eLua-added API call luaL_rometatable() is still
available so that userdata types can be bound to an RO metatable by a string name.
ROM_G Table

This is a small change but important enough to detail separately. A variant of the current NodeMCU
5.1 NODEMCU_MODULE() macro technique is used to allow individual luaR_entry declarations to be used
for inline declaration of RO globals.
These are allocated in a dedicated linker section so lbaselib.c:luaopen_base() allocates a
registry RO header for this global vector, assigns it to the global ROM_G and creates a single-entry
__index = ROM_G metatable for _G. Hence all entries in ROM_G are resolved as globals using
standard Lua inheritance rules.
Note that the ROM_G table is visible as a Lua global and is enumerable by the pairs
function: for k,v in pairs(ROM_G) do --....
The entries for core functions such as print and pairs are created by section and linker magic
in this table, as are any global table definitions such as string and any user modules.  And since
ROM_G is just a standard ROTable, lookaside caching also works for these entries.
User modules can also statically declare global functions and values in the same way, so for
example ltablib.c contains the following conditional static declaration to add the table unpack
function as the global unpack (NLF is an acronym, Named Lightweight Function):
#if defined(LUA_COMPAT_UNPACK)
  LROR_GLOBAL_NLF(unpack, unpack)
#endif
The recommended method of module initialisation is by static declaration. During startup,
luaL_openlibs() scans ROM_G and each table in it (including bound metatables) for entries with
the name of the format tablename__init which points to a function. The table name must be a valid
table name and in the case of table and metatable entries must refer to the corresponding container
table. If such an entry exists then the startup initialisation code will call the function to
perform any module initialisation needed. Using ROM_G itself is deprecated for performance / bloat
reasons, but it is currently used by the new version of the NODEMCU_MODULE() macro to allow
existing modules to compile and run without modification.
Note that _G isn't scanned as this is built dynamically at runtime and any dynamic initialisation
can be done programmatically.
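
The startup scan for tablename__init entries can be sketched as below. This is only an illustration of the naming rule: the `Demo` types are hypothetical, entries hold plain C function pointers rather than Lua values, and the real scan in luaL_openlibs() walks ROM_G and its nested tables rather than a single table.

```c
#include <stddef.h>
#include <string.h>

typedef void (*DemoInitFn)(void);

typedef struct DemoEntry { const char *name; DemoInitFn fn; } DemoEntry;
typedef struct DemoTable {
    const char *name;
    const DemoEntry *entries;
    size_t nentries;
} DemoTable;

static int demo_init_calls = 0;
static void demo_mod_init(void) { demo_init_calls++; }
static void demo_f(void) { }

static const DemoEntry demo_mod_entries[] = {
    { "f", demo_f },
    { "mod__init", demo_mod_init },
};
static const DemoTable demo_mod = { "mod", demo_mod_entries, 2 };

/* Call every entry named "<tablename>__init"; return how many were run. */
static int demo_run_init(const DemoTable *t)
{
    size_t i, tlen = strlen(t->name);
    int called = 0;
    for (i = 0; i < t->nentries; i++) {
        const char *nm = t->entries[i].name;
        /* The prefix must be the container's own name, then "__init". */
        if (strncmp(nm, t->name, tlen) == 0 &&
            strcmp(nm + tlen, "__init") == 0) {
            t->entries[i].fn();
            called++;
        }
    }
    return called;
}
```

Requiring the prefix to match the container's name means an entry such as other__init placed in the mod table is ignored rather than run.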
The make / build process

Needs updating to include ESP32 IDF builds.
The NodeMCU build system uses "recursion magic", that is you do a top level make which has

a set of call-back variable assigns
a list of dependent submodules
and some local module action rules, which typically execute each subordinate make
and each subordinate make calls back its parent for its context.

So in the case of the lua core: the top level nodemcu-firmware make invokes the app make which
invokes the lua make which references back the app make which references back the
nodemcu-firmware make.  Hence all of the rules for $(GCC) etc. are defined only in the top level
NodeMCU make.
This system has been slightly modified with NodeMCU 5.3 in that the app make calls the lua make
with a host target and this target runs a more conventional makefile to build the host variants of
lua and luac. This also uses the subdirectory app/lua/host for host-only modules and resources
such as lrostring.c, loslib.c, generate_LROR.lua , etc. These host versions do not include the
rest of the NodeMCU ecosystem.
Hence a host-executable version of luac with the extra -X option is available for later scripts
either to execute Lua utilities as part of the build process or to use luac to convert Lua
sources to target-compatible lc files.
The host/lrostring.c is both generated by generate_LROR.lua and maintained in git, which might
be seen as a catch-22, but in fact this version contains the minimal subset of RO string resources
needed to compile and build these host executables, and therefore is only rarely updated to reflect
changes to the Lua core.
The NODEMCU macros also use predefined defines to generate flag variables in objects in the modules
directory of the form XXXX_module_selected, and the build only links those modules containing an
externally linkable symbol named with this pattern.
The target RO string resources system is piggy-backed onto this approach, so a build-time scan of
this list of modules and the lua core is used to create and compile a build-specific rostring.o, so

this version is generated during the build and not stored in git.
it only includes LROR resources in the lua core and selected modules, so LROR resources in other
(non-selected) modules are not included.
(under review) this also means that only the main <module>.c for selected modules is
scanned.  Where modules pull in other files in the modules folder, these won't be scanned and
should not contain LROR resources unless the <module>.c contains the corresponding
LROR_USE_STRING(name) declarations to create the resources.

Lua Modules and closures

Details still being finalised.
In target builds, the rostring.c maps into two linker sections.  The first contains the RO
TString resource definitions, and the second is a fixed section at the end of the flash
image.  This fixed section, which we call the LROR Partition (LRORP), is flash sector-aligned and
of a configurable size set through the ld definitions.

This segment is directly addressable in the ESP ICODE address space, enabling these resources to
be accessed directly by modules and the Lua runtime system (RTS).
However, because it is a fixed flash-sector-aligned area, it can also be rewritten on an
occasional basis to update its contents, either by using the esptool.py through the UART and
ROM-based firmware loader or under program control as part of a rebuild.

The LRORP contains:

A fixed header to reference contained resources
Optional additional RO TString declarations
Other constants such as integer and float LValues.
The current ROstrt hash table
0 or more Lua module hierarchies in loaded format.

This last item requires a further explanation.  In Lua, a compiled module is loaded into a hierarchy
of resources in structures such as Proto definitions which are largely read-only, but
collectable resources, that is they can be scanned, marked, collected and discarded by the GC.
However, LROR has already modified the GC to bypass RO resources.
So there is nothing in principle to prevent us loading such resources into a read-only partition such
as the LRORP.  However, update is complicated by coherency and integrity issues both at a hardware
level (the ESP's ICODE hardware cache), and in terms of any overwrite of referenced resources
corrupting the GC referencing system (overwriting a module that is currently referenced within the
application could corrupt and crash the RTS in an indeterminate manner).
We therefore support a simple update model for the LRORP: a small NodeMCU-specific API provides
the ability to use a RAM-based Lua script to rebuild and reload a new LRORP. The LRORP reload is
effectively an atomic operation which restarts the processor on completion.  Nonetheless, this
feature enables Lua application programmers to store their own or standard NodeMCU Lua modules in
the LRORP with a simple loader script.
Once loaded, such modules can be referenced by using the standard require "module" syntax.  This
has a minimal and almost zero load overhead compared to loading from SPIFFS, and only the RW
resources such as any globals and locals created by the executing modules take up RAM.  The code
and constants are executed directly from flash address space.
This feature requires the ROstrt to be moved to the LRORP, since loaded modules can add their own
strings which must be referenced through the ROstrt.
Testing Strategy

The LROR functionality is first tested on a debug build using the host lua executable with an
extended version of the Lua 5.3.4 test suite to hammer out most
errors within a benign development environment.  We are also investigating using a large subset of
this suite on a non-OS SDK environment with a minimal module set for testing on the ESP8266
architecture.
Performance

TBC, but our objective is to introduce the Lua 5.3 functionality with a reduced RAM footprint and
increased runtime performance.
Writing LROR-compatible modules

Legacy Modules

Our objective is that existing modules will work with minimal changes.  (An example of such a minimal
change would be the addition of an extra #include statement, and equivalent changes which
could be done "en masse" without an intimate knowledge of the module.)
Nonetheless, the legacy eLua rotables method of declaration is deprecated and using this comes
with some small RAM and performance overheads, so we would encourage all module maintainers to
migrate to the new NodeMCU 5.3 module interface as soon as practical.
Existing modules should only require API changes in exceptional circumstances.
The aspect that does need further thought here is the impact of the split number types (integer and
float), which is really a pure Lua 5.1 -> 5.3 migration issue.  More research is needed.
Using Strings

In essence, high-use string literals can be replaced by the corresponding LROR_WORD() or
LROR_STRING() declaration. This replaces a compiled const char * reference by the corresponding
const TString * one. The second LROR_STRING() macro requires you to provide a symbolic name for
the string as these ultimately generate external references that are resolved during link. The first
is simply syntactic sugar which uses the automatically generated name _TS_<word>. The common
Lua API functions which accept a string argument now have TString equivalents, so for example
         lua_pushstring(L, "normal");
becomes
         lua_pushTString(L, LROR_WORD(normal));
Note that the Cstring version is still supported and often used (for example, all error path string
constants have been left in their Cstring form). However, the second form is usually adopted for
main path code, and this pushes a (TString *) address directly onto the stack, (whereas the first
calls a new string API call, which even in the case of an existing string involves recomputing the
hash of the string and doing a strcmp()).
The RO string table is external to modules so any LROR word string reference will need a
corresponding extern declaration to compile.  The LROR_USE_STRING(name) and
LROR_USE_STRINGn(name, ...) (where n=2..5) macros can be added to the source code to wrap these.
Internally, as well as the global (RAM) string table, a second (RO) string table is generated during the
build process and stored in addressable flash.  This RO table is used as a second level for
resolution when any new string is resolved in the case of a miss against the RAM string table.  In
the case of a hit against the RO table, the address of the RO TString is returned, and therefore
new entries are only created in the G(L)->strt for strings which aren't already in ROM.
Using Tables

The LROR patch introduces a specific method for writing modules in such a way that they fully
utilise read-only resources.  Note that this does not preclude the use of the standard Lua API to
declare modules and to expose them at runtime; however, modules using the standard API will use
RAM for all module resources.
Note: The LROR declaratives are different from the LTR API and therefore C modules written for
LTR will require modification to use LROR.
Another limitation is that RO tables can clearly only refer to RO resources.
Consider a simple example where you want to register a simple module called "mod" that has a single
function named "f". For standard Lua, you would code this as follows:
static const luaL_reg mod_map[] =
{
  { "f", f_implementation },
  { NULL, NULL }
};

LUALIB_API int luaopen_mod( lua_State *L )
{
  luaL_register( L, "mod", mod_map );
  other_initialisation();
  return 1;
}
For the LROR implementation, however, you'd need to define the same thing like this:
LROR_ENTRIES(mod) = {
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(f, f_implementation),
  LROR_TABLE_ENTRY_NAME_LIGHTFUNC(mod__init, other_initialisation)
};
LROR_GLOBAL_TABLE(mod, NULL)
A few points about the RO tables above:

The RO table entries are declared by a LROR_ENTRIES(mod) initialiser which includes a number of
LROR_TABLE_ENTRY macros to initialise the entries.
The table entry macros have short and long form, so LROR_TABLE_ENTRY_NAME_LIGHTFUNC(name, func)
declares a lightweight C function for the given name.  The macros also have a short acronym form, in
this example LROR_TENLF(name,func).
The Table structure itself is then declared using a LROR_TABLE(mod, metatable) macro.  Note
that this must follow the LROR_ENTRIES macro since it uses a sizeof() computation to calculate
the number of table entries (which must be less than 255). Note that the metatable is set to
NULL if the table does not have a metatable.
LROR_GLOBAL_TABLE(mod, metatable) is a variant of LROR_DEFINE_TABLE(mod, metatable)
that also makes the table accessible from the global table ROM_G within Lua as discussed above.
The tables are declared with the static const attribute so can be referenced within the C
module by their name using normal C scoping rules. This avoids the risk of name clashes.
If you need to export the name to other C files (e.g. for a lua_pushrotable() call) then you
will need to export a get wrapper function.
In general the C API for table read access works as normal on RO tables.
Any of the C API calls for table access which attempts to update the table will result in an
error being thrown.
At a Lua API level similar restrictions apply: the read only functions work as expected, but any
attempt to write to a table will throw an error, including use of the functions insert, remove
and sort.

Like any other table, RO tables can have an associated metatable and metatables can be RO tables, but
unlike the LTR patch, the metamethod __metatable is not overloaded with a different semantic.
Notes on Internals

Lookaside Cache

Lua 5.3 adds a lookaside cache to avoid interning of repeated string requests. It is a simple 53 × 2
slot cache. (These dimensions are configurable defined constants.) Each new string request is hashed
and the two referenced TString entries are matched against the putative Cstring value. A hit
short-circuits the relatively expensive hashing and string lookup process.
Accessing LROR string resources is by direct use of their TString references, and therefore doesn't
use this string request path. On the other hand the O(n) ROTable access is now an issue, so a
similar approach is used for ROTables, using this same cache table. This denormalisation is something
of a hack, but this is considered acceptable because of the need to avoid any additional RAM overheads.
The denormalisation exploits the fact that all TString and luaR_entry resources are word aligned.

So the bottom bit of the address is overloaded: 0 = TString entry; 1 = luaR_entry.  In this
second case the 8 MSBs hold the luaR_entry index and the remaining less significant bits (less bit 0)
are matched against the Table pointer reference, allowing the index into the luaR_entry vector to be
recovered in the case of a cache hit.  A bit nasty perhaps, but a material performance boost.
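The bit overloading described above can be illustrated with a minimal sketch. It assumes a 32-bit target, and the names `slot_pack`, `slot_lookup` and the `SLOT_*` constants are hypothetical ones chosen for illustration; they are not taken from the LROR sources, and the real layout may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one cache word on a 32-bit target (illustrative only):
 *   bit 0      : 0 = TString pointer, 1 = ROTable entry reference
 *   bits 1-23  : low bits of the word-aligned Table address
 *   bits 24-31 : index into the luaR_entry vector
 */
#define SLOT_TAG_ROTABLE  1u
#define SLOT_ADDR_MASK    0x00FFFFFEu
#define SLOT_INDEX_SHIFT  24

/* Pack a table address and an entry index into one tagged word. */
static uint32_t slot_pack(uint32_t table_addr, unsigned index) {
  assert((table_addr & 3u) == 0);  /* relies on word alignment */
  assert(index < 256);             /* index must fit in 8 bits */
  return ((uint32_t)index << SLOT_INDEX_SHIFT)
       | (table_addr & SLOT_ADDR_MASK)
       | SLOT_TAG_ROTABLE;
}

/* Return the cached entry index, or -1 if the slot holds a TString
 * pointer or was recorded for a different table address. */
static int slot_lookup(uint32_t slot, uint32_t table_addr) {
  if ((slot & SLOT_TAG_ROTABLE) == 0) return -1;
  if ((slot & SLOT_ADDR_MASK) != (table_addr & SLOT_ADDR_MASK)) return -1;
  return (int)(slot >> SLOT_INDEX_SHIFT);
}
```

Because only part of the table address survives in the packed word, a hit in a scheme like this would still need the key at the recovered index to be checked before the entry is trusted.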
String Handling and Comparison Avoidance

Lua 5.3 now subdivides strings into short and long variants, depending on a configurable threshold
(currently 40 characters). Short strings are interned as with previous Lua versions, and are
therefore unique, so two short TString * pointers refer to the same string if and only if the
pointers refer to the same location. Hence no strcmp() comparisons are required for equality
comparison.
Lua 5.3 also introduces a new two-way look-aside cache based on the address of the Cstring parameter
in the function luaS_new(). This associates a TString * with the address of the C string
variable used to create it. In the case of a hit the existing TString string and the new string are
compared and, if they match, the existing TString is used to short-circuit the hash calculation. In the case of a miss, the
result of the resolved TString * is used to bump the cache entry. (In LROR, this cache is also
used for ROM table entries.)
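As a rough illustration of the address-keyed cache just described, the following self-contained sketch mimics the hit and miss behaviour. `DemoString`, `slow_intern` and `cached_new` are hypothetical stand-ins for TString, the full hash-and-intern path and luaS_new() respectively, and the NULL check replaces the pre-filled cache that a real implementation could use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_N 53  /* rows, matching the dimensions quoted above */
#define CACHE_M 2   /* ways per row */

/* Stand-in for TString; keeps a pointer to the caller's text, so the
 * caller's string must outlive the pool (true for string literals). */
typedef struct DemoString { const char *text; } DemoString;

static DemoString *cache[CACHE_N][CACHE_M];
static DemoString pool[16];
static unsigned pool_used;
static unsigned slow_path_calls;

/* Stand-in for the expensive path: search the pool, add if absent. */
static DemoString *slow_intern(const char *s) {
  slow_path_calls++;
  for (unsigned i = 0; i < pool_used; i++)
    if (strcmp(pool[i].text, s) == 0) return &pool[i];
  assert(pool_used < 16);
  pool[pool_used].text = s;
  return &pool[pool_used++];
}

/* Look up by the ADDRESS of the C string, then confirm by content. */
static DemoString *cached_new(const char *s) {
  unsigned row = (unsigned)((uintptr_t)s % CACHE_N);
  DemoString **p = cache[row];
  for (int j = 0; j < CACHE_M; j++)
    if (p[j] != NULL && strcmp(s, p[j]->text) == 0)
      return p[j];                      /* hit: skip the slow path */
  for (int j = CACHE_M - 1; j > 0; j--)
    p[j] = p[j - 1];                    /* miss: age the older way */
  p[0] = slow_intern(s);                /* newest entry goes first */
  return p[0];
}
```

Calling `cached_new()` twice with the same literal address returns the same `DemoString *` and invokes the slow path only once, which is the short-circuit the text refers to.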
The new string functions check for an existing copy in the case of a short string and this match
is on (i) the hash, (ii) the length and (iii) a direct comparison. This means that long strings are
no longer automatically interned on the assumption that they are unlikely to be replicated. This
avoids the size-dependent hashing cost for longer strings, but in certain use cases it can result
in a major memory increase, though in practice for IoT use the upside is a good performance boost
and the risk of RAM growth is minimal.
Also note that the new luaS_newliteral() is designed for C string constants and this calls
luaS_newlstr() based on the sizeof the string; using this also bypasses the cache lookup.
Any strings declared with the LROR initialisers will be generated at build time and bypass all this
complexity. It is unclear how this will affect the efficacy of the string cache, so I have added
some internal instrumentation diagnostics to examine this.
Garbage Collection

Strings and tables are collectable objects. In standard Lua, all collectable objects are linked into
one of three lists with heads in G(L) (fields fixedgc, finobj, allgc) using the common
next field to link them. The GC ignores the first category during normal collection, and only uses it
during shutdown. RO objects can neither be GCed nor collected at shutdown, so we've added an extra
G(L) category: ROobj and all RO objects are linked into this (to facilitate diagnostic inspection of
RO resources).
Format of RO Tables

Standard RW tables have a complex structure that has unnecessary storage overhead for small keyed
tables. I considered the pros and cons of also using this structure for ROM tables and decided that
my criteria should be to adopt a variant implementation:

if the overall savings in flash data space exceeded the extra code overhead of the variant, and
there were performance benefits in doing the ROM variant.

On this basis, this initial implementation includes a variant for handling simplified RO tables.
However, some efforts have been made to minimise the code variation.  Since much of the table
handling code is to do with write access, storage allocation, resizing and garbage collection, none
of which apply to RO tables, the amount of new code needed to support RO tables is quite modest.
The updated table structure retains a set of common fields:
  GCObject *next; 
  lu_byte tt; 
  lu_byte marked;
  lu_byte flags;
  lu_byte lsizenode;
  struct Table *metatable; 
and maintains two separate variants of the access field for the RW and RO variants.  The RW variant
retains the existing fields in an anonymous union to minimise code changes:
  GCObject *gclist;
  unsigned int sizearray; 
  TValue *array;
  Node *node;
  Node *lastfree;
The RO variant replaces these fields with
  luaR_entry *array;
It also overloads the lsizenode field as sizeentries and uses a sizeof() calculation to
generate this field, so scanning the entries is based on this size field rather than on adding a
dummy {LROR_NILKEY, LROR_NILKEY} stop entry.
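A minimal sketch of such a size-bounded scan follows. The types `demo_entry` and `demo_rotable` and the function `demo_get` are simplified, hypothetical stand-ins for luaR_entry, the RO Table variant and its lookup routine; the real key and value types are richer than a C string and an int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for luaR_entry and the RO Table variant. */
typedef struct demo_entry {
  const char *key;
  int value;
} demo_entry;

typedef struct demo_rotable {
  unsigned char sizeentries;   /* plays the role of the overloaded lsizenode */
  const demo_entry *array;
} demo_rotable;

/* Linear scan bounded by the size field; no sentinel entry is needed. */
static const demo_entry *demo_get(const demo_rotable *t, const char *key) {
  for (unsigned i = 0; i < t->sizeentries; i++)
    if (strcmp(t->array[i].key, key) == 0)
      return &t->array[i];
  return NULL;
}

static const demo_entry mod_entries[] = { { "f", 1 }, { "g", 2 } };

/* The entry count is computed at compile time from the initialiser. */
static const demo_rotable mod = {
  sizeof(mod_entries) / sizeof(mod_entries[0]),
  mod_entries
};
```

The scan is O(n) in the number of entries, which is why the lookaside cache matters for frequently accessed keys.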
The gclist field is only used in a variant aware part of lgc.c which is not called for ROM
objects and the tt decodes which variant is used.  I've hoisted the metatable reference into the
common part because this metatable field is referenced a lot and keeping it in the common area removes
a class of variant coding.
Hence all allocated (that is RW) Table records are the size of the larger (RW) variant, but
this isn't an issue since the RO forms are only created through the precompile and build process.
Note whilst ROtable entries can take any RO'able value, only short Cstring and short TString keys
are currently supported. (Need to add more details on implementation.)
If we do later decide to allow settable metatables, then a reserved value for metatable, for
example (void *)-1, would denote that the metatable is stored in the registry. In this case,
the metatable association would be the one data element of a ROtable that is writeable.  To
achieve this the ROtables would maintain this field within the Lua registry, and the metatable API would
need to contain variant code to access and update this, but from a C metatable API viewpoint there
is no functional difference between a RO and a RW table.  However there would be a performance hit
as all RO table accesses would need to query the registry.
LROR globals on the host builds

The linker magic used in our target builds depends on replacing the default linker script with a
NodeMCU-customised one.  At the moment the host implementation is a little bit of a botch, since
I don't want to support the subtle variants of host linker scripts out there.  This works on Linux /
gcc build environment by exploiting a reserved section .rodata1 (which isn't otherwise used in
Lua builds) and a known link order.
This works fine for development and testing, but it is still a bit tacky.  Need to think of a robust
method of implementing this.
Loadlib implementation

The standard package table contains some standard subtables, and LROR moves some of these into ROM
Tables to minimize the RAM footprint:

package.preload. This lists packages that are preloaded but not initialised and therefore must
still be imported into a Lua application by a require.  This is an empty ROM table in LROR.
package.loaded  This is used by the standard require function to avoid duplicating reloads of
dynamic modules.  It must correctly resolve references such as package.loaded.string to avoid
errors on valid Lua statements such as str=require "string". This is initialised by loadlib.c to
an empty RAM table, but with a metatable whose __index points to a search function which scans
ROM_G for the corresponding table entry. Note that this means that iterating over this table in Lua
using pairs() will not enumerate the loaded modules.  You have to scan ROM_G to do this.
package.searchers (previously loaders in Lua 5.1).  This is a ROM table with a single entry
for the standard Lua searcher.  LROR drops the other three searcher types for preload, C and C root.
However, since package is itself a RAM table there is nothing to stop Lua application programmers
adding their own searchers, for example I use the following in my init.lua to support autoloading
of modules over wifi:

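  -- loadfile() compiles the file into a function; load() would treat the name itself as Lua source
  package.searchers = {loadfile("net_autoloader.lua"), package.searchers[1]}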
Syntax of LROR commands

*Caveat: This implementation is an interim one (actually the third version) to demonstrate the
feasibility of the LROR concept. Once we have a working framework to evaluate, it is anticipated
that the language interface might be reworked.*
Traditional Lua binds Lua resources at runtime through the Lua API which accepts standard C string,
integer and other type arguments. This patch enables read-only variants of such resources to be
declared inline in the source code.

In the case of string declarations, these C macros are preprocessed in the source as normal to
generate the compiled inline code to refer to the external resources, so LROR_WORD(on) becomes
&_TS_on where _TS_on is the external TString for the literal "on". There is also a Lua
preprocessor which can scan the source base to regenerate rostring.c, the C module which
statically declares all ROM-based TString tables.
Table resources map directly onto standard C macros, so are just compiled normally inline, and
no additional Lua post-processing is needed. Whilst this approach might seem a kludge, something
very similar was used by database products such as Oracle to generate and embed SQL
statements in C and other code.

A table definition first declares the luaR_entry vector using a LROR_ENTRIES() initialiser which
includes a number of LROR_TABLE_ENTRY macros to initialise the entries.

LROR_ENTRIES(table) is a macro which generates the static const luaR_entry table[]
statement which must be followed by a static initialiser which includes one or more table entries.
LROR_TABLE_ENTRY(key, value) is a macro to initialise each key, value pair.  There are a
set of wrapper macros which hide the hassle of the luaR_key and Value declarations in both long
and acronym form, for example LROR_TENLF(name,func) is the equivalent of name = func in a
standard Lua initialiser.
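To show why the table declaration has to follow the entries declaration, here is a hedged sketch of how such a macro pair could be built. The `DEMO_*` macros and `demo_*` types are hypothetical and only mirror the shape of the LROR macros; the actual LROR definitions are not reproduced here and may well differ.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_func)(void);

typedef struct demo_entry {
  const char *key;
  demo_func func;
} demo_entry;

typedef struct demo_table {
  const demo_entry *entries;
  unsigned char count;
  const struct demo_table *metatable;
} demo_table;

/* Opens the static entry vector; the caller supplies "= { ... };". */
#define DEMO_ENTRIES(t)      static const demo_entry t##_entries[]

/* One key/function pair; the key is the stringified name. */
#define DEMO_FUNC(name, fn)  { #name, fn }

/* Must come after DEMO_ENTRIES(t): sizeof needs the completed array type. */
#define DEMO_TABLE(t, meta)                                              \
  _Static_assert(sizeof(t##_entries) / sizeof(t##_entries[0]) < 255,     \
                 "too many entries in " #t);                             \
  static const demo_table t = {                                          \
    t##_entries,                                                         \
    (unsigned char)(sizeof(t##_entries) / sizeof(t##_entries[0])),       \
    (meta)                                                               \
  }

static int f_implementation(void) { return 42; }

DEMO_ENTRIES(mod) = {
  DEMO_FUNC(f, f_implementation)
};
DEMO_TABLE(mod, NULL);
```

Swapping the two declarations would fail to compile, because `sizeof` cannot be applied to an array that has not yet been defined; the compile-time assertion also enforces the entry limit without any runtime cost.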

The Table is then itself declared using one of the three LROR_TABLE() variants:

LROR_TABLE(table_name, metatable_name). Defines a LROR table. Note that the table
names are symbolic references used internally within the C build and not exposed by default within the
Lua application name space. The table name is unquoted and must conform to the normal C name syntax.
The second argument is NULL for tables without a metatable.
LROR_GLOBAL_TABLE(table_name, metatable_name). Variant of above which includes an
entry for this table in the ROM_G table as discussed above, and hence the table name is exposed
to the Lua application in the global name space.
LROR_METATABLE(table_name,flags).  Defines a LROR metatable.  This is different from
the normal tables in that Lua 5.3 maintains a methods event flags bitmask in the header which
enables the optimisation of __index, __newindex, __gc, __mode, __len and __eq
meta methods. So if your metatable contains the index and newindex entries then set the flags field
to 1u<<TM_INDEX | 1u<<TM_NEWINDEX.
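The flags argument is an ordinary bitmask, as the following sketch illustrates. The enum reproduces only the first six tag-method events in the order used by Lua 5.3's ltm.h; `demo_has_event` and `demo_meta_flags` are hypothetical names, and the sketch follows the convention stated above, where a set bit marks a metamethod that the metatable provides.

```c
#include <assert.h>

/* First six tag-method events, in the order used by Lua 5.3's ltm.h. */
typedef enum {
  TM_INDEX,
  TM_NEWINDEX,
  TM_GC,
  TM_MODE,
  TM_LEN,
  TM_EQ
} demo_tms;

/* Flags for a metatable that defines only __index and __newindex. */
static const unsigned demo_meta_flags = 1u << TM_INDEX | 1u << TM_NEWINDEX;

/* Test one event bit without touching the entry vector at all. */
static int demo_has_event(unsigned flags, demo_tms ev) {
  return (flags & (1u << ev)) != 0;
}
```

A check such as `demo_has_event(demo_meta_flags, TM_GC)` is a single mask operation, which is what makes the flags field useful as a fast path ahead of any entry scan.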

Normal C scoping and declaration rules apply, so if a table has a meta table, it is easier to
declare the metatable first.
TODO List


Macro variants of tt and _tt references that generate L32 load instructions on the xtensa
builds


Review loadlib model


Design walk-through of LROR resources and LGC to make sure that they are managed correctly.


String processor implementation doesn't support preprocessor macro expansion in LROR strings


Port Lua compact debug patch


Optional cut down version of math.  Also check code for float constants.


Optional cut down version of debug


Add performance stats for string / table cache.


Consider how to implement LROR closures.