Skip to content

Instantly share code, notes, and snippets.

@tbrunz
Last active March 8, 2023 16:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save tbrunz/1b5e28d3e571c021aa0a440b173e1bfb to your computer and use it in GitHub Desktop.
Save tbrunz/1b5e28d3e571c021aa0a440b173e1bfb to your computer and use it in GitHub Desktop.
A simple Lua app to demonstrate the principles of lexical closures
#! /usr/bin/env lua
--
-- A simple Lua app to demonstrate the principles of lexical closures.
--
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
--
-- An iterator that sorts a table by its keys, in alphabetical order.
--
function pairsByKeys ( alphaTable, sortFunction )
local alphaArray = { }
--[[
The provided table, 'alphaTable', is a Lua table, which means it's an
associative array. AA's store their key-value pairs using hash codes
to improve look-up efficiency. One trade-off in using this method is
that key-value retrieval (using the 'pairs' iterator) will appear in
more or less a "random" order, not alphabetical order.
For this application, we want a loop iterator that *will* successively
return the key-value pairs in alphabetic order by key. To do this, we
will recast 'alphaTable' to make a new Lua 'sequence' (i.e., an ordered
array), where the values of the array, 'alphaArray', are the keys of
the 'alphaTable' table we were given.
Why do this? Because while the lua function 'table.sort' sorts arrays,
it only sorts by their values; it can't sort keys. Ergo, we need to
define an array whose values are the (unsorted) keys of 'alphaTable';
we can then sort this array to get the sorted keys we need.
]]--
for key, _ in pairs( alphaTable ) do
alphaArray[ #alphaArray + 1 ] = key
end
--[[
Now we sort the table keys, using this auxiliary array. The
'table.sort' function sorts arrays using a provided comparison
function. But note that the default sorting function is '<', which
works for alphabetical sorting. So if our iterator was called with
no second argument, we'll get an alpha sort, as expected. And if
it's called with a function as a second argument, the iterator will
return key-value pairs using that alternative sorting scheme instead.
Note that 'table.sort' will "sort in place", so no need to define a
third array.
]]--
table.sort( alphaArray, sortFunction )
--[[
Now that the keys are sorted, we can provide a function that will
iterate over the *sorted* order of the keys, returning them & their
associated values in the original table. The iterator is a closure,
a function that "closes over" three variables that are in the lexical
scope of the returned function defined here (but are not actually
defined in the function itself):
1) The table argument that this function was called with,
2) The sorted array used to control the order of retrieval,
3) An index to keep track of what to return on successive calls.
On each call of the iterator we will pre-increment the index to make
detecting the end condition easy; therefore we must start the index
at zero. (Recall that arrays in Lua are 1-based, not 0-based.)
]]--
local index = 0
--[[
Now we return a nameless function (a 'lambda'). All functions in Lua
are actually lambdas, which can be assigned to variables, passed as
parameters, and returned from functions.
]]--
return function ( )
-- When we've exhausted the table/array, a pre-incremented index
-- will cause the table/array access to return 'nil', which also
-- signals the client using this iterator that its loop is done.
index = index + 1
-- The keys are in alphabetic order in 'alphaArray', while the
-- values are in a hashed order in 'alphaTable'. Note that Lua
-- is register-based, so the following construct is byte-compiled
-- into an efficient executable.
return alphaArray[index], alphaTable[ alphaArray[index] ]
end
end
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
--
-- An iterator that traverses an input file, returning its words, one by one.
--
function allWords ( filename )
-- Open a file of the given name for reading, so we can count its words.
-- Note that 'io.input' will raise an error for us, should one occur.
io.input( filename )
--[[
Prime the while loop in the returned function below by reading the first
line from the file, and have it start searching for words at position 1
(since strings/arrays are 1-based). Note that the default argument for
'io.read' is "*l", or "read line". The default file that 'io.read' reads
from is set with an earlier call to 'io.input'.
]]--
local line = io.read()
local pos = 1
--[[
What Lua (and other languages, such as Java) call an "iterator" are
really "generators". That is, a "generator" function, such as this,
creates and returns an "iterator" function. Clients will call the
iterator function, typically in a 'for' loop.
Iterators are typically closures; in this case, our iterator closes
over the variables 'line' & 'pos', but not the file descriptor and
its buffer, which are globals. The variables are defined above in
*this* function (which is called only once, to obtain the iterator).
None of these variables are defined in the iterator function (closure)
being returned (although they could be), yet they are used by the
iterator each time it gets called.
This is possible because the above variables exist in the closure's
lexical scope at define-time (even though they will have already gone
out of scope at run-time after this function returns). The value of
a language having closures is that these definitions are preserved
inside the closure at run-time and are encapsulated ('hidden' from
scopes where the function is actually called).
So the variables used in the closure are not globals, but they are
also not local to the closure; hence, they are "non-local variables".
All functions in Lua are actually closures.
]]--
return function ( )
-- Enable looping as long as a line is read from the file. Once
-- the end of file is reached, 'io.read' will return 'nil', which
-- will prevent the 'while' statement from executing.
while line do
-- Locals 'first' & 'last' delimit the position of each word
-- found in the current line of the file. By 'word', we mean
-- a contiguous string of alphanumeric characters, a pattern
-- matched by "%w+". ("%W+" would match strings of non-word
-- characters.) The closure variable 'pos' tells Lua where
-- in the string to start matching word characters.
-- Note that Lua supports multiple return values (very nice)!
-- Also note that we treat 'line' (a string) as an object,
-- applying the find' method to it by using the ':' operator.
local first, last = line:find( "%w+", pos )
-- Was a word found in this 'line' at or after position 'pos'?
-- If so, 'first' and 'last' will be returned as integers.
-- If not, they will be 'nil' (which equates to boolean false).
if first then
-- Advance the position pointer to the next character after
-- the word, then extract the word string and return it.
pos = last + 1
-- The string function 'sub' extracts a sub-string.
return line:sub( first, last )
else
-- A word was not found in (the remainder of) the line; so
-- read another line & reset the position to the beginning
-- of this new line. Then loop to try again. If there are
-- no more lines in the file, 'line' will be 'nil', and
-- this 'while' loop will terminate. By using a 'while'
-- loop, we will skip over multiple blank lines correctly.
line = io.read()
pos = 1
end
end
-- On reaching this point, there is nothing left in the file; so
-- return 'nil' to signal that the loop has completed. Note that
-- this line isn't strictly needed, as a 'return' in Lua without
-- an argument will return 'nil' anyway.
return nil
end
end
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
--
-- Test the 'allWords' iterator (generator) with a simple routine to
-- count all occurrences of each word in a text file.
--
--[[
We start by getting a file to scan -- let's use ourselves! To do that,
we need the Lua equivalent of '${0}' in bash or '__file__' in Python.
Lua doesn't have a global variable for this, though, so we need to call a
function, 'debug.getinfo', and extract a field that holds the filename.
Our call to 'debug.getinfo' below returns the name of the source file ('S')
of the function at the first level ('1') of the call stack. Note that the
filename will begin with an '@' symbol, indicating it is a filename. By
passing '1', we are asking for information about the current function. If
we were to pass in '0', it would return "=[C]", since it would be returning
information about the 'getinfo' function itself (which is defined in an
external C library). More information on this function can be found in the
Debug Library chapter in the "Programming in Lua" book...
]]--
local thisModule = debug.getinfo( 1, 'S' )
-- Don't forget to snip off the '@' sigil from the obtained filename!
-- Passing only one argument to 'sub' causes the second argument to
-- default to 'the last character in the string', which allows us to
-- use 'sub' to easily discard the first 'n' characters of a string.
-- In this case, the string is the 'source' field of 'thisModule'.
local targetFile = thisModule.source:sub( 2 )
-- Now collect all the words in the target file as a 'bag of words'.
-- Creating a 'bag' requires us to count the number of occurrences of
-- each key, where the keys are simply the words we find in the file.
WordBag = { }
-- To do this, we use the Lua iterator we defined above to retrieve the
-- words from the target file, one by one, dropping each into our bag of
-- words as we go.
for word in allWords( targetFile ) do
-- We don't care about capitalization, so make the word all lowercase.
word = word:lower()
-- If the word hasn't been seen before, create a key with a value of 1.
-- Otherwise, increment the value (count) for keys we've seen before:
if not WordBag[ word ] then
WordBag[ word ] = 1
else
WordBag[ word ] = WordBag[ word ] + 1
end
end
-- Now that we've read through the entire file, our bag of words is full.
-- Create a report, sent to 'stdout', formatted as "<word> <count>". The
-- report format is controlled by the function 'string.format', where the
-- format codes are exactly the same as the 'printf' format codes in the
-- C language.
outputFormat = "%20s %s"
-- And here is the report generation code, starting with a header.
print()
print( string.format( outputFormat, "Word", "Frequency" ) )
print( string.format( outputFormat,
" ----------------", "----------------" ) )
--[[
Now we want to extract each word (key) and its count (value) from our bag of
words. However, the order of the words in the bag (i.e., the order we would
pull them out, if we use Lua's 'pairs' iterator) is essentially random. We
need to interate over the bag contents, sorted by its keys in alphabetical
order, printing each key & value. To do this, we use the custom iterator we
defined at the top of this module to return the bag contents in the desired
order (using the default sort function to obtain an alphabetical order).
]]--
for word, count in pairsByKeys( WordBag ) do
print( string.format( outputFormat, word, count ) )
end
print()
-- Note that this report can easily be modified to display the words in the
-- order of the frequency with which they appear in the target file.
-- (This is left as an exercise for the student... ;^)
-- Finally, we close the file. (Not strictly needed, as the OS will do this.)
-- Note that 'io.input' when called with no arguments, will return the handle
-- of the current input file.
io.input():close()
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment