Blog 2019/9/7
<- previous | index | next ->
Let's learn how to implement a Lisp or Scheme-like language by writing a series of increasingly capable interpreters.
- previous: Step 2: Recursive descent parsing
The core of any Lisp-like interpreter is the eval
function.
As a concept, eval
is over 60 years old,
first seen in McCarthy's
original Lisp implementation.
Paul Graham
wrote a great article[ps][pdf][markdown]
about the significance of eval
.
eval
takes an AST and an environment and returns a value.
However, Python already has an eval
function, so we will use the name eval2
. Start with a stub:
eval.py
def eval2(ast, env):
return None
Lisp-like languages follow the convention that a list represents a function call, where the first item in the list is a function (the operator) and the rest of the list items are the arguments to that function. E.g. (sum 1 2 3)
evaluates to 6
.
More particularly, sum
is a symbol. Symbols evaluate to their corresponding value from the environment. The environment is like a lookup table which maps symbols to values. In this case, the value of sum
is a function pointer.
1
, 2
and 3
are literals, which evaluate to themselves.
In a traditional Lisp-like language implementation, eval
determines the value of the operand and arguments, then hands them to a function named apply
, which executes the function.
(again, Python already has an apply
, so we will use the name apply2
.)
But we are getting ahead of ourselves. First, let's support evaluating literals:
def is_int(ast):
return ast[0] == "INT"
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
else:
raise Exception("Don't know how to evaluate %s" % ast)
Try it out:
$ python
Python 2.7.16 (default, Mar 4 2019, 09:02:22)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from eval import *
>>> env = {}
>>> ast = ("INT", "42")
>>> eval2(ast, env)
42
Cool, let's skip to the punchline and add a __main__
implementation so that we can run this bad boy:
if __name__ == "__main__":
import sys
from lex import load_tokendefs, tokenize
from parse import parse
if len(sys.argv) > 1:
text = open(sys.argv[1]).read()
else:
text = sys.stdin.read()
tdefs = load_tokendefs("tokendefs.txt")
env = {}
print eval2(parse(tokenize(tdefs, text)), env)
$ echo -n 1 | python eval.py
1
Woot!
Next, let's create a function sum2
(Python already defines a sum
function):
import operator
def sum2(*args):
return reduce(operator.add, args)
Try it out:
$ python
Python 2.7.16 (default, Mar 4 2019, 09:02:22)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from eval import *
>>> sum2(1, 2, 3)
6
Next, crack open tokendefs.txt
and add a SYMBOL
token type. We'll start off with something very basic: a symbol is a group of lowercase letters:
SYMBOL
[a-z]+
Let's check that it works with our lexer:
$ echo 'sum' | python lex.py
[('SYMBOL', 'sum'), ('WSPACE', '\n')]
Next, we need to update our parser to handle symbols. We'll make a slight tweak to our grammar:
SEXPR = ATOM | LIST
ATOM = INT | SYMBOL
LIST = OPAREN CPAREN
| OPAREN SEXPR { WSPACE SEXPR } CPAREN
We'll need to add another terminal parser for SYMBOL
:
def parse_symbol(tokens):
return parse_terminal(tokens, "SYMBOL")
We need to create a parse_atom
function for the rule ATOM = INT | SYMBOL
. This uses alternation, so its implementation will look very similar to parse_sexpr
:
def parse_atom(tokens):
(ast, remaining) = parse_int(tokens)
if ast is not None:
return (ast, remaining)
else:
return parse_symbol(tokens)
And speaking of parse_sexpr
, we need to update that as well:
def parse_sexpr(tokens):
(ast, remaining) = parse_atom(tokens)
if ast is not None:
return (ast, remaining)
else:
return parse_list(tokens)
Now let's try it out:
$ echo -n '(sum 1 2 3)' | python parse.py
('LIST', [('SYMBOL', 'sum'), ('INT', '1'), ('INT', '2'), ('INT', '3')])
Fantastico!
Now, we'll need to update eval2
to handle symbol lookup:
def is_symbol(ast):
return ast[0] == "SYMBOL"
def lookup(ast, env):
symbol_name = ast[1]
return env[symbol_name]
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
elif is_symbol(ast):
return lookup(ast, env)
else:
raise Exception("Don't know how to evaluate %s" % ast)
In the eval.py
__main__
implementation, replace env = {}
with:
env = { "sum": sum2 }
Now let's see if we can eval
a symbol:
$ echo -n 'sum' | python eval.py
<function sum2 at 0x102b40398>
Rock and roll! We are really close now.
Now we just need to add an elif is_list()
clause to eval2
:
def is_list(ast):
return ast[0] == "LIST"
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
elif is_symbol(ast):
return lookup(ast, env)
elif is_list(ast):
children = ast[1]
first, rest = children[0], children[1:]
operand = eval2(first, env)
args = [eval2(arg, env) for arg in rest]
return apply2(operand, args)
else:
raise Exception("Don't know how to evaluate %s" % ast)
def apply2(op, args):
return op(*args)
Now let's eval (sum 1 2 3)
:
$ echo -n '(sum 1 2 3)' | python eval.py
6
KABLAMMO MOTHERFUCKER!!! You just wrote an interpreter!
Next time, we'll look at defining new entries in the environment.