Skip to content

Instantly share code, notes, and snippets.

@KayEss
Last active June 9, 2023 17:06
Show Gist options
  • Save KayEss/45a2e88675832228f57e2d598afc02ae to your computer and use it in GitHub Desktop.
Save KayEss/45a2e88675832228f57e2d598afc02ae to your computer and use it in GitHub Desktop.
Build me a LISP
/// # Build me a LISP
// Read this nicely formatted at https://kirit.com/Build%20me%20a%20LISP
/**
Normally programming language blog posts get started with grammars and syntax
and parsing and all sorts of stuff like that. I can't be bothered with all of
that, so instead, let's start with the (maybe more interesting aspect) of the
actual evaluation of the code. Here we'll build an evaluation engine for LISP
like languages in less than 50 lines of C++ -- [and literate C++ at
that!](https://gist.github.com/KayEss/45a2e88675832228f57e2d598afc02ae)
Lisp uses something called a "cons cell" to store each part of both the program
and the data. Let's just say we have something called a cell for now, and we'll
come back to defining what it is later on.
*/
struct cell;
/**
## S-Expressions
A Lisp program is actually just something called an s-expression. It's a fancy
term for a list of cells, but there is a special way to interpret these
s-expressions. We'll come back to that later on.
For now, we just need to know that s-expressions describe our LISP program, and
to run a LISP program we simply evaluate the s-expression.
*/
#include <vector>
using sexpr = std::vector<cell>;
/**
If you read the [Wikipedia page on
s-expressions](https://en.wikipedia.org/wiki/S-expression) you'll see that
they're usually described in terms of a tree of cells. We're using a vector
instead, but the two are equivalent.
Walking down the right branch of the cell is the same as walking to the next
cell in the vector, and walking down the left branch of the cell is the same
as nesting an s-expression in a cell.
## Cells
Now we can talk about what a cell is. Our simple LISP will only have a single
data type, strings, so our cells will either contain a string or an
s-expression.
*/
#include <string>
#include <variant>
struct cell {
std::variant<std::string, sexpr> content;
/**
We do need to be able to make cells, and they can be made from either
strings or s-expressions.
*/
cell(std::string s) : content{std::move(s)} {}
cell(sexpr s) : content{std::move(s)} {}
};
/**
## Evaluation
There are three different sorts of data structure that we have so far:
1. Strings
2. S-expressions
3. Cells
Because running our programs is going to be a matter of evaluating these
structures, we'll need to consider what "evaluation" means for each of these.
First of all, strings are just themselves.
*/
std::string eval(std::string s) { return s; }
/**
The s-expressions are a bit more interesting, but for now we just have to
promise to implement them so that we can start with the simpler cells (and
because we have to implement the evaluation of the s-expressions in terms of
evaluating cells)...
*/
std::string eval(sexpr const &);
/**
Evaluating a cell is pretty simple. It's either going to be a string or a
s-expression, so depending on what we find in the cell content we can just
forward to evaluating whichever one we find.
*/
std::string eval(cell const &c) {
return std::visit([](auto const &c) { return eval(c); }, c.content);
}
/**
## The standard library
Although an s-expression is just a vector of cells, the first item in it is
special. To evaluate an s-expression we take the first item in it to be the name
of a function that is to be executed, and the remaining s-expression cells are
then taken as that function's arguments (some of which may be s-expressions that
require evaluation themselves).
So if you're used to writing your programs like this:
```python
foo(bar, baz)
```
The s-expression form would look more like:
```lisp
(foo bar baz)
```
We're going to implement our library in C++, and we're going to look functions
up by name. Because the s-expression can have any number of arguments, it makes
sense to evaluate them based on a couple of iterators (`marker`s), one to the
first argument and one to the end of the s-expression.
*/
using marker = sexpr::const_iterator;
/**
Our built in library functions take the two iterators representing their
arguments and return a string which is the result of running them.
*/
#include <functional>
using builtin = std::function<auto(marker, marker)->std::string>;
/**
We now need a simple data structure that will hold these for us and allow
look-ups by name.
*/
#include <map>
std::map<std::string, builtin> library;
/**
## Evaluating S-Expressions
The first item in an s-expression is a function name that we're going to call.
This means that an empty s-expression is an error.
*/
#include <stdexcept>
std::string eval(sexpr const &s) {
if (s.begin() == s.end()) {
throw std::domain_error("Empty sexpr");
/**
An alternative design would be to say that an empty s-expression
evaluates to an empty string, but this is good for us.
Now, assuming our s-expression contains a first item, we need to fetch
the function name. Remember that our s-expression is comprised of cells
(which can contain strings *or* s-expressions) so we need to turn our
cell into a string. Luckily we already have a function that does that:
`eval`.
*/
} else {
auto const fname = eval(*s.begin());
/**
Using `eval` here (rather than just fetching the string out of the cell
directly) has some important consequences to what this language can do.
Although we've said so far that the first item in an s-expression has to
be the function name, by evaluating the cell to fetch that string we can
actually have an s-expression as the first cell! And evaluating that
s-expression gives the function name that we will run. Don't worry about
this for now, there will be an example at the end.
Now that we have a function name we only need to look it up and call it,
and of course take into account the possibility that it might not be
found.
*/
if (auto const fp = library.find(fname); fp != library.end()) {
return fp->second(++s.begin(), s.end());
} else {
throw std::invalid_argument{fname + " not in the library"};
}
}
}
/**
And that's actually it for the whole evaluation engine. Luckily we have some
lines left over to try it out with.
## Running a program
*/
#include <iostream>
int main() {
try {
/**
As with all programming languages we have to make sure to say hello to
the whole world.
```lisp
(cat "Hello" "world!")
```
Because we didn't bother with a parser we have to write the data
structure for the program directly into the C++ source code. Thankfully
it's not too hard.
*/
using namespace std::string_literals;
auto const prog = sexpr{"cat"s, "Hello "s, "world!"s};
/**
Of course we didn't put anything into our library yet, so let's make
sure that we do at least have an implementation of `cat` we can call.
*/
library["cat"] = [](marker pos, marker end) {
std::string ret;
while (pos != end) ret += eval(*pos++);
return ret;
};
/**
Now we can run our program and show the result.
*/
std::cout << eval(prog) << std::endl;
/**
[If you grab this C++ file you can compile and run it using something
like (the file is available as a Gist on
Github)](https://gist.github.com/KayEss/45a2e88675832228f57e2d598afc02ae):
```bash
clang++ lisp.cpp -std=c++17 -o lisp
./lisp
```
And you'll see some output:
Hello world!
## Errors...
The very last thing we should do is to print out some sort of error
message if a run-time error came up.
*/
} catch (std::exception &e) {
std::cerr << "Error: " << e.what() << std::endl;
return 1;
}
}
/**
## More fun
Remember there was the bit about the evaluation of the lead term in the
s-expression? Because the first term can itself be an s-expression you can write
a program to generate the function name you want to call.
If you do try this on your machine, try to change the program to the following
and see what happens.
```cpp
auto const prog = sexpr{
sexpr{"cat"s, "c"s, "a"s, "t"s},
"Hello "s, "world!"s};
```
When you run this it will still print "Hello world!" because the s-expression as
the first term gets evaluated and produces the name `cat` which then gets called
to produce the output.
### Order fun
Remember the type we had for `builtin` took `marker`s to the s-expressions of
the arguments? This means that the argument values are evaluated inside of the
library function definition. Look again at the loop in our `cat` function:
```cpp
while (pos != end) ret += eval(*pos++);
```
It would also be a reasonable choice to move that evaluation outside of the
library function and instead have our `builtin` type look like this (a function
that takes a vector of strings and produces a string):
```cpp
using builtin = std::function<auto(std::vector<std::string>)->std::string>;
```
This difference in where the expressions are evaluated is called [*Evaluation
Strategy*](https://en.wikipedia.org/wiki/Evaluation_strategy) and the choice we
made above is called *normal-order*. This is of course the opposite of what
you're used to from pretty much every other programming language, which normally
use *applicative-order*.
More commonly, normal-order is referred to as [*lazy
evaluation*](https://en.wikipedia.org/wiki/Lazy_evaluation) and
applicative-order as [*eager
evaluation*](https://en.wikipedia.org/wiki/Eager_evaluation).
### `if` isn't (an eager) function
In LISP terms our `builtin`s are all [*special
forms*](http://www.lispworks.com/documentation/HyperSpec/Body/03_ababa.htm).
This is important in order to be able to correctly implement *forms* like `if`.
Let's say we want to implement `if` so that it can be used something like this:
```lisp
(if (condition) (true-case) (false-case)) -> string
```
We will always need to evaluate the `condition` expression, and, depending on
what happens, we are for sure going to return either the `true-case` or the
`false-case`. But more importantly we only want to **evaluate** one of them.
If we have to implement `if` in a language that only has *eager evaluation* then
we will evaluate both the `true-case` and the `false-case` even if we only
return the result of one of them.
But with *lazy evaluation* we can save ourselves some processing, and this
control of the evaluation that we give to the library functions is critical if
you want to take this toy language and turn it into something real.
C++ itself is an eager language and this is why we cannot short-circuit in
user-defined implementations of `&&` and `||`, something that can cause us some
pain. [But there is
hope](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0927r2.pdf).
*/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment