Last active
June 9, 2023 17:06
-
-
Save KayEss/45a2e88675832228f57e2d598afc02ae to your computer and use it in GitHub Desktop.
Build me a LISP
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/// # Build me a LISP | |
// Read this nicely formatted at https://kirit.com/Build%20me%20a%20LISP | |
/** | |
Normally programming language blog posts get started with grammars and syntax | |
and parsing and all sorts of stuff like that. I can't be bothered with all of | |
that, so instead, let's start with the (maybe more interesting aspect) of the | |
actual evaluation of the code. Here we'll build an evaluation engine for LISP | |
like languages in less than 50 lines of C++ -- [and literate C++ at | |
that!](https://gist.github.com/KayEss/45a2e88675832228f57e2d598afc02ae) | |
Lisp uses something called a "cons cell" to store each part of both the program | |
and the data. Let's just say we have something called a cell for now, and we'll | |
come back to defining what it is later on. | |
*/ | |
struct cell; | |
/** | |
## S-Expressions | |
A Lisp program is actually just something called an s-expression. It's a fancy | |
term for a list of cells, but there is a special way to interpret these | |
s-expressions. We'll come back to that later on. | |
For now, we just need to know that s-expressions describe our LISP program, and | |
to run a LISP program we simply evaluate the s-expression. | |
*/ | |
#include <vector> | |
using sexpr = std::vector<cell>; | |
/** | |
If you read the [Wikipedia page on | |
s-expressions](https://en.wikipedia.org/wiki/S-expression) you'll see that | |
they're usually described in terms of a tree of cells. We're using a vector | |
instead, but the two are equivalent. | |
Walking down the right branch of the cell is the same as walking to the next | |
cell in the vector, and walking down the left branch of the cell is the same | |
as nesting an s-expression in a cell. | |
## Cells | |
Now we can talk about what a cell is. Our simple LISP will only have a single | |
data type, strings, so our cells will either contain a string or an | |
s-expression. | |
*/ | |
#include <string> | |
#include <variant> | |
struct cell { | |
std::variant<std::string, sexpr> content; | |
/** | |
We do need to be able to make cells, and they can be made from either | |
strings or s-expressions. | |
*/ | |
cell(std::string s) : content{std::move(s)} {} | |
cell(sexpr s) : content{std::move(s)} {} | |
}; | |
/** | |
## Evaluation | |
There are three different sorts of data structure that we have so far: | |
1. Strings | |
2. S-expressions | |
3. Cells | |
Because running our programs is going to be a matter of evaluating these | |
structures, we'll need to consider what "evaluation" means for each of these. | |
First of all, strings are just themselves. | |
*/ | |
std::string eval(std::string s) { return s; } | |
/** | |
The s-expressions are a bit more interesting, but for now we just have to | |
promise to implement them so that we can start with the simpler cells (and | |
because we have to implement the evaluation of the s-expressions in terms of | |
evaluating cells)... | |
*/ | |
std::string eval(sexpr const &); | |
/** | |
Evaluating a cell is pretty simple. It's either going to be a string or a | |
s-expression, so depending on what we find in the cell content we can just | |
forward to evaluating whichever one we find. | |
*/ | |
std::string eval(cell const &c) { | |
return std::visit([](auto const &c) { return eval(c); }, c.content); | |
} | |
/** | |
## The standard library | |
Although an s-expression is just a vector of cells, the first item in it is | |
special. To evaluate an s-expression we take the first item in it to be the name | |
of a function that is to be executed, and the remaining s-expression cells are | |
then taken as that function's arguments (some of which may be s-expressions that | |
require evaluation themselves). | |
So if you're used to writing your programs like this: | |
```python | |
foo(bar, baz) | |
``` | |
The s-expression form would look more like: | |
```lisp | |
(foo bar baz) | |
``` | |
We're going to implement our library in C++, and we're going to look functions | |
up by name. Because the s-expression can have any number of arguments, it makes | |
sense to evaluate them based on a couple of iterators (`marker`s), one to the | |
first argument and one to the end of the s-expression. | |
*/ | |
using marker = sexpr::const_iterator; | |
/** | |
Our built in library functions take the two iterators representing their | |
arguments and return a string which is the result of running them. | |
*/ | |
#include <functional> | |
using builtin = std::function<auto(marker, marker)->std::string>; | |
/** | |
We now need a simple data structure that will hold these for us and allow | |
look-ups by name. | |
*/ | |
#include <map> | |
std::map<std::string, builtin> library; | |
/** | |
## Evaluating S-Expressions | |
The first item in an s-expression is a function name that we're going to call. | |
This means that an empty s-expression is an error. | |
*/ | |
#include <stdexcept> | |
std::string eval(sexpr const &s) { | |
if (s.begin() == s.end()) { | |
throw std::domain_error("Empty sexpr"); | |
/** | |
An alternative design would be to say that an empty s-expression | |
evaluates to an empty string, but this is good for us. | |
Now, assuming our s-expression contains a first item, we need to fetch | |
the function name. Remember that our s-expression is comprised of cells | |
(which can contain strings *or* s-expressions) so we need to turn our | |
cell into a string. Luckily we already have a function that does that: | |
`eval`. | |
*/ | |
} else { | |
auto const fname = eval(*s.begin()); | |
/** | |
Using `eval` here (rather than just fetching the string out of the cell | |
directly) has some important consequences to what this language can do. | |
Although we've said so far that the first item in an s-expression has to | |
be the function name, by evaluating the cell to fetch that string we can | |
actually have an s-expression as the first cell! And evaluating that | |
s-expression gives the function name that we will run. Don't worry about | |
this for now, there will be an example at the end. | |
Now that we have a function name we only need to look it up and call it, | |
and of course take into account the possibility that it might not be | |
found. | |
*/ | |
if (auto const fp = library.find(fname); fp != library.end()) { | |
return fp->second(++s.begin(), s.end()); | |
} else { | |
throw std::invalid_argument{fname + " not in the library"}; | |
} | |
} | |
} | |
/** | |
And that's actually it for the whole evaluation engine. Luckily we have some | |
lines left over to try it out with. | |
## Running a program | |
*/ | |
#include <iostream> | |
int main() { | |
try { | |
/** | |
As with all programming languages we have to make sure to say hello to | |
the whole world. | |
```lisp | |
(cat "Hello" "world!") | |
``` | |
Because we didn't bother with a parser we have to write the data | |
structure for the program directly into the C++ source code. Thankfully | |
it's not too hard. | |
*/ | |
using namespace std::string_literals; | |
auto const prog = sexpr{"cat"s, "Hello "s, "world!"s}; | |
/** | |
Of course we didn't put anything into our library yet, so let's make | |
sure that we do at least have an implementation of `cat` we can call. | |
*/ | |
library["cat"] = [](marker pos, marker end) { | |
std::string ret; | |
while (pos != end) ret += eval(*pos++); | |
return ret; | |
}; | |
/** | |
Now we can run our program and show the result. | |
*/ | |
std::cout << eval(prog) << std::endl; | |
/** | |
[If you grab this C++ file you can compile and run it using something | |
like (the file is available as a Gist on | |
Github)](https://gist.github.com/KayEss/45a2e88675832228f57e2d598afc02ae): | |
```bash | |
clang++ lisp.cpp -std=c++17 -o lisp | |
./lisp | |
``` | |
And you'll see some output: | |
Hello world! | |
## Errors... | |
The very last thing we should do is to print out some sort of error | |
message if a run-time error came up. | |
*/ | |
} catch (std::exception &e) { | |
std::cerr << "Error: " << e.what() << std::endl; | |
return 1; | |
} | |
} | |
/** | |
## More fun | |
Remember there was the bit about the evaluation of the lead term in the | |
s-expression? Because the first term can itself be an s-expression you can write | |
a program to generate the function name you want to call. | |
If you do try this on your machine, try to change the program to the following | |
and see what happens. | |
```cpp | |
auto const prog = sexpr{ | |
sexpr{"cat"s, "c"s, "a"s, "t"s}, | |
"Hello "s, "world!"s}; | |
``` | |
When you run this it will still print "Hello world!" because the s-expression as | |
the first term gets evaluated and produces the name `cat` which then gets called | |
to produce the output. | |
### Order fun | |
Remember the type we had for `builtin` took `marker`s to the s-expressions of | |
the arguments? This means that the argument values are evaluated inside of the | |
library function definition. Look again at the loop in our `cat` function: | |
```cpp | |
while (pos != end) ret += eval(*pos++); | |
``` | |
It would also be a reasonable choice to move that evaluation outside of the | |
library function and instead have our `builtin` type look like this (a function | |
that takes a vector of strings and produces a string): | |
```cpp | |
using builtin = std::function<auto(std::vector<std::string>)->std::string>; | |
``` | |
This difference in where the expressions are evaluated is called [*Evaluation | |
Strategy*](https://en.wikipedia.org/wiki/Evaluation_strategy) and the choice we | |
made above is called *normal-order*. This is of course the opposite of what | |
you're used to from pretty much every other programming language, which normally | |
use *applicative-order*. | |
More commonly, normal-order is referred to as [*lazy | |
evaluation*](https://en.wikipedia.org/wiki/Lazy_evaluation) and | |
applicative-order as [*eager | |
evaluation*](https://en.wikipedia.org/wiki/Eager_evaluation). | |
### `if` isn't (an eager) function | |
In LISP terms our `builtin`s are all [*special | |
forms*](http://www.lispworks.com/documentation/HyperSpec/Body/03_ababa.htm). | |
This is important in order to be able to correctly implement *forms* like `if`. | |
Let's say we want to implement `if` so that it can be used something like this: | |
```lisp | |
(if (condition) (true-case) (false-case)) -> string | |
``` | |
We will always need to evaluate the `condition` expression, and, depending on | |
what happens, we are for sure going to return either the `true-case` or the | |
`false-case`. But more importantly we only want to **evaluate** one of them. | |
If we have to implement `if` in a language that only has *eager evaluation* then | |
we will evaluate both the `true-case` and the `false-case` even if we only | |
return the result of one of them. | |
But with *lazy evaluation* we can save ourselves some processing, and this | |
control of the evaluation that we give to the library functions is critical if | |
you want to take this toy language and turn it into something real. | |
C++ itself is an eager language and this is why we cannot short-circuit in | |
user-defined implementations of `&&` and `||`, something that can cause us some | |
pain. [But there is | |
hope](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0927r2.pdf). | |
*/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment