Hello Friends,
This elf begging to climb onto the web for Christmas began as a personal email, a review of Paul Graham's little Lisp Bel. He sprouted arms, legs, and in gingerbread style ran away. Arms for symbols, legs for conses: these primitives are the mark of a Lisp — even more so than the parenthesis. What do we get when we remove these foundation stones: naming and pairing?
No pairs. No cons. No structure. Unordered. Chaos. Eve, a beautifully incomplete aspect oriented triple store. No need for legs when you can effortlessly transport to your destination. Lazy. Pure. Here and now, a retrospective.
No symbols. No names. No variables. Combinators. Forth. No need for arms when you can effortlessly push and pop your stack. No words. A world without words. Virtual worlds. Virtual reality. Space. Time. Motion. Action. Kinetic Programming, a proposal.
I apologize in advance. Checking my pocketwatch, I see I haven't time to make sense of this white elephant gift of gingerbread elfin nonsense. His way madness lies. Be warned.
"When I returned to Bel I could barely understand it," says Paul Graham himself.
Having come across Paul Graham's On Lisp at a formative age, I made an effort to earnestly examine his Bel language. Let's take a helicopter tour. I observed many specific things (functions that might flow better after a rewrite) but let's stick to hovering above the woods.
PG's objective with Bel is fundamentally modest:
I think the point of a high-level language is to make your programs shorter. ... Lisp dialects have as a rule been good at making programs short. Bel is meant to do the same sorts of things previous dialects have, but more so. It's also meant to be simple and clear. If you want to understand Bel's semantics, you can read the source.
Five sentences. Five claims. How well does today's version of Bel measure up?
- High-level languages are about making programs shorter.
  - Just no.
- Lisps make for short programs.
  - Sure.
- Bel is an extra Lispy Lisp.
  - Yes indeed.
- Bel is simple and clear.
  - Not really.
- You can understand Bel's semantics from reading its source.
  - Not me — and not PG, it turns out.
Looking back decades to the delight of cutting teeth with PG's esthetics, it's funny how taste changes.
Surprise, surprise, I'm going to agree with Jonathan Blow who "would weigh concision as a lower priority than all of these [debuggability, compile-time error checking, readability, busywork elimination, morale (confidence in correctness vs a perpetual semi-confused state)], and probably several others I haven't listed."
From what I've seen, Blow's Jai is an ugly, verbose systems language in which Blow seems super productive. Since the Witness was ugly for years before becoming a masterpiece (do @ me) and since Jai is firmly in progress, Blow may eventually want a beautiful surface syntax — or not. It's up to him. He talks about Jai being fast to compile, close to relevant hardware, simpler than C++ (low but important bar) with better macros (its most Lispy aspect), and having some dynamic scope.
Bel has dynamic scope too. Though only occasionally needed, dynamic variables end up being wanted in every program of size I write. I mostly manage by disciplining global variables, but standard support within a language could prove transformative. For instance, the utility method and instance variable plumbing in Rails views/controllers might be managed better with dynamic scope. Really any scenario — a scene graph, an interpreter — where you're passing a context object around benefits. It requires a certain culture: apparently Scala Implicits are Everywhere.
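For a feel of what standard support might buy, here is a minimal sketch in Python rather than Bel, using the standard contextvars module as a stand-in for a dynamic variable; current_user, render_greeting, and handle_request are hypothetical names of my own:

from contextvars import ContextVar
# A dynamic variable: read wherever it's needed, no threading through every call.
current_user = ContextVar("current_user", default="guest")
def render_greeting():
    # Deep in a call chain, just reach for the dynamic binding.
    return f"Hello, {current_user.get()}!"
def handle_request(user):
    token = current_user.set(user)     # bind for the dynamic extent of this request
    try:
        return render_greeting()
    finally:
        current_user.reset(token)      # restore the previous binding on the way out
print(handle_request("ada"))    # Hello, ada!
print(render_greeting())        # Hello, guest! (the binding was dynamic, not global)

The binding lives exactly as long as the request, which is the shape of the Rails plumbing and the context-object scenarios above.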
PG responds that he doesn't care for compile-time types because they hinder exploration. Blow is baffled:
Most programming I do is exploratory as well. ... [With types] I don’t have to think about what I am doing very much, and I will bang into the guardrails if I make a mistake. People have expressed to me that functions without declared types are more powerful and more leverageable, but I have never experienced this to be true, and I don’t really understand how it could be true (especially in 2019 when there are lots of static languages with generics).
Some type systems do get in one's way. Generics can be impossible to reason about and good luck with runtime meta-programming. (Compile-time macros can be typed more easily.) But we're talking about the state of the art in 2019! Typed holes provide extra freedom needed for funky good typed live programming. We "just" let a poorly typed program be well-defined so that you can ask the computer, "What did I break typewise when I changed this definition?" Of course formulating a good type system is tricky and doing so in a simple, clear way is trickier. Let's come back to that.
As a proper Lisp, any regularly occurring awkward pattern in Bel can be macroed away. Shave any square peg round! Doing it without getting sawdust everywhere? Good domain specific languages come at a high conceptual cost.
Consider how Bel's 1,800 line definition breaks down:
- ~200 lines of setup
- ~400 lines of evaluator
- ~100 lines of function composition
- ~200 lines to define numbers
- ~1,000 lines mostly for parsing but some utilities live there too
- ~100 lines for arrays and key-value tables
One of these parts is not like the others. Let's come back to it.
Compared with Common Lisp, Scheme, Racket, and Clojure, I find Bel steers clear of some accidental complexity. Bel doesn't have multiple kinds of cons cell or superfluous data types. Bel doesn't have a complex yet underpowered hygienic macro system. Bel doesn't have a hefty grab-bag of concurrency mechanisms. Bel sticks to symbols and pairs, modeling the rest of the world, including closures, in those terms.
For example, the closure
(lit clo ((y . 3)) (x) (+ x y))
adds three to its argument. You would get a similar closure with:
((fn (y) (fn (x) (+ x y))) 3)
I say "similar" because all conses are mutable; therefore, equality and identity regularly differ in Bel. A familiar world yet...
What a world Bel models! Mutable state and imperative execution are taken for granted, with no less than five "times" when a piece of code can run:
- Macro expansion time when you first read some code.
- Function definition time when you create a closure.
- Call time when you invoke a closure.
- Continuation time when a stored continuation is returned to.
- Across threads!
Let's throw threads right out because though Bel's interpreter models them, the definition of Bel never uses them, nor should anyone. Forking a process is a different beast as are channels between processes. Bel doesn't go that way though it does have streams.
PG also includes full continuations, but only uses them for exception handling. Lost deep in the stack? Call a continuation to pop right out. But wait, there's more! You can save the continuation for later. After returning and doing more work, go ahead and pop back in time to when you grabbed the continuation. Makes both backtracking and exotic time travel loops easy! However, full continuations aren't generally what a person wants: you can jump back in time; but unless you plan ahead, you won't have a way to go back to the future. Delimited continuations, which require you to mark how far back you want to go, and cloneable coroutines, which you explicitly clone when you want to be able to go back, are each safer and similarly powerful. Note that continuations interact with dynamic scope in a way that can differentiate a dynamic variable from a stack shadowing a global variable. The difference corresponds to how you order a continuation monad relative to a state monad.
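Bel's own use of continuations, popping out of a deep stack, doesn't need the reentrant part at all. Here is a minimal sketch of that escape-only pattern in Python, using an exception as a one-shot, upward-only continuation; with_escape and search are illustrative names of my own:

# An escape-only "continuation": jump out from deep in the stack, once, upward only.
# It covers the exception-handling use; what it cannot do is be saved and re-entered
# later, which is the part of full call/cc that makes time travel possible.
class Escape(Exception):
    def __init__(self, value):
        self.value = value
def with_escape(body):
    def escape(value):
        raise Escape(value)            # unwind straight out to with_escape
    try:
        return body(escape)
    except Escape as e:
        return e.value
def search(tree, want, escape):
    if tree == want:
        escape(tree)                   # found it: pop right out of the recursion
    if isinstance(tree, list):
        for branch in tree:
            search(branch, want, escape)
print(with_escape(lambda esc: search([[1, 2], [3, [4, 5]]], 4, esc) or "not found"))  # 4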
Having somehow internalized continuation semantics after years of exposure and distillation in the back of my brain, I wonder how badly I'd hurt myself if I tried using them for real work. Ruby has them. Let's see... in the 131 gems that make up the dependencies for my current work project, callcc is invoked... 0 times! What about fibers, Ruby's name for coroutines? 26 matches across 9 files. Half of those are my own! Several others are just tests to ensure that stuff doesn't break if someone is in the mood to use fibers. Do these constructs lack intrinsic utility? Is it a matter of programming convention? Ruby has exceptions, a dedicated catch/throw mechanism, and other quick return keywords, so the remaining use-cases are limited. Is there a deep reason to avoid these constructs? Let's come back to that.
Threads and continuations go a long way toward making the interpreter inscrutable — yet also more explicit. Although Lisps reify the source of a program as a data-structure, little self-interpreting Lisps generally don't do the same thing with the run of a program. Instead the interpreter hides context in recursive evaluation: the interpreter's stack includes the stack of the program being interpreted.
To support threads, Bel reifies execution state. (Not the whole execution but the moment-by-moment. The whole of the execution remains hidden in the execution of the interpreter.) You have an explicit stack of things to do and things to come back to. It gets tangled up in threads, and with the data structures implicit in how you cons together lists, I had a hard time telling what's going on. For example:
(def evcall (e a s r m)
  (mev (cons (list (car e) a)
             (fu (s r m)
               (evcall2 (cdr e) a s r m))
             s)
       r
       m))
This function starts to evaluate a call. The arguments amount to the state of the interpreter. It munges them a bit and eventually calls mev, which sorts out the thread to work on next. evcall mostly delegates to evcall2:
(def evcall2 (es a s (op . r) m)
  (if ((isa 'mac) op)
      (applym op es a s r m)
      (mev (append (map [list _ a] es)
                   (cons (fu (s r m)
                           (let (args r2) (snap es r)
                             (applyf op (rev args) a s r2 m)))
                         s))
           r
           m)))
It says that if the operation is a macro, then applym; otherwise the operation is a function and you applyf. I can't follow what the shapes of these arguments are or the shape of the return values. And I certainly couldn't tweak this without killing myself. Guardrails please!
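To make the general shape concrete, here is a much smaller evaluator of my own in Python, not Bel's actual s/r/m machinery: the interpreted program's pending work lives in explicit to-do and result stacks rather than in the host language's call stack.

# A toy evaluator whose pending work lives in explicit stacks, in the spirit of
# (but far simpler than) Bel's s (expressions to do) and r (returned values).
# Expressions are numbers or nested lists like ["+", 1, ["*", 2, 3]].
PRIMS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
def ev(expr):
    todo = [("ev", expr)]              # things still to do
    results = []                       # values computed, waiting to be combined
    while todo:
        tag, x = todo.pop()
        if tag == "ev" and isinstance(x, list):
            op, *args = x
            todo.append(("apply", (op, len(args))))   # come back to this later
            for a in reversed(args):
                todo.append(("ev", a))                # evaluate the arguments first
        elif tag == "ev":
            results.append(x)
        else:                          # "apply": combine the argument values
            op, n = x
            args = [results.pop() for _ in range(n)][::-1]
            results.append(PRIMS[op](*args))
    return results.pop()
print(ev(["+", 1, ["*", 2, 3]]))       # 7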
Come now PG! You make a language with dynamic scope and then you have 17 different functions that thread around the a s r m context! Wouldn't the dynamically scoped version be more concise — even if you spelled out words instead of using single letter names? I'm all for single letter variables when their scope is local, but the rest of the time one shouldn't have to consult a key.
Tearing up is easier than piecing together. For all my effort over these years, what do I have to show? Hardly a thing! I have learned a little, I hope, but I am nowhere near satisfying answers. Let me share a few tasty bits near Bel at the idea buffet. With no less than 162 links and counting on my backlog, I'm sure I'm missing many good recent developments.
The sweet thing about Lisp is that you build a whole world from consing atomic symbols into pairs and lists and trees and so on. You make symbols refer by having an environment that's just a list of (signifier . signified) pairs. Keep on this path. You get a sort of case-expression if the signified, instead of being the immediate value, is made into an expression that will, in turn, be evaluated. You get a sort of if-expression if, instead of checking for equality, the signifier is a pattern or expression that you test against. If that pattern then adds to the environment in which you look up the signified expression, you now have a function definition. It all makes for a gentle climb up the ladder of abstraction. I like, for example, how Barry Jay has explored the ladder.
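A toy Python sketch of the first couple of rungs, with names of my own invention: an environment as an association list, then a lookup whose signifiers are patterns rather than literal symbols.

# Rung one: an environment is just a list of (signifier, signified) pairs.
env = [("x", 3), ("y", 4)]
def lookup(key, pairs):
    for signifier, signified in pairs:
        if signifier == key:
            return signified
    raise KeyError(key)
# Rung two: let the signifier be a test (a pattern) instead of a literal symbol
# and the lookup becomes a case/if expression.
rules = [(lambda n: n == 0, "zero"),
         (lambda n: n < 0,  "negative"),
         (lambda n: True,   "positive")]
def match(value, pairs):
    for test, result in pairs:
        if test(value):
            return result
print(lookup("y", env))    # 4
print(match(-7, rules))    # negative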
By the time we reach the top, we have so much pattern power (unification and fallbacks) that we can tackle some of those tricky bits:
- Want types? Walk the expression tree gathering constraints. Check whether they unify. Mmm, pullbacks.
- Want a parser? Monadic parser combinators are delightful.
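For the parsing bullet, a minimal (and only loosely monadic) combinator sketch in Python rather than Bel; every parser maps a string to a (value, remaining input) pair or None, and the names are mine.

# Minimal combinators: a parser maps a string to (value, remaining input) or None.
def char(c):
    return lambda s: (c, s[1:]) if s[:1] == c else None
def digit(s):
    return (s[0], s[1:]) if s[:1].isdigit() else None
def many(p):
    def run(s):
        out = []
        while (r := p(s)) is not None:
            v, s = r
            out.append(v)
        return out, s
    return run
def then(p, q, combine):               # sequence two parsers, combine their values
    def run(s):
        r = p(s)
        if r is None:
            return None
        v1, s = r
        r = q(s)
        if r is None:
            return None
        v2, s = r
        return combine(v1, v2), s
    return run
number = then(digit, many(digit), lambda d, ds: int(d + "".join(ds)))
pair   = then(number, then(char(","), number, lambda _, n: n), lambda a, b: (a, b))
print(number("42!"))   # (42, '!')
print(pair("7,35"))    # ((7, 35), '')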
The climb feels direct, intuitive, but do we introduce accidental complexity? Symbols and conses seem so simple. What do they conceal? What does each notion presuppose?
Consider conses. What do they do? Put things in order. First, rest. Car, cdr. Building blocks. Is ordering tantamount to a sense of time, even mutability? Time: first look here, then look here, finally look there. Mutability? Is there a real difference between
f(x)
x = x + 1
f(x)
and:
(f x)
(let x (+ x 1)
  (f x))
Surely order is intrinsic to programming what with source being a sequence of characters, execution being a sequence (roughly) of steps, and humans being creatures of time and space. Yet in the stories, doesn't order stem from chaos?
In recent memory, the Eve people attempted to take set semantics seriously. They hoped that order independence would leave less to worry about. I followed them closely. Pushed their software over a few cliffs to see how far it would fly. Glorious wreckage.
As I plod along in my daily programming life, much of the work revolves around organizing: putting data away, pulling it out again, ferrying information from where it is kept tidy to where it needs to be used. My Eve friends recognized this practice as pervasive and potentially pointless. Let a decent database keep things tidy, so you can focus on doing stuff with the data rather than keeping it stowed.
Eve, at its core, ended up as an entity-attribute-value triplestore. Attach properties directly to the things that have them. Such property lists are almost as Lispy as cons. Eve's query language made it easy to pick things out. Eve's rules completed the loop by making derived properties easy to define. Eve was a lot like Soar actually.
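A rough Python sketch of that shape, not Eve's actual syntax or semantics: facts are entity-attribute-value triples, a query is a pattern over them, and a rule keeps deriving new facts until a fixed point is reached.

# A toy entity-attribute-value store: facts are triples, a query is a pattern,
# and a rule derives new facts until a fixed point is reached.
facts = {("alice", "parent", "bob"),
         ("bob",   "parent", "carol")}
def query(db, entity=None, attribute=None, value=None):
    return {(e, a, v) for (e, a, v) in db
            if entity in (None, e) and attribute in (None, a) and value in (None, v)}
def derive_grandparents(db):
    db, changed = set(db), True
    while changed:                     # rerun the rule until nothing new appears
        changed = False
        for (e, _, mid) in query(db, attribute="parent"):
            for (_, _, gp) in query(db, entity=mid, attribute="parent"):
                fact = (e, "grandparent", gp)
                if fact not in db:
                    db.add(fact)
                    changed = True
    return db
print(query(derive_grandparents(facts), attribute="grandparent"))
# {('alice', 'grandparent', 'carol')}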
Long ago now, I imagined that Aspect Oriented Programming would prove itself. Being bolted on to Java right when Java was getting weird may not have helped the cause. Eve proved a lovely aspecty experiment as opposed to the one aspect-ish language in wide use: Cascading Style Sheets. Intentionally pathetic, selectors pick out bits of HTML documents to which you attach attributes and values from a fixed set. Awesomeness only comes when the values added are things you can select on. Note that difficulty in using CSS comes from (1) not knowing what the attributes do, especially since browsers have varied, and (2) complicated precedence rules — morale sucking unpredictability. Soar solved this by having a disambiguation step, Eve by searching for fixed point models. Order independence means freedom from when and where worry. Time and space are irrelevant in the Platonic realm.
If Eve was so nifty, why did it fail? Let's talk technical reasons. (I know nothing of the startup ones.) Technical problems with Eve: (1) keying, (2) inspection, (3) reflection, (4) event handling.
(1) Keying proved my most common problem when trying to use Eve. I want an entity per something where the something is complex: like a bid per bidder per product per round. Each bid also has non-identifying properties: a price, the time it was entered, who it was entered by, etc. In regular software, adding an extra "per" is a great time sink. I genuinely hoped Eve would help. Though internally Eve dealt with keys, they never completed the theory nor committed to exposing the details.
(2) Then without great ways to inspect the database, I couldn't see what was going on, how keys came into play.
(3) Since Eve patterns can match against anything, I found it easy to check invariants but hard to identify causes of their violation. One could not reflect on how rules gave rise to derived properties. This was always a goal; it just never happened.
(4) Though Eve had a decent theoretical foundation for controlled change over time, it was never exposed in a way that one could easily reason about. The primary challenge is that you want an event to arise in the database, effect a change, then dissipate. Whereas an imperative language lets you write step one, two, three, Eve never had that view. Change was managed through an intricate interplay of rules, always hidden. Dijkstra makes a good point:
Our intellectual powers are rather geared to master static relations and our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.
Given thought, time, and practice, these issues could have been addressed — but only by grappling seriously with ordering. Why weren't (2) inspection, (3) reflection, and (4) dynamics visualized? To show a thing, it cannot be formless. It must be positioned, arranged in space. With Eve, they kept punting, ignoring the interplay between the ordered and the unordered. Even the difficulties in (1) keying amounted to leaking imperative implementation details.
Ironically, laziness affords the discipline necessary to navigate the channel between order and chaos. I mean lazy execution, pure functional programming. Instead of informally knowing (or guessing) when state matters and when operations can be safely reordered, no punting is allowed. The type priests ensure that the clean is separated from the unclean — else their sufficiently smart compilers could not do the rewrites necessary to get tolerable performance. Remember, compiler writing was the primary occupation of pure functional programmers for a time.
We might all be writing Haskell today except for the ceremony of it all. The functional core holds together. Type classes open a Pandora's box of possibilities. Pull a few language extensions into the mix, and what started as divine law written in stone grows to feel incomplete, accidental, corrupted. I await the messiah to fulfill the spirit of the law in two great constructs. Who knows? Maybe they'll be symbol and cons after all! Or maybe not the symbols. Let's consider.
There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
Beyond picking a good name, the trick is maintaining its meaning, keeping context clear. Doesn't experience show that shadowed variables are the primary complication in the lambda calculus especially when recursion comes into play? What if we throw out the variables? SKI combinator calculus! Use the Forth! Try concatenative programming. Do. Just pipe one thing through into another.
What if you want to use two things? Set one thing aside, get the other ready, then combine. You may need to set aside a bunch of things: a stack of them. Sure, stack maintenance is clunky; however, the upside is that any sequence readily factors out into a new definition — no local bindings to worry about, only the dictionary of defined words. Concatenative languages get rid of local variables by keeping them on a stack nearby.
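Here is a toy sketch in Python rather than Forth: each word is a function from stack to stack, a program is a sequence of words, and factoring out square is nothing more than naming a sub-sequence.

# A toy concatenative sketch: each word maps a stack to a stack, a program is a
# sequence of words, and factoring out "square" is just naming a sub-sequence.
def push(n): return lambda stack: stack + [n]
def dup(stack): return stack + [stack[-1]]
def add(stack): return stack[:-2] + [stack[-2] + stack[-1]]
def mul(stack): return stack[:-2] + [stack[-2] * stack[-1]]
def run(program, stack=()):
    stack = list(stack)
    for word in program:
        stack = word(stack)
    return stack
def square(stack): return run([dup, mul], stack)       # the factored-out definition
print(run([push(3), square, push(4), square, add]))    # [25]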
Alternatively, instead of pushing a stack of things through one pipe, plumb with multiple pipes. Here we have the essence of classic visual programming data-flow box-wires. Despite some progress at the most mathematical level, Fred Brooks' 1987 critique generally holds up: "Nothing even convincing, much less exciting, has yet emerged from such efforts. I am persuaded that nothing will." Why not?
The screens of today are too small, in pixels, to show both the scope and the resolution of any seriously detailed software diagram. ... The hardware technology will have to advance quite substantially before the scope of our scopes is sufficient for the software-design task. More fundamentally, as I have argued above, software is very difficult to visualize. Whether one diagrams control flow, variable-scope nesting, variable cross references, dataflow, hierarchical data structures, or whatever, one feels only one dimension of the intricately interlocked software elephant. If one superimposes all the diagrams generated by the many relevant views, it is difficult to extract any global overview.
When first dipping my toe into VR for information visualization, I was not so much surprised as lightning-struck: there is a lot more room in a room than on a screen, a lot, a lot more. So. Much. Space. But you cannot use text: not a lot, not up close, not comfortably, not with the current technology. What an opportunity! A dyslexic's paradise: an infinitely malleable world in which text is better left banished.
Box-wire diagrams don't get tangled as easily. A little tilt of the head goes a long way toward disambiguating. What about wires in the distance? One solution came quite suddenly on an afternoon. Looking out my window at a jungle mass of foliage, I could not tell where one tree ended and the next began. Then the slightest breeze put the million node network into focus — a breeze and an optimized tree geometry recognition engine. What is a jungle to the primate brain?
Still. Wires are just a means to connect related things. Why not connect them directly? Just put them next to each other. So what if this elephant is three, six, twenty dimensions? For visualization, fold up and tuck away the extra ones. Assume a principal component perspective. People are entirely used to navigating space beyond immediate sensation. Your house, your town, the globe: you can't see it all at once. Yet shrink down the globe, flatten the map, and suddenly the whole is much more manageable. Mark all the players on it. Watch them move. Fast forward. Reverse. Back and forth. Faster, faster, and in a blur time transposes into space, charting each character's journey.
How to manifest this magic to make a waved hand spin space, twist time, flatten, fold, and straighten out again? Mucking about in a kitchen of abstract concepts, I certainly have no skill in a physical kitchen. Yet counting the movements (methods) invoked in a routine five-minute process, I total 70 in my compact kitchen. Contrast this with an hour of routine programming. I'll be back... 30 lines of content split across seven files.
Of course, the kitchen of the mind doesn't need to simulate a real kitchen. Always dedicate a button to undo. Hang relevant objects all around. To clean a mess, use pocket dimensions rather than drawers. And telekinesis is a marvel!
Foregoing nested symbol sequences as the substrate of expression, consider motion over time through space. To dance. Just as writing amounts to words that stay, words abstracted from voice sounds, so record movement not to be edited like film but to be likewise abstracted somewhat from the performance.
Having done a thing, step back from the stage. Objectified, see this diorama. Rewind. Replay. Scrub. Grab that timeline. Stretch it out so as to see moments knotted along the time thread. Thumbnails. Beginning. Middle. End. Pluck out that middle, graft in a replacement. What is this if not a delimited continuation being called?
Kinetic Programming reifies the execution trace. Not just a moment in time, but all of time. A continuation does to execution what a function does to structure: leaves a hole for filling, parameterizes, templates. With ready access to one time dimension and three space dimensions, we simply have more room to do things with structure as opposed to time. We take time to put together what can't be done spontaneously. Composing rather than improvising.
In a virtual reality, perhaps we can shift the balance. Dedicate some of that extra space to time. Origami an event trace ribbon into a call/return tree. To tame Brooks' elephant, we have to learn to see more. That's what I'm playing with these days. What does it mean to map a motion along another motion? What kind of doodad is it? To filter signal from noise? To find the main path among branches? To play a role and then step back to direct yourself. And here I find myself rewound back to the beginning of my programming career: Logo turtles all the way down.
"What would a Kinetic factorial function be like?" a friend asks. Ah, evolution! Measure out a length n. Rotate out a duplicate to make a right angle. Complete the triangle by joining the far ends. At unit intervals along the length, select line segments from the triangle. Now we have n segments of length 1 to n. Take their product, so as to make an n-dimensional prism, each 1 unit longer than the last. Flatten.
Besides being a geometric presentation of the concept, the Kinetic approach suggests a different series of evolutions than the functional. Instead of picking out segments at unit intervals, make the intervals wider or narrower; but really having started at a length n (rather than n discrete objects), instead of selecting segments, it's more natural to consider all the paths from one side of the triangle to the opposite corner, a sort of fibration.
Then beyond discrete sets and Euclidean lengths, what does it mean to have a "triangle" with "shorter" cross sections?