@lexi-lambda
Created October 28, 2016 23:39

Racket #langs and the REPL

The Racket platform provides a robust set of tools for developing languages, mostly centered around the macro system and the #lang protocol, which allows providing arbitrary readers to convert text input to syntax objects representing Racket modules. The REPL is a bit more complicated, however, because it has different rules—each expression needs to be dynamically read, assigned lexical context, expanded, compiled, evaluated, and printed, all in real time. (In that sense, perhaps the phrase “REPL” oversimplifies what Racket is doing, but that’s a separate conversation.)

So how does Racket accommodate this in the face of completely arbitrary user-defined languages, some of which may not even support interactive evaluation in a traditional sense? Racket mostly solves this problem by having a related but distinct set of protocols for managing runtime interactions that operates alongside #lang.

Modules and the top level

Racket’s evaluation model divides pretty much every piece of execution into two categories: modules and top-level evaluation. The latter is a traditional Lisp/Scheme evaluator, which supports dynamic namespaces, reflection, redefinition, a form of late binding, weakly defined phases, and completely dynamic code loading and evaluation. Due to the relatively weak static guarantees provided by the top level, interaction with macros is complicated and sometimes unpredictable. It is for this reason that Racketeers (in particular Matthew Flatt) consider the top level hopeless.
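
As a small illustration of that dynamism, here is a sketch that evaluates forms in a fresh top-level namespace using eval and make-base-namespace. The names ns, greet, and target are made up for this example.

```racket
#lang racket/base

;; A fresh top-level namespace with the racket/base bindings available.
(define ns (make-base-namespace))

;; `greet` refers to `target`, which is not defined yet. At the top level
;; this is allowed: the reference is resolved when `greet` is called.
(eval '(define (greet) (string-append "Hello, " target)) ns)
(eval '(define target "world") ns)
(define first-greeting (eval '(greet) ns))

;; Redefinition is legal at the top level, and `greet` sees the new value.
(eval '(define target "REPL") ns)
(define second-greeting (eval '(greet) ns))
```

Here first-greeting and second-greeting differ even though greet itself was never redefined, which is the late binding mentioned above.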

In contrast, modules provide strong static guarantees. All identifiers are statically, lexically bound, redefinition is illegal, runtime manipulation of the binding namespace is not allowed, dynamic evaluation cannot see lexically bound identifiers (unless modules cooperate using constructs designed for such reflection, such as using namespace anchors), and compilation is separated into clearly delineated phases.
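
A minimal sketch of those guarantees, assuming nothing beyond racket/base: dynamic evaluation cannot see a module-level definition unless the module opts in with a namespace anchor, and a module that defines the same name twice is rejected before it runs. The names secret, anchor, and the helper procedures are invented for illustration.

```racket
#lang racket/base

(define secret 42)

;; Can `secret` be evaluated in the given namespace?
(define (visible-in? namespace)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (eval 'secret namespace)
    #t))

;; A namespace anchor is this module's explicit opt-in to reflection.
(define-namespace-anchor anchor)
(define module-ns (namespace-anchor->namespace anchor))

(define seen-from-fresh? (visible-in? (make-base-namespace)))
(define seen-from-anchor? (visible-in? module-ns))
(define value-via-anchor (eval 'secret module-ns))

;; Declaring a module with a duplicate definition raises a syntax error.
(define (redefinition-rejected?)
  (with-handlers ([exn:fail:syntax? (lambda (e) #t)])
    (eval '(module m racket/base (define x 1) (define x 2))
          (make-base-namespace))
    #f))
```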

So, how do these two systems interact? Well, from a user’s perspective, they mostly don’t. A user will never see the top level in general programming, except when using the REPL. This is because the REPL is the top level. This is not the only reason Racket is not a particularly REPL-oriented language, but it’s definitely a contributing factor. Aside from the REPL, though, most users won’t directly interact with the top level, but it’s there all the same: Racket modules are evaluated within the top level by the runtime system.

What does any of this have to do with language-specific REPLs? Well, it’s not critical, but it’s worth understanding at a high level in order to understand how much of a fundamental distinction there is in Racket between the two modes of evaluation.

The #lang protocol

This is not about #lang, so I won’t spend too much time on this, but I want to give a high-level overview to explain what #lang actually means in the context of Racket.

As previously mentioned, every Racket file contains a module. (Technically, it is possible to evaluate a set of top-level forms directly, but this is highly discouraged.) A module looks like this:

(module foo racket/base
  (displayln "Hello, world!"))

Each module has three parts: a module name (in the above case, foo), a module language (in the above case, racket/base), and a module body. In most cases, the name is irrelevant, so I will ignore it here. The module language is much more interesting, though, because it specifies the initial set of bindings available in the language. The module language is itself specified as a module path, which refers to another module. The module that bootstraps this whole process is the magic '#%kernel module, which contains all of the primitives the Racket platform provides.

The above code snippet is a valid Racket file, but most Racket source files do not appear in that form. Instead, they contain a line beginning with #lang. The above example would be written more idiomatically as follows:

#lang racket/base
(displayln "Hello, world!")

In this case, the Racket reader would transform the above code into the original example. However, it is not completely equivalent. Rather than going directly from #lang racket/base to (module foo racket/base ...), the reader consults a module called (submod racket/base reader) (a submodule defined inside the racket/base module), which must provide a procedure called read-syntax. The reader will invoke read-syntax on the remaining source from the module body, and read-syntax should return a (module ...) form as a result.
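
To make the shape of that procedure concrete, here is a sketch of a read-syntax for a plain s-expression language. It is not the actual racket/base reader; example-read-syntax and the choice of anonymous-module as the module name are assumptions for illustration. In a real language, a procedure like this would be exported under the name read-syntax from the reader submodule.

```racket
#lang racket/base

;; Hypothetical read-syntax for an s-expression language. `src` names the
;; source, and `in` is a port positioned just after the #lang line.
(define (example-read-syntax src in)
  (define body
    (let loop ()
      (define form (read-syntax src in))
      (if (eof-object? form)
          '()
          (cons form (loop)))))
  ;; Wrap everything in a module form whose module language is racket/base.
  (datum->syntax #f `(module anonymous-module racket/base ,@body)))

;; Simulate the text that follows a #lang line.
(define result
  (syntax->datum
   (example-read-syntax 'example
                        (open-input-string "(displayln \"Hello, world!\")"))))
```

The value of result is a (module ...) datum with the same shape as the hand-written module shown earlier.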

In the case of #lang racket/base, read-syntax is quite uninteresting, because it just defers to the underlying Racket reader. However, other languages with non-s-expression syntax will provide very different read-syntax implementations, which will parse the source code into s-expression based syntax objects wrapped in a (module ...) form. As an example, a module written in my Tulip implementation looks like this:

#lang tulip
@import tulip/math
f x = mul x 2

…which is translated by (submod tulip reader)’s read-syntax procedure into something like this:

(module anonymous-module tulip
  (#%require tulip/math)
  (@%define f (@%lambda [(x) ((mul x) 2)])))

The tulip module provides bindings for the names #%require, @%define, and @%lambda, which are macros defined in terms of more primitive forms. After the reader produces the above module, the expander runs, converting the source into nothing but primitive forms and references to runtime bindings, which looks something like this:

(module anonymous-module tulip
  (#%require tulip/math)
  (define-values (f)
    (lambda (x)
      (#%app (#%app mul x) (quote 2)))))

(The actual expansion is somewhat more complicated, but not terribly so.)

After expansion, the source without macros is shipped off to the compiler, which produces bytecode, and that bytecode is eventually provided to the runtime to be evaluated. This is pretty much the extent of #lang—it’s simple, predictable, and straightforward.
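
The expansion step can be observed directly with expand. The sketch below uses an ordinary racket/base module rather than Tulip, since it needs nothing beyond the base language; the module name demo and the helper expand-to-datum are made up for this example.

```racket
#lang racket/base

;; Expand a quoted module form in a fresh namespace and strip the result
;; back to a plain datum so it can be printed and inspected.
(define (expand-to-datum form)
  (parameterize ([current-namespace (make-base-namespace)])
    (syntax->datum (expand form))))

(define expanded
  (expand-to-datum
   '(module demo racket/base
      (define (double x) (* x 2)))))

;; Printing `expanded` should show core forms such as define-values in
;; place of the `define` shorthand; the exact output varies by version.
```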

The Racket REPL

So, what about the REPL? Well, the REPL is entirely different. Does it have to be? Maybe not, but the way Racket implements it is (in part for historical reasons).

The REPL is kicked off by invoking a procedure called, unsurprisingly, read-eval-print-loop, which defers to three parameters: current-prompt-read, current-eval, and current-print. Racket “parameters” are effectively dynamically bound values, so these things can be customized at runtime to control how the REPL works. The REPL calls (current-prompt-read) to produce a syntax object, wraps the resulting syntax object with (#%top-interaction ...), and evaluates it using (current-eval). The result of the evaluation is provided to (current-print), which can display the value to the user as it sees fit.
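
As a sketch of how those three parameters fit together, the following runs read-eval-print-loop over a string instead of standard input and collects the values that would have been printed. The procedure name run-repl-on-string and the sample input are invented for illustration; current-eval is left at its default.

```racket
#lang racket/base

;; Run Racket's own read-eval-print-loop over a string, collecting the
;; values handed to the print step in a list.
(define (run-repl-on-string text)
  (define in (open-input-string text))
  (define printed '())
  (parameterize ([current-namespace (make-base-namespace)]
                 ;; Read step: return a syntax object, or eof to end the loop.
                 [current-prompt-read
                  (lambda () (read-syntax 'scripted-repl in))]
                 ;; Print step: record each non-void result.
                 [current-print
                  (lambda (v)
                    (unless (void? v)
                      (set! printed (cons v printed))))])
    (read-eval-print-loop))
  (reverse printed))

(define results
  (run-repl-on-string "(define x 20) (* x 2) (+ x 1)"))
```

The define produces no recorded value, so results contains only the values of the two expressions that follow it.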

Most users will never need to modify current-eval at all, since it handles evaluation of syntax objects and compiled code. In contrast, current-prompt-read and current-print are quite useful to modify, since they serve as hooks for language implementors to provide their own interactions.

The default implementation of current-prompt-read prints > , then defers to another parameter, current-read-interaction. Since current-prompt-read is so flexible, a user could completely override it to do whatever they’d like, including adding line editing support, tab completion, launching a GUI, or anything else entirely. However, for most traditional REPLs, it is enough to override current-read-interaction, which normally just defers to read-syntax.
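
For a sense of what such an override might look like, here is a sketch of a procedure with the shape current-read-interaction expects: it takes a source name and an input port, and returns a syntax object or eof. The toy infix syntax and the name infix-read-interaction are assumptions made up for this example.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical reader for interactions of the form "<left> <op> <right>",
;; such as "1 + 2". Each line becomes one prefix s-expression.
(define (infix-read-interaction src in)
  (define line (read-line in))
  (cond
    [(eof-object? line) eof]
    [else
     (define tokens
       (for/list ([token (in-list (string-split line))])
         (read (open-input-string token))))
     (unless (= (length tokens) 3)
       (raise-user-error 'infix "expected <left> <op> <right>, got: ~a" line))
     ;; Reorder the infix tokens into prefix form.
     (datum->syntax #f (list (cadr tokens) (car tokens) (caddr tokens)))]))

(define parsed
  (syntax->datum
   (infix-read-interaction 'repl (open-input-string "1 + 2\n"))))

;; To install it for a REPL session:
;; (current-read-interaction infix-read-interaction)
```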

For some cases, replacing current-read-interaction with a procedure that refers to your language’s read-syntax implementation is sufficient. For other languages, you might want to use a completely different implementation, such as for a data language that supports querying but not dynamic modification of the data at runtime.

So then, as the author of a #lang, how do you actually configure these parameters at runtime? Well, there are a few different ways, but none of them are particularly elegant or brilliant. Probably the most straightforward one, though, is to make your #lang produce a module containing a submodule with the name configure-runtime, which the Racket runtime will automatically require when starting a REPL for a particular module. Within the configure-runtime submodule, you can mutate the values of the aforementioned parameters to control evaluation as you see fit. Alternatively, you can attach a syntax property to the module’s syntax object with the key 'module-language, which specifies a procedure that will be invoked when the REPL starts.
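
A minimal sketch of the first approach, written as an ordinary racket/base file rather than as the output of a custom reader. The submodule name configure-runtime is the one Racket looks for; the printer installed here is made up for illustration.

```racket
#lang racket/base

;; Runtime configuration for this module: replace the printer.
(module configure-runtime racket/base
  (current-print
   (lambda (v)
     (unless (void? v)
       (printf "=> ~s\n" v)))))

;; When this file is run as the main module (for example with
;; `racket file.rkt`), the submodule above should be instantiated first,
;; so the result of this expression goes through the custom printer.
(+ 1 2)
```

A #lang would arrange for its read-syntax or its module language to add a submodule like this to every module it produces, so users of the language never write it by hand.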

For an example of configuring the runtime for a non-s-expression language, take a look at racket-tulip’s implementation, which takes some additional steps to cooperate with DrRacket’s more powerful REPL functionality.
