@CodaFi
Created July 8, 2018 06:33
Macro-by-Example: Deriving Syntactic Transformations from their Specifications, Re-Typeset
//: Foreword
//:
//: This playground is the culmination of a week of digging around
//: for the oldest papers I could find about (Lisp) macros and their evolution.
//: I found this paper particularly enlightening as it is the place where Scheme
//: and Rust-style "Macro-by-Example" macros first came into their own. In
//: addition, this paper discusses the implementation challenges faced by
//: previous macro systems and Kohlbecker's first go at MBEs.
//:
//: I have re-typeset this paper because [the one hosted by IU](ftp://www.cs.indiana.edu/pub/techreports/TR206.pdf)
//: is annoying to read. I also translated some of the more interesting
//: definitions to Swift to see what would happen.
//: # Introduction
//: Modern programming languages offer powerful facilities for procedural and
//: data abstraction which encapsulate commonly-used patterns of procedure
//: invocation and of data usage, respectively. Often, however, one encounters
//: typical patterns of linguistic usage that do not fall neatly into either
//: category. An example is the let expression, which encapsulates an often-used
//: pattern of function creation and call:
//:
//: (let ((i e) ...) b ...)
//: ==> ((lambda (i ...) b ...) e ...)
//:
//: This is not a procedural abstraction; rather, it is a syntactic abstraction:
//: an abstraction of a typical pattern of syntax.
//:
//: Most modern languages do not provide tools for the creation of syntactic
//: abstractions comparable to those available for procedural or data
//: abstractions. Even in languages such as Lisp that allow syntactic
//: abstractions, the process of defining them is notoriously difficult and
//: error-prone. To define let as a macro, we must write a procedure that
//: transforms any expression which matches the left-hand pattern into a
//: corresponding instance of the right-hand pattern. The code for this looks
//: like:
//:
//: (lambda (s)
//:   (cons
//:     (cons 'lambda
//:           (cons (map car (cadr s))
//:                 (cddr s)))
//:     (map cadr (cadr s))))
//:
//: This code can hardly be considered transparent. It bears no obvious relation
//: to the transformation it engenders. It is difficult to see the shape of the
//: structure it is building or the shape of the structure it is trying to take
//: apart. Furthermore, it does no error checking on its input.
//:
//: Modern Lisps supply some tools to help the macro-writer, most notably
//: backquote and defmacro (e.g. [Foderaro, Sklower, & Layer 83](http://www.softwarepreservation.org/projects/LISP/franz/Franz_Lisp_July_1983.pdf)).
//: Backquote makes the code for building the output look more like the output
//: itself, and defmacro includes a pattern-matching facility. Using these
//: tools, the let macro might be defined as:
//:
//: (defmacro let (decls . body)
//:   `((lambda ,(map car decls) . ,body)
//:     . ,(map cadr decls)))
//:
//: This is considerably better, but the mapping functions are still mysterious.
//: Furthermore, the backquote mechanism is itself error-prone: one always
//: leaves out at least one comma on the first try!
//:
//: In our facility, one defines let as follows:
//:
//: (declare-syntax let
//:   [(let ((i e) ...) b ...)
//:    ((lambda (i ...) b ...) e ...)])
//:
//: This is close to the language used for specifying syntactic extensions in
//: the revised Scheme report [Steele and Sussman 78] and the 1986 Scheme
//: report [Rees, Clinger, et al. 86], except that it is executable. The
//: specification language has the following features:
//:
//: 1. Pattern-matching, including error-checking, is done on the input.
//: 2. The output specification matches the form of the output. No commas or
//: other symbols are needed.
//: 3. Repetitive elements are specified naturally on both input and output.
//:
//: This specification mechanism has been used in various versions of Scheme
//: since 1982, most recently in Chez Scheme [Dybvig 87](https://www.scheme.com/tspl4/),
//: and has proved to be a robust and highly useful feature. Only recently,
//: however, did the need for formal documentation of the mechanism lead us to
//: develop the formal semantics and the semantically derived compiler presented
//: here.
//:
//: We call this mechanism macro-by-example, or MBE. It has allowed us to embed
//: a number of interesting languages in Scheme, such as:
//:
//: - a type-checked Scheme variant called SPS [Wand 84](https://dl.acm.org/citation.cfm?id=502895)
//: - a coroutine mechanism [Friedman, Haynes, & Wand 86](https://www.sciencedirect.com/science/article/pii/009605518690007X)
//: - an import/export mechanism for modules [Felleisen and Friedman 86](https://www.sciencedirect.com/science/article/pii/0096055186900159)
//: - two quite different semantically-derived subsets of Prolog [Felleisen 85](ftp://ftp.extreme.indiana.edu/pub/techreports/TR182.pdf), [Wand 85]
//:
//: The presence of this convenient syntactic extension tool provides an
//: important design dimension: given any proposed language extension, we get to
//: decide how much of it should be treated as procedural abstraction, how much
//: as data abstraction, and how much as syntactic abstraction. The presence of
//: a good syntactic abstraction mechanism means that a language extension which
//: might be highly cumbersome expressed in a purely procedural way can be made
//: far more usable by a propitious choice of syntax. (Indeed, the
//: `declare-syntax` facility itself may be viewed as such a propitious choice
//: of syntax). Conversely, given a complex syntactic abstraction, we can try to
//: convert some of it to a procedural or a data abstraction.
//: # The Specification Language
//: Consider the following example of MBE:
//:
//: (declare-syntax and
//:   [(and) true]
//:   [(and e) e]
//:   [(and e1 e2 ...) (if e1 (and e2 ...) false)])
//:
//: It illustrates some of the important features of MBE:
//:
//: 1) An MBE specification consists of a series of input-output patterns, which
//: are tried in order. If a call matches none of the input patterns, then an
//: error is signalled.
//: 2) Backquotes and commas in the output patterns are avoided by a simple
//: convention: identifiers which appear in the input pattern are treated as
//: pattern variables; all other identifiers are constants. No special
//: treatment is necessary for lambda or quote in the transcription proper,
//: but the result of the transcription can be processed using the
//: α-converting expander of [Kohlbecker, et al. 86](ftp://www.cs.indiana.edu/pub/techreports/TR194.pdf)
//: to avoid capture of bound variables in macros such as:
//:
//:
//: ```
//: (declare-syntax or2
//: [(or2 x y) (let ((v x)) (if v v y))])
//: ```
//:
//: which would otherwise capture any occurrences of v in its second argument.
//: (The production version of MBE also allows the user to specify the capture
//: of particular variables).
//:
//: 3) Whenever the ellipsis (indicated by the atom "...") is used, it must be
//: preceded by a pattern, and it must be the last element of the pattern
//: list or sub-list. (Again, the production version is somewhat more
//: generous; we do not illustrate this here). Ellipses may be nested, so
//: pattern variables may denote S-expressions, lists of S-expressions, lists
//: of lists of S-expressions, etc.
//:
//: As an illustration of MBE as a language-embedding tool, consider the
//: following macros, which are taken from [Wand 85]. These macros define two
//: new special forms whose translation produces (through details that are of
//: no concern here) an interface to an underlying Prolog semantics.
//:
//: (declare-syntax clauses
//:   ((clauses (var1 ...) cl1 ...)
//:    (lambda (z)
//:      (let ((var1 (genvar)) ...)
//:        (call/kf
//:          (lambda (kf)
//:            (ff-set-ref! **cut-point** kf)
//:            (choose (clause cl1) ...)))))))
//:
//: (declare-syntax clause
//:   ((clause (pat act1 ...))
//:    (begin
//:      (unify (pattern pat) z)
//:      (action act1) ...)))
//:
//: (declare-syntax action
//:   ((action (fn e1 ...))
//:    (fn (list (pattern e1) ...))))
//:
//: Without the ability to adapt the syntax, the interface would have been
//: impossible to use (indeed, an earlier version of our Prolog semantics
//: foundered on precisely this point). With these macros in place, we can write
//: Prolog predicates as Scheme procedures in a moderately convenient syntax,
//: for example:
//:
//: (define Append
//:   (clauses (y a d u v)
//:     ((nil y y))
//:     (((a . d) u (a . v))
//:      (Append d u v))))
//:
//: The production version [Kohlbecker 86](https://www2.ccs.neu.edu/racket/pubs/dissertation-kohlbecker.pdf)
//: includes a number of additional bells and whistles. First, arbitrary tests
//: may be specified by an optional argument called a fender. For example, the
//: macro for let should include a test to make sure the bound variables are
//: atoms:
//:
//: (declare-syntax let
//:   [(let ((i e) ...) b ...)
//:    (mapand atom? '(i ...))
//:    ((lambda (i ...) b ...) e ...)])
//:
//: Second, arbitrary processing of the macro call may be performed by a
//: with-specification, which binds its variable to the result of a Scheme
//: calculation.
//:
//: (declare-syntax new-lang
//:   [(new-lang e ...)
//:    (with ([code (translate-to-scheme '(e ...))])
//:      (top-level code))])
//:
//: And third, the macro-writer may also specify keywords to serve as pattern
//: constants in the input patterns (e.g., else) and as terminators for
//: ellipses.
//: # Semantics
//: In this section, we sketch the formal semantics of the MBE mechanism. We
//: restrict ourselves to a single input-output pair, emphasizing the correct
//: treatment of ellipses.
//:
//: In our description, we employ relatively standard notation. We use three
//: closely-spaced dots `(...)` for the ellipsis symbol and center dots
//: `(· · ·)` for ellipsis in the meta-language. We use `(x)1, (x)2` for
//: projection, `(α ⇒ β, γ)` for conditionals, and `A −o-> B` for the set of
//: partial functions from `A` to `B`.
//: We define the following domains:
//: `S-exp ::= Ident | (S-exp ... S-exp)`
enum SExpr {
    case atom(String)
    case many([SExpr])

    /// Returns true if this expression is the empty list `()`.
    var isNull: Bool {
        switch self {
        case let .many(xs) where xs.isEmpty:
            return true
        default:
            return false
        }
    }

    /// Returns the left and right elements of this expression if it is a pair -
    /// not an atom and not empty. Returns `nil` otherwise.
    var asPair: (SExpr, SExpr)? {
        switch self {
        case let .many(xs) where !xs.isEmpty:
            return (xs[0], .many(Array(xs.dropFirst())))
        default:
            return nil
        }
    }

    /// Returns the elements of this expression if it is a list. Returns `nil`
    /// otherwise.
    var asList: [SExpr]? {
        switch self {
        case let .many(xs):
            return xs
        default:
            return nil
        }
    }

    static func cons(_ l: SExpr, _ r: SExpr) -> SExpr {
        guard let xs = r.asList else {
            fatalError("cons onto a non-list")
        }
        return .many([l] + xs)
    }

    /// The first element of this expression if it is a non-empty list.
    /// Returns `nil` otherwise.
    var hd: SExpr? {
        switch self {
        case let .many(xs):
            return xs.first
        default:
            return nil
        }
    }

    /// Everything after the first element of this expression if it is a list.
    /// Returns `nil` otherwise.
    var tl: SExpr? {
        switch self {
        case let .many(xs):
            return .many(Array(xs.dropFirst()))
        default:
            return nil
        }
    }

    var classify: SType {
        switch self {
        case .atom:
            return .ground(self)
        case let .many(xs):
            return .many(xs.map { $0.classify })
        }
    }
}
//: `S# ::= S-exp | (S# ... S#)`
enum SType {
    case ground(SExpr)
    case many([SType])

    /// The length of this value if it is a list; ground values have length 0.
    var length: Int {
        switch self {
        case .ground(_): return 0
        case let .many(xs): return xs.count
        }
    }

    var isNull: Bool {
        switch self {
        case .ground(_): return true
        default: return false
        }
    }

    /// The first element of this value if it is a non-empty list.
    var hd: SType? {
        switch self {
        case .ground(_): return nil
        case let .many(tys): return tys.first
        }
    }

    /// Everything after the first element of this value if it is a list.
    var tl: SType? {
        switch self {
        case .ground(_): return nil
        case let .many(tys): return .many(Array(tys.dropFirst()))
        }
    }
}
//: `Pat ::= () | Ident | (Pat . Pat) | (Pat ...)`
enum Pattern {
    case empty
    case identifier(String)
    indirect case pair(Pattern, Pattern)
    indirect case ellipses(Pattern)

    /// Returns the set of identifiers contained in a pattern.
    var freeVariables: Set<String> {
        switch self {
        case .empty: return []
        case let .identifier(a): return [a]
        case let .pair(p1, p2): return p1.freeVariables.union(p2.freeVariables)
        case let .ellipses(ps): return ps.freeVariables
        }
    }
}
//: `Env = Ident -o-> (Int × S#)`
typealias Environment = [String: (Int, SType)]
//: Takes an input pattern and an S-expression and returns a boolean indicating
//: whether the S-expression matches the pattern.
//:
//: The equations for `D` are guaranteed to be sensible only on the condition
//: that `B` returns true.
func B(_ p: Pattern, _ s: SExpr) -> Bool {
    switch p {
    case .empty:
        return s.isNull
    case .identifier(_):
        return true
    case let .pair(p1, p2):
        guard let (hd, tl) = s.asPair else {
            return false
        }
        return B(p1, hd) && B(p2, tl)
    case let .ellipses(ps):
        guard let ss = s.asList else {
            return false
        }
        return ss.allSatisfy({ t in B(ps, t) })
    }
}
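//: As a quick sanity check of `B`, we can build the input pattern of the
//: `let` macro from the introduction, `(let ((i e) ...) b ...)`, and match
//: it against a concrete call. (The `bDemo`-prefixed names below are
//: illustrative and not from the paper.)

```swift
// (let ((i e) ...) b ...) as a Pattern. The head identifier `let` is a
// pattern variable like any other; it simply never appears in an output.
let bDemoPattern: Pattern = .pair(
    .identifier("let"),
    .pair(.ellipses(.pair(.identifier("i"), .pair(.identifier("e"), .empty))),
          .ellipses(.identifier("b"))))

// (let ((x 1) (y 2)) (+ x y))
let bDemoCall: SExpr = .many([
    .atom("let"),
    .many([.many([.atom("x"), .atom("1")]),
           .many([.atom("y"), .atom("2")])]),
    .many([.atom("+"), .atom("x"), .atom("y")])])

B(bDemoPattern, bDemoCall)    // true
B(bDemoPattern, .atom("let")) // false - an atom matches no pair pattern
```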
//: Takes an input pattern and an S-expression and (if it matches the pattern)
//: returns an environment associating pattern variables with their result
//: bindings.
//:
//: Each binding is a pair consisting of a non-negative integer and an element
//: of S#. The integer indicates the variable's level, the number of ellipses
//: enclosing it.
func D(_ p: Pattern, _ s: SExpr) -> Environment? {
    func D(at level: Int, _ p: Pattern, _ s: SExpr) -> Environment? {
        switch p {
        case .empty:
            return [:]
        case let .identifier(a):
            return [a: (level, .ground(s))]
        case let .pair(p1, p2):
            guard let (hd, tl) = s.asPair else {
                return nil
            }
            guard let hdEnv = D(at: level, p1, hd) else {
                return nil
            }
            guard let tlEnv = D(at: level, p2, tl) else {
                return nil
            }
            return hdEnv.merging(tlEnv, uniquingKeysWith: { $1 })
        case let .ellipses(ps):
            guard let ss = s.asList else {
                return nil
            }
            // Match each element of the list against the sub-pattern, then
            // collate: each variable is bound to the list of its per-element
            // values. (An empty subject list leaves the sub-pattern's
            // variables unbound here; the derived compiler below computes
            // levels statically and so avoids this corner.)
            var collated = [String: (Int, [SType])]()
            for elem in ss {
                guard let env = D(at: level + 1, ps, elem) else {
                    return nil
                }
                for (k, v) in env {
                    collated[k, default: (v.0, [])].1.append(v.1)
                }
            }
            return collated.mapValues { (t) -> (Int, SType) in
                return (t.0, .many(t.1))
            }
        }
    }
    // A variable's level is the number of ellipses enclosing it, so matching
    // begins at level 0.
    return D(at: 0, p, s)
}
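//: Running `D` on the same `let`-shaped example shows the level
//: bookkeeping: the variables under one ellipsis come back at level 1,
//: bound to lists in `S#`, while the macro keyword is a level-0 scalar.
//: (The `dDemo`-prefixed names are illustrative.)

```swift
// (let ((i e) ...) b ...)
let dDemoPattern: Pattern = .pair(
    .identifier("let"),
    .pair(.ellipses(.pair(.identifier("i"), .pair(.identifier("e"), .empty))),
          .ellipses(.identifier("b"))))

// (let ((x 1) (y 2)) (+ x y))
let dDemoCall: SExpr = .many([
    .atom("let"),
    .many([.many([.atom("x"), .atom("1")]),
           .many([.atom("y"), .atom("2")])]),
    .many([.atom("+"), .atom("x"), .atom("y")])])

let dDemoEnv = D(dDemoPattern, dDemoCall)!
dDemoEnv["let"]?.0      // 0 - no enclosing ellipses
dDemoEnv["i"]?.0        // 1 - one enclosing ellipsis
dDemoEnv["i"]?.1.length // 2 - bound to the list of values for x and y
```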
//: Takes an output pattern and an environment and expands the pattern in that
//: environment.
func T(_ p: Pattern, _ env: Environment) -> SExpr? {
    switch p {
    case .empty:
        return .many([])
    case let .identifier(a):
        // If the output pattern is an identifier, we first check to see if it
        // is bound in the environment.
        guard let (lev, tm) = env[a] else {
            // If it is not bound in the environment, then it is treated as a
            // constant.
            return .atom(a)
        }
        // If it is, and its level is zero, then the result is its value.
        guard lev == 0, case let .ground(s) = tm else {
            // If the level is non-zero, then it was bound to a list of
            // values, not a value, so it may not be transcribed.
            return nil
        }
        return s
    case let .pair(p1, p2):
        // If the output pattern is a pair (p1 . p2), its head and tail are
        // each processed separately. The two results are then joined.
        guard let x = T(p1, env), let xs = T(p2, env) else {
            return nil
        }
        return SExpr.cons(x, xs)
    case let .ellipses(ps):
        // In processing an output pattern (p ...) with T, we have the
        // inverse of the problem with D: we must split a single environment
        // into a list of environments. To do this, we first restrict the
        // environment to the free variables of p. We then check whether the
        // variables of p include at least one variable of level greater than
        // 0. If not, the prototype p is rejected, since there is no way to
        // determine the length of the repetition. If the prototype passes,
        // then we decompose the environment into a list of environments and
        // map T[p] over this list. The decomposition is complicated by three
        // considerations:
        //
        // 1) We must decrement the level of each of the non-scalar variables
        //    in the environment.
        // 2) We copy scalar (level 0) variables across the list, being sure
        //    *not* to decrement their level, so that code like
        //
        //      (declare-syntax copy-it
        //        [(copy-it i j ...) (bar '(i j) ...)])
        //
        //    correctly transcribes
        //
        //      (copy-it a 1 2 3)
        //
        //    into
        //
        //      (bar '(a 1) '(a 2) '(a 3))
        //
        // 3) We determine the length of the list. This is the length of the
        //    lists in the environment if they are all the same length. If
        //    they are not the same length, an error is signalled.
        func controllable(_ rho: Environment) -> Bool {
            return rho.values.contains { $0.0 > 0 }
        }
        let fv = ps.freeVariables
        let rhoRestrict = env.filter { fv.contains($0.key) }
        guard controllable(rhoRestrict) else {
            return nil
        }
        // Decompose the environment into a list of environments.
        func decompose(_ rho: Environment) -> [Environment] {
            // Only the non-scalar (level > 0) bindings determine the length
            // of the repetition.
            let lists = rho.values.filter { $0.0 > 0 }
            guard let firstLen = lists.first?.1.length else { return [] }
            // UnequalLengths?
            guard lists.allSatisfy({ $0.1.length == firstLen }) else {
                fatalError("Type error!")
            }
            // StopNow?
            guard firstLen > 0 else {
                return []
            }
            let splitHeads = rho.mapValues({ (t) -> (Int, SType) in
                guard t.0 == 0 else {
                    // Non-scalar: take the head and decrement the level.
                    return (t.0 - 1, t.1.hd!)
                }
                return t
            })
            let splitTails = rho.mapValues({ (t) -> (Int, SType) in
                guard t.0 == 0 else {
                    // Non-scalar: take the tail, but keep the level - only
                    // the heads, which are handed to T, are lowered.
                    return (t.0, t.1.tl!)
                }
                return t
            })
            return [splitHeads] + decompose(splitTails)
        }
        let ds = decompose(rhoRestrict)
        var exprs = [SExpr]()
        for d in ds {
            guard let t = T(ps, d) else {
                return nil
            }
            exprs.append(t)
        }
        return .many(exprs)
    }
}
//: A transform function for one input-output pattern pair.
func E(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? {
    guard B(ps.lhs, s) else {
        return nil
    }
    guard let env = D(ps.lhs, s) else {
        return nil
    }
    return T(ps.rhs, env)
}
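//: With all three pieces in place, `E` can run the `let` example from the
//: introduction end to end. The tiny printer and the `eDemo`-prefixed names
//: below are illustrative additions, not part of the paper.

```swift
// Render an SExpr as Lisp-style text, for inspection only.
func eDemoRender(_ s: SExpr) -> String {
    switch s {
    case let .atom(a):
        return a
    case let .many(xs):
        return "(" + xs.map(eDemoRender).joined(separator: " ") + ")"
    }
}

// lhs: (let ((i e) ...) b ...)
let eDemoLhs: Pattern = .pair(
    .identifier("let"),
    .pair(.ellipses(.pair(.identifier("i"), .pair(.identifier("e"), .empty))),
          .ellipses(.identifier("b"))))
// rhs: ((lambda (i ...) b ...) e ...) - "lambda" is unbound on the left,
// hence a constant.
let eDemoRhs: Pattern = .pair(
    .pair(.identifier("lambda"),
          .pair(.ellipses(.identifier("i")), .ellipses(.identifier("b")))),
    .ellipses(.identifier("e")))

// (let ((x 1) (y 2)) (+ x y))
let eDemoCall: SExpr = .many([
    .atom("let"),
    .many([.many([.atom("x"), .atom("1")]),
           .many([.atom("y"), .atom("2")])]),
    .many([.atom("+"), .atom("x"), .atom("y")])])

eDemoRender(E((lhs: eDemoLhs, rhs: eDemoRhs), eDemoCall)!)
// "((lambda (x y) (+ x y)) 1 2)"
```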
//: # Deriving The Compiler
//: The original versions of MBE (along with the current production version,
//: detailed in [Kohlbecker 86](https://www2.ccs.neu.edu/racket/pubs/dissertation-kohlbecker.pdf)), used ad hoc code generation methods to
//: compile a specification into Scheme code. The code generator is quite
//: clever; for example, for let it produces the code shown in the first
//: display. The evolution of this code generator has been tortuous, and
//: therefore MBE seemed to be a good real-world example on which to exercise
//: semantic methods.
//:
//: The first step in the derivation was the production of the formal semantics
//: sketched above. From there, the derivation proceeded in two phases: staging
//: [Jorring and Sherlis 86] and representation [Wand 82].
//:
//: The goals of staging are to identify opportunities for early binding and to
//: reorder and re-curry the arguments to the various functions so that the
//: early-bindable arguments appear first.
//:
//: Our particular goal is to avoid building intermediate structures such as
//: environments whenever possible. Recall that `D` has functionality
//:
//: // Pat -> S-exp -o-> (Ident -o-> (Int × S#))
//: func D(_ p: Pattern, _ s: SExpr) -> Environment?
//:
//: We delay the binding of S-exp by replacing Env by
//:
//: Env′ = (Ident -o-> Int) x (Ident -o-> S-exp -> S#)
//:
//: *Editor's note: Env′ as defined here does not typecheck as used in the rest
//: of the paper. The authors probably meant the following type instead:*
//:
//: Env′ = (Ident -o-> Int) x (Ident -o-> S-exp -> S-exp)
typealias Env2 = ([String: Int], [String: (SExpr) -> SExpr])
//: and replacing `D` by a function `D′` of functionality `Pat -> Env′`.
//:
//: An element of `Env′` may be thought of as a symbol table. Instead of having
//: the function `D` which takes an S-expression and builds an environment
//: containing S-expressions, we have a new function which takes a pattern and
//: builds a symbol table containing functions from S-expressions to
//: S-expressions. We call these functions selectors. A selector is later
//: applied to the subject S-expression to extract the values for the variables.
//: Thus the condition we want is that if `B[p]s` is true, then for all `i`,
//:
//: ⟨(D′[p])1i, (D′[p])2is⟩ = D[p]si
//:
//: This formulation also makes clear that the levels of the variables are
//: independent of the subject S-expression to which the pattern is matched.
//: (This is analogous to static chain analysis or type analysis).
//:
//: It is easy to build `D′` as the product of two functions, which we call
//: `D1′` and `D2′` . Each of these is obtained by modifying the appropriate
//: piece of the definition of `D`, using the congruence condition above as a
//: guide. Doing this, we get:
//:
func D1(_ p: Pattern) -> [String: Int] {
    func D1(at level: Int, _ p: Pattern) -> [String: Int] {
        switch p {
        case .empty:
            return [:]
        case let .identifier(a):
            return [a: level]
        case let .pair(p1, p2):
            let hdEnv = D1(at: level, p1)
            let tlEnv = D1(at: level, p2)
            return hdEnv.merging(tlEnv, uniquingKeysWith: { $1 })
        case let .ellipses(ps):
            return D1(at: level + 1, ps)
        }
    }
    // As in D, a variable's level is the number of enclosing ellipses, so we
    // start at level 0.
    return D1(at: 0, p)
}
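//: `D1` computes the level component with no S-expression in sight,
//: confirming that levels are static. For the input pattern of `let`
//: (the `d1Demo` names are illustrative):

```swift
// (let ((i e) ...) b ...)
let d1DemoPattern: Pattern = .pair(
    .identifier("let"),
    .pair(.ellipses(.pair(.identifier("i"), .pair(.identifier("e"), .empty))),
          .ellipses(.identifier("b"))))

let d1DemoLevels = D1(d1DemoPattern)
d1DemoLevels["let"] // 0: a scalar
d1DemoLevels["i"]   // 1: one enclosing ellipsis, as for "e" and "b"
```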
func D2(_ p: Pattern) -> [String: (SExpr) -> SExpr] {
    switch p {
    case .empty:
        return [:]
    case let .identifier(a):
        return [a: { $0 }]
    case let .pair(p1, p2):
        let hdEnv = D2(p1)
        let tlEnv = D2(p2)
        let head = hdEnv.mapValues { (f) -> (SExpr) -> SExpr in
            return { s in f(s.hd!) }
        }
        let tail = tlEnv.mapValues { (f) -> (SExpr) -> SExpr in
            return { s in f(s.tl!) }
        }
        return head.merging(tail, uniquingKeysWith: { $1 })
    case let .ellipses(p):
        // The selector for a repeated sub-pattern maps the inner selector
        // over the subject list: λs.(map M s).
        return D2(p).mapValues { (f) -> (SExpr) -> SExpr in
            return { s in .many(s.asList!.map(f)) }
        }
    }
}
//: Now we can proceed to modify `T` to work with new-style environments. Just
//: as the analysis of `D` produced selectors, the analysis of `T` produces
//: constructors. The new version of `T` will be
//:
//: T′ : Pat -> Env′ -> S-exp -o-> S-exp
//:
//: Using the condition on `D′` above to illustrate how to translate new
//: environments into old environments, we see that the condition we want
//: `T′` to obey is
//:
//: func T2(_ p: Pattern, _ e: Env2, _ s: SExpr) -> SExpr? {
//:     // Lift a selected S-expression to S# by splitting it into lists down
//:     // to the variable's level.
//:     func lift(_ lev: Int, _ v: SExpr) -> SType {
//:         guard lev > 0 else { return .ground(v) }
//:         return .many(v.asList!.map { lift(lev - 1, $0) })
//:     }
//:     var loweredEnv = Environment()
//:     for k in e.0.keys {
//:         loweredEnv[k] = (e.0[k]!, lift(e.0[k]!, e.1[k]!(s)))
//:     }
//:     return T(p, loweredEnv)
//: }
//:
//: With this condition, we can redefine E as
func E2(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? {
    guard B(ps.lhs, s) else {
        return nil
    }
    return T2(ps.rhs, (D1(ps.lhs), D2(ps.lhs)), s)
}
//: A simple calculation shows that this definition is equivalent to the old
//: one.
//:
//: As usual, it is easy to write the first three clauses for `T′`:
func T2(_ p: Pattern, _ e: Env2, _ s: SExpr) -> SExpr? {
    switch p {
    case .empty:
        return .many([])
    case let .identifier(a):
        guard let lev = e.0[a] else {
            return .atom(a)
        }
        guard lev == 0 else {
            fatalError("Cannot transcribe a variable bound to a list of values")
        }
        guard let f = e.1[a] else {
            return nil
        }
        return f(s)
    case let .pair(p1, p2):
        guard let x = T2(p1, e, s), let xs = T2(p2, e, s) else {
            return nil
        }
        return SExpr.cons(x, xs)
    case .ellipses(_):
        fatalError("Deferred to next section")
    }
}
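//: Since the ellipsis clause is deferred, we can still exercise `E2` on an
//: ellipsis-free rule - say, a simplified variant of `or2` (it evaluates
//: its first argument twice, but captures nothing) that rewrites
//: `(my-or x y)` to `(if x x y)`. The rule and the `e2Demo` names are
//: illustrative, not from the paper.

```swift
// lhs: (my-or x y)
let e2DemoLhs: Pattern = .pair(
    .identifier("my-or"),
    .pair(.identifier("x"), .pair(.identifier("y"), .empty)))
// rhs: (if x x y) - "if" is unbound on the left, hence a constant
let e2DemoRhs: Pattern = .pair(
    .identifier("if"),
    .pair(.identifier("x"), .pair(.identifier("x"), .pair(.identifier("y"), .empty))))

// (my-or a b) ==> (if a a b)
let e2DemoOut = E2((lhs: e2DemoLhs, rhs: e2DemoRhs),
                   .many([.atom("my-or"), .atom("a"), .atom("b")]))
```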
//: The analysis of `T′[(p ...)]` is harder; we defer it to the next section.
//:
//: The staging analysis shows that at compile-time we can produce a set of
//: tests, selectors, and constructors that can be applied at macro-expansion
//: time to an S-expression. In the second phase of the derivation, we choose
//: appropriate representations for these functions. In keeping with the slogan
//: "target code as a representation of semantics" [Wand 82], we represent these
//: functions in Scheme, our target language.
//:
//: We begin with the representation for selectors. To determine the
//: representation, we look at `D2′` to see what the basic selectors are and
//: which constructors build larger selectors from them. This gives us a
//: mini-language of selectors. We then choose representations for this
//: language in Scheme. In particular, we represent these functions as Scheme
//: expressions with a single free variable `s`. Then the function can be
//: applied simply by evaluating the expression in a suitable environment.
//:
//: Looking at the definition of `D2′`, we can see that every selector is
//: either `λ s . s` or of one of the forms `λ s . (M (hd s))`,
//: `λ s . (M (tl s))` and `λ s . (map M s)`, where `M` is another selector.
//: Thus we can represent these by the Scheme expressions `s`, `M[(car s)/s]`,
//: `M[(cdr s)/s]`, and `(map (lambda (s) M) s)`, respectively, where `M[N/v]`
//: means the substitution of `N` for the free occurrences of `v` in `M`. We
//: can then do a small amount of peephole optimization on the representations
//: (e.g., replacing `(lambda (s) (map (lambda (s) s) s))` by `s`). This is
//: easy since the language of selectors is small. We can analyze tests and
//: constructors similarly, by looking at the definitions of `B` and `T′`.
//:
//: Since the same representation is used for all these functions, they can be
//: inter-mixed when necessary. For example, in `T′` there is no separate
//: representation for `(ρ′)2a`, since this is a selector whose representation
//: is already determined.
//:
//: Having decided on a representation, we then go back and modify the functions
//: `B`, `D2′` and `T′` to produce these concrete representations rather than
//: the functions. This gives us a compiler: a function which takes a
//: pattern-transcription pair and produces a piece of Scheme code which
//: performs the transliteration.
//:
//: More precisely, we replace E by:
//:
//: *Editor's note: In Scheme, there is a difference here. In Swift, the
//: difference is unimportant enough that there's no sense redefining
//: everything, because we can't actually 'compile' anything.*
func E3(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? {
    return E2(ps, s)
}
//: Note that all this code is obtained by modifying the output of the previous
//: functions, not by doing any *ab initio* coding or analysis. In Scheme, this
//: modification may be accomplished by simply introducing backquotes in the
//: appropriate places.
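//: *Editor's note: As a small taste of what that compiler looks like, here
//: is a hypothetical sketch (mine, not the paper's) of the selector phase:
//: a selector becomes a function from the source text of the subject to the
//: source text of the extraction, mirroring the clauses of `D2′`.*

```swift
typealias SchemeSelector = (String) -> String

// Compile a pattern's selectors to Scheme source with free variable s,
// following the selector mini-language described above.
func schemeSelectors(_ p: Pattern) -> [String: SchemeSelector] {
    switch p {
    case .empty:
        return [:]
    case let .identifier(a):
        return [a: { s in s }]                  // λs.s  ~  s
    case let .pair(p1, p2):
        let heads = schemeSelectors(p1).mapValues { (f) -> SchemeSelector in
            return { s in f("(car \(s))") }     // M[(car s)/s]
        }
        let tails = schemeSelectors(p2).mapValues { (f) -> SchemeSelector in
            return { s in f("(cdr \(s))") }     // M[(cdr s)/s]
        }
        return heads.merging(tails, uniquingKeysWith: { $1 })
    case let .ellipses(p):
        // (map (lambda (s) M) s)
        return schemeSelectors(p).mapValues { (f) -> SchemeSelector in
            let body = f("s")
            return { s in "(map (lambda (s) \(body)) \(s))" }
        }
    }
}

// For the lhs of let, the selector for i comes out as the corresponding
// fragment of the derived compiler's output shown in the Results section.
let selDemoPattern: Pattern = .pair(
    .identifier("let"),
    .pair(.ellipses(.pair(.identifier("i"), .pair(.identifier("e"), .empty))),
          .ellipses(.identifier("b"))))
schemeSelectors(selDemoPattern)["i"]!("s")
// "(map (lambda (s) (car s)) (car (cdr s)))"
```

//: *(The peephole step described above would then shrink identity maps and
//: the like; it is omitted here.)*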
//:
//: This completes the compiler, except for the transcription of ellipses.
//: *Editor's note: From here on out, nothing else is transliterated. The
//: proper way to do this requires adding expression quoting to Swift and
//: that's too much work for a Saturday evening.*
//: # Transcribing Ellipses
//: Transcribing ellipses is a harder problem for this compilation strategy,
//: since in general there is no way to avoid building intermediate structures
//: akin to environments.
//:
//: Our strategy is to deal first with the most common special case, which
//: luckily does not require any intermediate structures. It is easy to confirm
//: that the following clause satisfies the congruence relation (∗), given in
//: the previous section, which must hold between `T` and `T′`:
//:
//: T′[(a ...)]ρ′ = ((ρ′)1a = 1) ⇒ (ρ′)2a, error
//:
//: This takes care of what seems to be the most common use of ellipses. In
//: general, however, we need to attack the congruence relation for
//: `T′[(p ...)]` directly. The congruence relation requires that
//:
//: T′[(p ...)]ρ′s
//:   = T[(p ...)](λi . ⟨(ρ′)1i, (ρ′)2is⟩)
//:   = map (T[p])
//:         (decompose (λi ∈ fv(p) . ⟨(ρ′)1i, (ρ′)2is⟩))
//:
//: Let us write `ρ` for the argument to decompose above. We need to build some
//: representation of `ρ` and of decompose `ρ`. That representation should allow
//: us to build as much as possible of the environment before knowing the value
//: of `s`. Once that is done, we can attack the problem of replacing the `T` in
//: the mapping expression by `T′`.
//:
//: Let `i∗ = [i0,···,ik]` be an enumeration of the free variables of `p` in a
//: list. (We use the square brackets `[···]` or `[··· l ··· | l ∈ L]` to denote
//: lists). We represent all functions whose domain is `fv(p)` as lists, indexed
//: by `i∗`. Thus a function `f` can be represented by the list
//: `[f(i0),···,f(ik)]`.
//:
//: Since an element of `Env′` is a pair of functions, we represent it as a pair
//: of lists. We refer to these two lists as the level component and the value
//: component of the environment. Hence `decompose ρ` becomes
//:
//: decompose [(ρ′)1i | i ∈ i∗] [(ρ′)2is | i ∈ i∗]
//:
//: The output from `decompose` is a list of environments, each of which has the
//: same level component. Hence we can represent this list of environments as a
//: single level component and a list of value components. Thus we can represent
//: `decompose (λ i ∈ fv(p) . ⟨(ρ′)1i, (ρ′)2is⟩)` as
//:
//: ⟨decompose-levels n∗, decompose-values n∗ s∗⟩
//:   where
//:     n∗ = [(ρ′)1i | i ∈ i∗]
//:     s∗ = [(ρ′)2is | i ∈ i∗]
//:     decompose-levels = λn∗ . [((n = 0) ⇒ 0, n − 1) | n ∈ n∗]
//:     decompose-values = λn∗ s∗ .
//:         UnequalLengths? s∗ ⇒ error,
//:         StopNow? s∗ ⇒ (),
//:         cons (split∗ hd n∗ s∗)
//:              (decompose-values n∗ (split∗ tl n∗ s∗))
//:     split∗ = λf n∗ s∗ . [((n = 0) ⇒ s, f s) | (n, s) ∈ (n∗, s∗)]
//:
//: Here the definition of split∗ describes mapping over two lists of equal
//: length.
//:
//: By these manipulations, we have given ourselves the possibility of
//: precomputing the levels. We take advantage of this by currying the
//: functionality of `T′`, changing it from
//:
//: T′ : Pat -> ((Ident −o-> Int) × (Ident −o-> S-exp -> S♯)) -> S-exp −o-> S-exp
//:
//: to
//:
//: T′ : Pat -> (Ident -o-> Int) -> (Ident -o-> S-exp -> S#) -> S-exp −o-> S-exp
//:
//: We may also make a similar change in the functionality of `T` to reflect our
//: new representation of environments:
//:
//: T : Pat -> (Ident -o-> Int) -> (Ident -o-> S#) -> S-exp
//:
//: With these changes, the congruence condition between T and T ′ may be
//: restated as:
//:
//: T′[p]ρ1ρ2s = T[p]ρ1(λi . ρ2is)
//:
//: Let ρ∗ be the selector environment `D2′[i∗]`, that is, the environment
//: `{(i0 |-> λ s . (hd s)), (i1 |-> λs.(hd (tl s))), · · ·}`. Hence, if we have
//: an environment represented by the pair `(n∗ , s∗)`, we can use `ρ∗` with
//: `T′` to retrieve elements from `s∗` . More precisely, we may deduce from the
//: congruence condition that for any pattern `p′`, we have
//:
//: T′[p′]n∗ρ∗s∗ = T[p′]n∗s∗
//:
//: where the first s∗ denotes a list which is the subject of the transcription
//: and the second denotes the same list in its role as the representation of a
//: value environment.
//:
//: We can use this identity to replace the `T` in the definition of
//: `T′[(p ...)]` by `T′`. For clarity, we assume for the moment that the
//: pattern `p` is controllable in level-environment `ρ1`. We may now derive the
//: new definition:
//:
//: T′[(p ...)]ρ1ρ2s
//:   = T[(p ...)]ρ1(λi . ρ2is)
//:   = map (T[p]) (decompose (λi ∈ fv(p) . ⟨ρ1i, ρ2is⟩))
//:   = let i∗ = fv(p); n∗ = [ρ1i | i ∈ i∗];
//:         m∗ = decompose-levels n∗ in
//:     λs . map (T[p]m∗) (decompose-values n∗ [ρ2is | i ∈ i∗])
//:   = let i∗ = fv(p); n∗ = [ρ1i | i ∈ i∗];
//:         m∗ = decompose-levels n∗; ρ∗ = D2′[i∗] in
//:     λs . map (T′[p]m∗ρ∗)
//:              (decompose-values n∗ [ρ2is | i ∈ i∗])
//:
//: This version of the definition allows us to write `T′` as a pure structural
//: recursion, so that we can transcribe it into a concrete representation as
//: well:
//:
//: T′′[(p ...)]ρ1ρ2
//:   = let i∗ = fv(p); n∗ = [ρ1i | i ∈ i∗];
//:         m∗ = decompose-levels n∗; ρ∗ = D2′[i∗] in
//:     (map (lambda (s) T′[p]m∗ρ∗)
//:          (decompose-values (quote n∗)
//:                            (list ρ2i0 · · · ρ2ik)))
//:
//: For the last line, recall that `ρ2` contains concrete selectors which,
//: when evaluated with the variable `s` bound to the subject s, return the
//: value of `ρ2is`. Hence, if `i∗ = [i0, · · · , ik]`, then the list
//: expression will evaluate correctly to `[ρ2is | i ∈ i∗]`.
//:
//: This version of the target code includes an explicit call to
//: `decompose-values`. It is possible to apply the same methods to
//: `decompose-values`, using staging to take advantage of the fact that its
//: first argument is known. The resulting system generates target code that
//: includes a local letrec loop in place of the `decompose-values`, and in
//: which the run-time level tests are eliminated. We leave this development as
//: an exercise for the reader.
//: # Results
//: The derivation, including false starts and debugging of the resulting code,
//: took well under one man-week. The production compiler, by contrast, embodies
//: several man-months of work.
//:
//: How good is the resulting compiler? It seems to produce code which is
//: comparable with the production version. For let, it produces the following
//: code for the transcription:
//:
//: (lambda (s)
//:   (cons
//:     (cons 'lambda
//:           (cons (map (lambda (s) (car s))
//:                      (car (cdr s)))
//:                 (cdr (cdr s))))
//:     (map (lambda (s) (car (cdr s)))
//:          (car (cdr s)))))
//:
//: which is clearly equivalent (given a reasonably optimizing compiler) to the
//: production version code given above. It also produces comparable code for
//: the tests. In view of this performance, we regard the derivation as a
//: success.
//: # Conclusions
//: We have presented a "macro-by-example" facility for Lisp-like languages. The
//: facility allows the user to specify syntactic extensions in a natural,
//: non-procedural manner. These specifications are expanded into
//: transformations which automatically perform pattern-matching,
//: error-checking, and mapping. We have given a formal semantics for the
//: specification language and have shown how the semantics can be converted
//: into a compiler by the use of staging and suitable choices of
//: representations for the semantic functions.