Skip to content

Instantly share code, notes, and snippets.

@jozefg
Last active March 5, 2024 11:52
Show Gist options
  • Star 44 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save jozefg/652f1d7407b7f0266ae9 to your computer and use it in GitHub Desktop.
Save jozefg/652f1d7407b7f0266ae9 to your computer and use it in GitHub Desktop.
Tutorial on Closure Conversion and Lambda Lifting
This is my short-ish tutorial on how to implement closures in
a simple functional language: Foo.
First, some boilerplate.
> {-# LANGUAGE DeriveFunctor, TypeFamilies #-}
> import Control.Applicative
> import Control.Monad.Gen
> import Control.Monad.Writer
> import Data.Functor.Foldable
To actually run this code, you'll need `Control.Monad.Gen`, which is
part of my monad-gen package. It's just a thin wrapper over the state
monad to give us fresh variable names later on.
Foo will consistent of a few core primops, numbers, and lambdas. In Haskell,
we can represent Foo expressions as
> data Var = Gen Integer | Name String -- Gen for generated variables
> deriving(Eq, Show)
>
> data Primop = Plus | Sub | Mult | Div
> deriving Show
>
> data Exp = Var Var
> | App Exp Exp
> | Lam [Var] Exp
> | Prim Primop
> | Lit Int
> deriving Show
We allow lambdas to have multiple arguments, but we're treating even
them as sugar for the manually curried form, like we do in Haskell.
Now, let's write a little boilerplate to use recursion-schemes.
> data ExpF a = VarF Var
> | AppF a a
> | LamF [Var] a
> | PrimF Primop
> | LitF Int
> | PAppF Primop a a
> deriving Functor
> type instance Base Exp = ExpF
> instance Foldable Exp where
> project (Var v) = VarF v
> project (App l r) = AppF l r
> project (Lam vs e) = LamF vs e
> project (Prim p) = PrimF p
> project (Lit i) = LitF i
> instance Unfoldable Exp where
> embed (VarF v) = Var v
> embed (AppF l r) = App l r
> embed (LamF v e) = Lam v e
> embed (PrimF p) = Prim p
> embed (LitF i) = Lit i
Now, programs in Foo are a series of mutually recursive definitions,
just like Haskell. To represent this, we'll write a new type
> data Def = Def Var [Var] Exp
> deriving Show
Now, on to closure conversion!
As a brief refresher, a closure is a lambda with references to
variables outside of its scope that aren't global variables.
If you write Haskell, then you use closures all the time! In fact
every multiple argument function is closures.
For example
> add x y = x + y
This is really
> add' = \x -> \y -> x + y
Now look at the inner lamda `\ y -> x + y`. here the reference to
x isn't bound in the lambda's argument list, and it's not defined on
the top level. Closures give rise to currying, partial application, and
can even be used to fake message passing stateful objects!
Now the problem is, almost all compilation targets don't support
closures! Or nested functions at all for that matter. Therefore,
one of the many things a compiler for Foo most do is convert all
closures and anonymous functions into normal, top level functions.
To do this, each lambda needs to be retrofitted to take a bunch of extra
arguments. For example our `add` from would become something like
> add'' = \x -> (\x y -> x + y) x
Now each lambda is "closed" and contains no reference to outside variables.
The next step will be to lift our inner lambda to its own top level definition
so that translation to something like C, STG, or LLVM IR would be pretty straight
forward.
> extra = \x y -> x + y -- Ignore the fact that this desugars to add'' for a moment
> add''' = \x -> extra x
Obviously, this is not optimal code! We'll talk about how to optimize this later.
For now though, we can write a simple algorithm for closure conversion and lambda
lifting.
First some helper functions and types
> without :: Eq a => [a] -> [a] -> [a]
> without = foldr (filter . (/=)) -- Like \\ but removes all occurrences
>
> freeVars :: Exp -> [Var] -- Grab all the unbound variables in an expression
> freeVars = cata folder
> where folder (VarF v) = [v]
> folder (AppF l r) = l ++ r
> folder (LamF vs e) = e `without` vs
> folder (PrimF _) = []
> folder (LitF _) = []
>
> applyTo :: Exp -> [Var] -> Exp
> applyTo e (a : as) = applyTo (App e $ Var a) as
> applyTo e [] = e
Now, the actual closure conversion pass is quite simple
> closConv :: [Var] -> Exp -> Exp
> closConv globals = cata folder
> where folder (LamF vs e) =
> let vars = freeVars e `without` (globals ++ vs)
> in Lam (vars ++ vs) e `applyTo` vars
> folder e = embed e
Yep, that's it! Now we can take a Foo expression like
> -- testExp foo = foo (\bar -> bar foo)
> testExp = Lam [foo] $ App (Var foo) (Lam [bar] $ App (Var bar) (Var foo))
> where [foo, bar] = map Name ["foo", "bar"]
And convert it to an expression like
> -- converted foo = foo ((\foo bar -> bar foo) foo)
> converted :: Exp
> converted = closConv [] testExp
Next we can lift everything to its own top level. This
is another small pass.
> type ClosM = WriterT [Def] (Gen Integer)
> liftLam :: Exp -> ClosM Exp
> liftLam = cata folder
> where folder (AppF l r) = App <$> l <*> r
> folder (VarF v) = return $ Var v
> folder (PrimF p) = return $ Prim p
> folder (LitF i) = return $ Lit i
> folder (LamF vs e) = do
> fresh <- Gen <$> gen
> Def fresh vs <$> e >>= tell . return
> return $ Var fresh
Now we can chain these two steps together
> eliminateLams :: [Var] -> Def -> [Def]
> eliminateLams globals (Def nm vs e) = Def nm vs e' : defs
> where (e', defs) = runGenInt . runWriterT . liftLam $ closConv globals e
>
And that's it!
Now, one simple optimization we can do during this whole process is to mash lambdas
together. Currently, if we had something like
> slow = Lam [Name "a"] $ Lam [Name "b"] $ undefined
This could get lifted into two separate, equivalent, top levels! We saw this
before with `add`. To fix this we can traverse each expression, and lambdas
like these together. This can be done with one simple pass
> smash :: Exp -> Exp
> smash = cata folder
> where folder (LamF vs (Lam vs' e)) = Lam (vs ++ vs') e
> folder e = embed e
If we plug this into `eliminateLam`, we end up with much more efficient
generated code.
In fact, there's a whole host of clever tricks to play during closure
conversion and lambda lifting, especially in lazy languages.
For more on this, check out SPJ's [book]
(http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/)
@funrep
Copy link

funrep commented Jun 19, 2014

Maybe I'm mistaken, but shouldn't

let vars = freeVars e `without` (globals ++ vars)

be...

let vars = freeVars e `without` (globals ++ vs)

?

Thanks for the tutorial (really need to learn about those recursion-schemes, I think its exactly something like that I've been longing for while working on my closure conversion based on below blog post)! For reference here's some other useful resources I've found:

http://matt.might.net/articles/closure-conversion/
and it's also covered in Appel's book (Compiling with continuations) (which you suggested :P, not read that particular chapter yet though) (but in CPS form I think?) and there's also fairly readable implementation of it in Scheme in CHICKEN's source file compiler.scm, function perform-closure-conversion. May I ask, if its correct that Appel's approach is to closure convert AFTER CPSing, is there any down-sides/up-sides you know of?

@jozefg
Copy link
Author

jozefg commented Jun 21, 2014

Quite right about vars vs vs.

As far as CwC goes, I think you have to closure convert/lambda lift after CPS since CPS generates a ton of closures! A smarter thing to do might be to CPS generate, and run a pass over the code and convert

(\x' -> ...) x

into something like

magical_let x' = x in ....

And then do closure conversion and lift. Since otherwise there's a world of pain involved in making CPSed code not generate tons of lambdas.

@funrep
Copy link

funrep commented Jun 21, 2014

Makes sense, thanks.

@valmirjunior0088
Copy link

Hello!

I'm implementing a compiler from a functional language to WebAssembly and this gist has been extremely helpful!

Though I'm still a little bit lost. Do you perhaps have some resource you could point me towards on how to deal with the case when a function takes another function as an argument (as in a higher order function e.g. map), or when a function returns a function (as in a partial application)?

As I understand, the usual method is to pack a function pointer (possibly from a converted and lifted closure) alongside an environment of values that have been captured/applied. Is that correct?

@dmjio
Copy link

dmjio commented Aug 29, 2021

@jozefg,

Shouldn't

where (e', defs) = runGenInt . runWriterT . liftLam $ closConv globals e

be

where (e', defs) = runGenInt . runWriterT . liftLam $ closConv (globals <> vs) e

i.e (closConv (globals <> vs) e) top level variables in a Def should be considered bound. Otherwise freeVars will return incorrectly.

@Hirrolot
Copy link

> extra  = \x y -> x + y -- Ignore the fact that this desugars to add'' for a moment
> add''' = \x   -> extra x

I got lost a bit after this moment. If extra is not a curried function but a truly multi-parameter one, why are we applying only a single x to it in add'''? If extra is really a syntax sugar for its curried version, then \x . \y . x + y (the curried extra) must be also transformed into \x . something x, where something is again \x . \y . x + y???

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment