Skip to content

Instantly share code, notes, and snippets.

@Araq
Last active May 30, 2024 13:00
Show Gist options
  • Save Araq/7f1eaacb68a5c49f73bc to your computer and use it in GitHub Desktop.
Save Araq/7f1eaacb68a5c49f73bc to your computer and use it in GitHub Desktop.

This document describes how Lambda Lifting (LL) works. LL is responsible for transforming 'closures' into (environment, function) pairs.

The basic approach is that captured vars need to be put on the heap and that the calling chain needs to be explicitly modelled. Things to consider:

proc a =
  var v = 0
  proc b =
    var w = 2

    for x in 0..3:
      proc c = capture v, w, x
      c()
  b()

  for x in 0..4:
    proc d = capture x
    d()

Needs to be translated into:

proc a =
  var cl: *
  new cl
  cl.v = 0

  proc b(cl) =
    var bcl: *
    new bcl
    bcl.w = 2
    bcl.up = cl

    for x in 0..3:
      var bcl2: *
      new bcl2
      bcl2.up = bcl
      bcl2.up2 = cl
      bcl2.x = x

      proc c(cl) = capture cl.up2.v, cl.up.w, cl.x
      c(bcl2)

    c(bcl)

  b(cl)

  for x in 0..4:
    var acl2: *
    new acl2
    acl2.x = x
    proc d(cl) = capture cl.x
    d(acl2)

Closures as interfaces:

proc outer: T =
  var captureMe: TObject # value type required for efficiency
  proc getter(): int = result = captureMe.x
  proc setter(x: int) = captureMe.x = x

  result = (getter, setter)

Is translated to:

proc outer: T =
  var cl: *
  new cl

  proc getter(cl): int = result = cl.captureMe.x
  proc setter(cl: *, x: int) = cl.captureMe.x = x

  result = ((cl, getter), (cl, setter))

For 'byref' capture, the outer proc needs to access the captured var through the indirection too. For 'bycopy' capture, the outer proc accesses the var not through the indirection.

Possible optimizations:

1) If the closure contains a single 'ref' and this reference is not re-assigned (check sfAddrTaken flag) make this the closure. This is an important optimization if closures are used as interfaces. 2) If the closure does not escape, put it onto the stack, not on the heap. 3) Dataflow analysis would help to eliminate the 'up' indirections. 4) If the captured var is not actually used in the outer proc (common?), put it into an inner proc.

Things to do

Don't map PSym to PEnv, but TLineInfo to PEnv

Bug #705 seems to be the easiest to tackle:

proc foo(x = "hello") =
  proc bar() = echo(x & " world")
  proc baz1() = bar()
  proc baz2() = bar()
  baz1()
  baz2()

The problem is that the nim compiler uses this data structure to attach closure creation to the AST:

type
  TOuterContext = object
    ...
    lambdasToEnv: TIdTable # PSym->PEnv mapping

(That should rather use a proper generic hash table from the stdlib, but I digress.)

So it maps symbols to the environment that needs to be passed around but this is clearly wrong, because the symbol 'bar' in the example is called from 2 different contexts. So instead of storing a PSym->PEnv mapping a TLineInfo->PEnv mapping should be used.

Try to re-enable type based expression construction

Some bugs, in particular bugs related to complex nested closure iterators might be fixed by re-enabling this code fragment:

when false:
  # Type based expression construction works too, but turned out to hide
  # other bugs:
  ...

Right now the code instead uses a while it != nil: ...; it = it.up loop which relies on the chaining of the different nested environments.

Disallow top level .closure procs

Some of the complexity in LL stems from the fact that we allow top level .closure procs even though they make not much sense. A top level closure proc is always called via a constructed (env, func) pair where 'env' is 'nil'.

This means that there is a distinction between tfCapturesEnv in sym.typ.flags and sym.typ.callConv == ccClosure which further complicates matters. Instead top level .closure procs should simply be disallowed and the code be simplified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment