Skip to content

Instantly share code, notes, and snippets.

@klmr
Last active August 28, 2022 02:26
Show Gist options
  • Star 33 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save klmr/d10623a0b4c7e1e9a6523eebee4913d1 to your computer and use it in GitHub Desktop.
Save klmr/d10623a0b4c7e1e9a6523eebee4913d1 to your computer and use it in GitHub Desktop.
Python-like generators in R

A little experiment using restarts.

(And while we’re at it, let’s torture R’s syntax a little.)

screenshot

In the following we will be using R’s “restarts” feature to implement the state machine that drives generators in languages such as Python. Generators allow lazily generating values on demand: a consumer invokes a generator, and consumes values as they are produced. A new value is only produced once the previous one has been consumed.

Decoupling consumer and producer in this fashion is a powerful technique, because a consumer often has a better idea of how many values are needed than the generator. This means that a consumer can interrupt the generation of more values.

One consequence of this is that generators can be infinite; for instance, here is a potential implementation of a Fibonacci generator in Python:

def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

Now if we want to print the first ten Fibonacci numbers we can use the generator as follows:

i = 0
for n in fib():
    i += 1
    if i > 10: break
    print n

Neat! So what’s stopping us from doing this in R? Well, a first answer is: the lack of yield, which yields execution back to the caller but then resumes the function execution until the function either reaches its end, or the caller aborts it.

A second answer is: nothing!

In fact, we can implement yield in less than ten lines of code if we’re willing to cheat (heck, are we!).

yield = function (item) {
    do_yield = function (item, expr, index_name, parent) {
        item = setNames(list(item), index_name)
        expr = do.call(substitute, list(expr, item))
        eval(expr, parent)
    }

    yield_condition = structure(list(value = item), class = c('yield', 'condition'))
    withRestarts(stop(yield_condition), do_yield = do_yield)
}

?!?

This function essentially needs to be read bottom-to-top: it stops execution with a specific condition (now’s a good time to read up on condition handling). However, this happens within a withRestarts. Don’t bother reading its documentation; it’s incomprehensible. In a nutshell, this function allows us to specify “restarts” — that is, get-out-of-jail-free cards that are used higher up by the user to resume execution after a condition was signalled. Such a restart is defined here as do_yield (for lack of a better name …).

What does this restart-function do, and where is it called from? Glad you asked …. But first we need to define a bit of syntactic sugar. The end goal is to use yield just as we would use it in Python; for example, in a for loop. We’ll keep it simpler for now and define our own loop-like construct:

foreach (i, generator(), message('Hello ', i))

It’s this foreach which will have to ensure that execution is restarted after the yield condition was signalled. Without further ado:

foreach = function (index, generator, expr) {
    # Don’t evaluate anything (just yet):
    index_name = as.character(substitute(index))
    generator = substitute(generator)
    expr = substitute(expr)

    # And remember where we’re calling from:
    parent = parent.frame()

    # Here the magic happens.
    yielder = function (e)
        invokeRestart('do_yield', e$value, expr, index_name, parent)

    withCallingHandlers(eval(generator, parent), yield = yielder)
}

And let’s test it real quick:

generator = function () {
    for (i in 1 : 5)
        yield(i)
}

foreach (i, generator(), cat('Hello', i, '\n'))
## Hello 1 
## Hello 2 
## Hello 3 
## Hello 4 
## Hello 5

It works!

So: what’s going on here? foreach essentially evaluates the generator expression (which, in our example, is a call to the generator function). But rather than evaluating it directly, it does so “with calling handlers”:

“Here’s an expression that we’d like to execute,” we tell R. “And just in case this raises a yield condition,” we continue, “please execute our yielder function.” A lot of jumping around ensues: the generator function gets called, which in turn calls yield. yield signals a yield condition by calling stop. And foreach reacts to this yield condition by dispatching to the yielder. The yielder now invokes the restart that we had previously defined just before calling stop (remember?). That restart, called, do_yield, gets passed a bunch of parameters:

  • the current value, which we have stored inside the condition;
  • the expression (the “loop body”) to execute for each value;
  • the name of the index variable in our loop; and finally,
  • the caller’s environment, in which to execute the loop body.

We have now come full circle: do_yield takes the loop body expression and substitutes the name of the index variable with its current value. In our example, it takes the unevaluated expression cat('Hello', i, '\n') and transforms it into cat('Hello', 1, '\n') (obviously that’s for the first iteration; subsequently the values 2, … are substituted). And finally, it evaluates the so formed expression in the caller’s environment. The end.

There remains one thing to do: make this work with R’s own for loop. for, like almost everything else in R, is simply a function. And we can totally redefine it. Oh yes.

`for` = function (var, seq, expr) {
    tryCatch(eval.parent(substitute(foreach(var, seq, expr),
                                    list(var = substitute(var),
                                         seq = substitute(seq),
                                         expr = substitute(expr)))),
             `break` = function (e) invisible())
    invisible()
}

# Another small hack to make `break` work as expected.
makeActiveBinding('break',
                  function () stop(structure(list(), class = c('break', 'condition'))),
                  environment())
for (i in generator())
    cat('Hello for', i, '\n')
## Hello for 1 
## Hello for 2 
## Hello for 3 
## Hello for 4 
## Hello for 5

(Granted, I cheated again: redefining for as above makes it unusable in the conventional way, so I had to redefine the generator function to use lapply instead of for. It’s still yielding though.)

And finally, here’s the fib example we started with — an infinite generator of Fibonacci numbers:

fib = function () {
    x = c(0, 1)
    repeat {
        yield(x[2])
        x = c(x[2], sum(x))
    }
}

i = 0
for (n in fib()) {
    if ((i = i + 1) > 10) break
    cat(n, '\n')
}
## 1 
## 1 
## 2 
## 3 
## 5 
## 8 
## 13 
## 21 
## 34 
## 55

I mentioned initially that our implementation is cheating. In fact, a Python generator can be invoked and stored in a variable for later consumption, like so:

x = fib()

Alas, attempting this with our iterator won’t work:

x = fib()
## Error in stop(yield_condition): bad error message

As we haven’t installed a handler for yield here, the code breaks. Fixing this is beyond the scope of this article (it requires a different approach).

To understand more about how restarts in R work, head over to the Advanced R chapter Beyond exception handling: conditions and restarts.

And if you find the code above entertaining, here are some more projects:


License: CC-BY • Author: @klmr

@dirmeier
Copy link

dirmeier commented Feb 2, 2018

This is an awesome gist.

@isezen
Copy link

isezen commented May 10, 2018

Thanks for the article. Although it's an experiment, what about speed? is it promising?

@klmr
Copy link
Author

klmr commented May 14, 2018

@isezen I haven’t checked. To be honest I would recommend against actually using this — the fact that you can’t store generators in variables renders this brittle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment