Skip to content

Instantly share code, notes, and snippets.

# klmr/generator.md Last active Apr 1, 2019

Python-like generators in R

# A little experiment using restarts.

## (And while we’re at it, let’s torture R’s syntax a little.) In the following we will be using R’s “restarts” feature to implement the state machine that drives generators in languages such as Python. Generators allow lazily generating values on demand: a consumer invokes a generator, and consumes values as they are produced. A new value is only produced once the previous one has been consumed.

Decoupling consumer and producer in this fashion is a powerful technique, because a consumer often has a better idea of how many values are needed than the generator. This means that a consumer can interrupt the generation of more values.

One consequence of this is that generators can be infinite; for instance, here is a potential implementation of a Fibonacci generator in Python:

```def fib():
a, b = 0, 1
while True:
yield b
a, b = b, a + b```

Now if we want to print the first ten Fibonacci numbers we can use the generator as follows:

```i = 0
for n in fib():
i += 1
if i > 10: break
print n```

Neat! So what’s stopping us from doing this in R? Well, a first answer is: the lack of `yield`, which yields execution back to the caller but then resumes the function execution until the function either reaches its end, or the caller aborts it.

A second answer is: nothing!

In fact, we can implement `yield` in less than ten lines of code if we’re willing to cheat (heck, are we!).

```yield = function (item) {
do_yield = function (item, expr, index_name, parent) {
item = setNames(list(item), index_name)
expr = do.call(substitute, list(expr, item))
eval(expr, parent)
}

yield_condition = structure(list(value = item), class = c('yield', 'condition'))
withRestarts(stop(yield_condition), do_yield = do_yield)
}```

?!?

This function essentially needs to be read bottom-to-top: it `stop`s execution with a specific condition (now’s a good time to read up on condition handling). However, this happens within a `withRestarts`. Don’t bother reading its documentation; it’s incomprehensible. In a nutshell, this function allows us to specify “restarts” — that is, get-out-of-jail-free cards that are used higher up by the user to resume execution after a condition was signalled. Such a restart is defined here as `do_yield` (for lack of a better name …).

What does this restart-function do, and where is it called from? Glad you asked …. But first we need to define a bit of syntactic sugar. The end goal is to use `yield` just as we would use it in Python; for example, in a `for` loop. We’ll keep it simpler for now and define our own loop-like construct:

`foreach (i, generator(), message('Hello ', i))`

It’s this `foreach` which will have to ensure that execution is restarted after the `yield` condition was signalled. Without further ado:

```foreach = function (index, generator, expr) {
# Don’t evaluate anything (just yet):
index_name = as.character(substitute(index))
generator = substitute(generator)
expr = substitute(expr)

# And remember where we’re calling from:
parent = parent.frame()

# Here the magic happens.
yielder = function (e)
invokeRestart('do_yield', e\$value, expr, index_name, parent)

withCallingHandlers(eval(generator, parent), yield = yielder)
}```

And let’s test it real quick:

```generator = function () {
for (i in 1 : 5)
yield(i)
}

foreach (i, generator(), cat('Hello', i, '\n'))```
``````## Hello 1
## Hello 2
## Hello 3
## Hello 4
## Hello 5
``````

It works!

So: what’s going on here? `foreach` essentially evaluates the `generator` expression (which, in our example, is a call to the `generator` function). But rather than evaluating it directly, it does so “with calling handlers”:

“Here’s an expression that we’d like to execute,” we tell R. “And just in case this raises a `yield` condition,” we continue, “please execute our `yielder` function.” A lot of jumping around ensues: the `generator` function gets called, which in turn calls `yield`. `yield` signals a `yield` condition by calling `stop`. And `foreach` reacts to this `yield` condition by dispatching to the `yielder`. The `yielder` now invokes the restart that we had previously defined just before calling `stop` (remember?). That restart, called, `do_yield`, gets passed a bunch of parameters:

• the current value, which we have stored inside the condition;
• the expression (the “loop body”) to execute for each value;
• the name of the index variable in our loop; and finally,
• the caller’s environment, in which to execute the loop body.

We have now come full circle: `do_yield` takes the loop body expression and substitutes the name of the index variable with its current value. In our example, it takes the unevaluated expression `cat('Hello', i, '\n')` and transforms it into `cat('Hello', 1, '\n')` (obviously that’s for the first iteration; subsequently the values 2, … are substituted). And finally, it evaluates the so formed expression in the caller’s environment. The end.

There remains one thing to do: make this work with R’s own `for` loop. `for`, like almost everything else in R, is simply a function. And we can totally redefine it. Oh yes.

````for` = function (var, seq, expr) {
tryCatch(eval.parent(substitute(foreach(var, seq, expr),
list(var = substitute(var),
seq = substitute(seq),
expr = substitute(expr)))),
`break` = function (e) invisible())
invisible()
}

# Another small hack to make `break` work as expected.
makeActiveBinding('break',
function () stop(structure(list(), class = c('break', 'condition'))),
environment())```
```for (i in generator())
cat('Hello for', i, '\n')```
``````## Hello for 1
## Hello for 2
## Hello for 3
## Hello for 4
## Hello for 5
``````

(Granted, I cheated again: redefining `for` as above makes it unusable in the conventional way, so I had to redefine the `generator` function to use `lapply` instead of `for`. It’s still `yield`ing though.)

And finally, here’s the `fib` example we started with — an infinite generator of Fibonacci numbers:

```fib = function () {
x = c(0, 1)
repeat {
yield(x)
x = c(x, sum(x))
}
}

i = 0
for (n in fib()) {
if ((i = i + 1) > 10) break
cat(n, '\n')
}```
``````## 1
## 1
## 2
## 3
## 5
## 8
## 13
## 21
## 34
## 55
``````

I mentioned initially that our implementation is cheating. In fact, a Python generator can be invoked and stored in a variable for later consumption, like so:

`x = fib()`

Alas, attempting this with our iterator won’t work:

`x = fib()`
``````## Error in stop(yield_condition): bad error message
``````

As we haven’t installed a handler for `yield` here, the code breaks. Fixing this is beyond the scope of this article (it requires a different approach).

To understand more about how restarts in R work, head over to the Advanced R chapter Beyond exception handling: conditions and restarts.

And if you find the code above entertaining, here are some more projects:

License: CC-BY • Author: @klmr

### dirmeier commented Feb 2, 2018

 This is an awesome gist.

### isezen commented May 10, 2018

 Thanks for the article. Although it's an experiment, what about speed? is it promising?
Owner Author

### klmr commented May 14, 2018

 @isezen I haven’t checked. To be honest I would recommend against actually using this — the fact that you can’t store generators in variables renders this brittle.
to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.