@dherman
Last active August 29, 2015 14:00
Iterators, generators, and loop exit

Understanding the iterable/iterator distinction

It's important to understand the purpose of these two abstractions in order to understand the rest of this essay.

An iterable is a data source, often a mutable one, that can at arbitrary points be iterated over, especially by a for-of loop. An iterable object can have an arbitrary lifetime, and may be long-lived. But iterating over an iterable data structure should be thought of as a relatively short-lived operation, in the sense that the data source should be temporarily frozen during iteration---modifying the iterable data source in the middle of an iteration will generally result in unpredictable behavior.

An iterator produced by an iterable object is therefore a relatively short-lived object, which should not be used once the iteration operation is complete.

It is possible for an iterator to represent an infinite stream, or for an iterator to iterate over an immutable data structure, and therefore to be safe to keep alive indefinitely. However because iterating over mutable data sources is so common, the language should be designed to expect iterators to be short-lived in the common case.
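As a concrete illustration of the distinction, using Array (a built-in iterable) and the @@iterator method (spelled Symbol.iterator): each call to @@iterator produces a fresh, independent iterator over the data source, while each iterator itself is a single-use cursor.

```javascript
let arr = ["a", "b", "c"];

// Each @@iterator call produces a fresh iterator over the data source:
let it1 = arr[Symbol.iterator]();
let it2 = arr[Symbol.iterator]();

it1.next(); // advances it1 only
it1.next();

// it2 is unaffected; it starts from the beginning of the data:
let first = it2.next(); // { value: "a", done: false }
```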

Generators should not be both iterators and iterables

It's somewhere between a hack and a violation of the intended meaning of the iterable/iterator abstractions to produce iterators that claim to be iterable by always returning themselves as the result of calling @@iterator. The @@iterator method is supposed to produce a fresh iterator over the current state of an iterable data source, and that iterator should be rewound to its beginning.

(Moreover, it's a pretty ridiculous brain-teaser.)

So it's really not ideal for generator functions to produce generators that are both iterable and iterators. Nevertheless, we really want for-of loops to accept iterable objects as well as generators, so that we can have intuitive syntax like:

for (let entry of map) { ... } // iterable
for (let element of array) { ... } // iterable
for (let node of tree.leaves()) { ... } // generator

So the way to continue to have compatible semantics without generators being iterable is for the semantics of for-of to be polymorphic. The semantics should be:

  1. Extract the @@iterator property. If the result is not undefined, call it as a method and use the result as the iterator.
  2. Otherwise, assume the object itself is an iterator and use that as the iterator.
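The two steps above can be sketched as a small helper (the name getIterator is hypothetical, purely for illustration):

```javascript
function getIterator(obj) {
  // Step 1: if obj has an @@iterator method, it's an iterable;
  // call the method to obtain a fresh iterator.
  let method = obj[Symbol.iterator];
  if (method !== undefined) {
    return method.call(obj);
  }
  // Step 2: otherwise assume obj is itself an iterator.
  return obj;
}
```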

Why generator functions return iterators

It's tempting to change the semantics of generator functions to produce an iterable instead of an iterator, since the iterable could possibly serve the role of the initial suspension of the function body and we could consider eliminating the implicit yield. However, this has two problems. First, eliminating the implicit yield causes the first yielded value to be ignored:

function* foo() {
  yield "first value";
}
// creates generator but does not start executing
let iterable = foo();
// runs to first yield, drops "first value" into the ether
let iterator = iterable[Symbol.iterator]();

Secondly, implicit yield or no implicit yield, a generator function produces a sequence that has no backing data source, so it's simply nonsensical to produce an iterable object that can produce multiple iterators (through multiple calls to @@iterator). It's impossible to rewind the state of the sequence, since the generator function is a closure that can cause arbitrary mutations to the heap. And it would be silly to just keep producing the same iterator. The problem is that an iterable is simply the wrong abstraction. A generator function produces a sequence, not an iterable data source.
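This is easy to observe: each call to a generator function produces an independent, single-use iterator with its own closure state, so there is no backing data source to "rewind" to.

```javascript
function* naturals() {
  let n = 0;
  while (true) {
    yield n++;
  }
}

// Two calls produce two independent sequences:
let a = naturals();
let b = naturals();
a.next(); // 0
a.next(); // 1
// b has its own state and still starts at 0:
let firstOfB = b.next(); // { value: 0, done: false }
```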

Iterators should have a cancellation method

Given that iteration should be short-lived, and for-of is the attractive syntax for performing an iteration, a for-of loop should always force an iterator to complete. That is, programmers should have to consider an iterator to be "used up" after exiting a for-of loop.

In particular, if a generator has cleanup code, an abrupt exit from a for-of loop should not require programmers to somehow force the generator to complete (e.g. by calling .next() repeatedly until it finishes, or by calling .throw() manually). However, there needs to be some way for a for-of loop to be able to signal to the iterator that the loop is terminating early and the iterator should shut itself down.

An optional cancellation method is the answer for this. It should be optional so that simple generators that don't need any cleanup can still be implemented with a single method, and to avoid the overhead of a method call when it's not necessary.

Note that the invariant that for-of always expends its iterator does not prevent the reuse of e.g. infinite iterations. It's easy to create combinators that do not close their wrapped iterator when they are closed. For example:

function* take(n, iterator) {
  for (let i = 0; i < n; i++) {
    let result = iterator.next();
    if (result.done) return;
    yield result.value;
  }
}
let ps = primes();
for (let x of take(100, ps)) { ... }
for (let x of take(100, ps)) { ... }
...

Another reason for having for-of cancel its iterator is that we should be thinking ahead to asynchronous versions of the syntax that could be used for asynchronous streams, such as

async function f(...) {
  ...
  for (await x of node.clicks) {
    ...
    break;
    ...
  }
}

In these cases we will definitely need to be able to unsubscribe from the streams. Or similarly, we may have features that allow iteration over asynchronous data sources, also with abrupt termination:

async function* lines(filename) {
  let fd = await openFile(filename);
  try {
    let line;
    while ((line = await readLine(fd)) !== null) {
      yield line;
    }
  } finally {
    closeFile(fd);
  }
}

async function f() {
  let i = 0;
  for (await line of lines("file.txt")) {
    if (i >= 100) {
      break;
    }
    i++;
    ...
  }
  ...
}

Given that cancellation seems necessary for the asynchronous case, the synchronous case ought to be symmetric. Of course, it's hard to predict the future, but the symmetry suggests this could matter for synchronous iteration as well, since before/after cleanup code (especially via finally) can certainly occur there too.

A quick point on naming: TC39 discussed calling this method return instead of close. One argument against this is that return is a concept that speaks to the internals of a generator function, whereas this is a part of the external interface of all iterators. That suggests something like cancel, stop, or dispose. But since iterators have a result value, maybe return is in fact a reasonable concept even at the iterator abstraction layer.

There are a couple of subtle remaining questions to be answered if we are going to add cancellation.

Should the cancellation method be able to take a result value?

Since generator functions can return a value, and the cancellation method forces a return, it must return undefined by default. But should it allow passing an optional result value? Should we be bothered by the fact that external consumers of an iterator are able to override its return value?

My current thinking is that being able to cancel an iteration unavoidably carries with it this power to override the result, so in fact it should be able to pass a result value. And I guess this is softening me up on the name return.
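Assuming a cancellation method named return with the semantics proposed here, the passed value becomes the value of the forced return, overriding whatever the generator would have returned on its own:

```javascript
function* counter() {
  yield 1;
  yield 2;
  return "finished naturally";
}

let g = counter();
g.next();                     // { value: 1, done: false }
let rec = g.return("forced"); // cancellation overrides the result value
// rec is { value: "forced", done: true }
```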

Should for-of use cancellation only on abrupt completion?

It seems unnecessary for for-of to use the cancellation method when a loop terminates normally, since the iterator itself made the decision that the iteration was done and in that case is perfectly capable of doing its own cleanup. And a generator function that uses try-finally for cleanup will work just fine since the finally executes regardless of whether the try block completes normally or abruptly. We should avoid needlessly throwing in extra method calls to the semantics for common cases where they aren't needed.

However, it's a little weird to have a semantics that only invokes the method on abrupt completion, since this depends on a spec-internal distinction. In particular, it makes it tricky to compile for-of to ES5, since it's hard to encode that distinction.

I favor cancellation being called only on abrupt completion, especially because it doesn't make sense to return multiple times from a generator. (In fact it should throw an error if code calls the cancellation method after a generator has already completed.) But it would be good to think about how this could be implemented in regenerator.
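Assuming a for-of that invokes the cancellation method on abrupt completion, as proposed, a generator that uses try-finally for cleanup gets its cleanup code run even when the consumer breaks early:

```javascript
let log = [];

function* resource() {
  try {
    yield 1;
    yield 2;
  } finally {
    log.push("cleanup"); // runs even when the consumer breaks early
  }
}

for (let x of resource()) {
  log.push(x);
  break; // abrupt completion: the loop cancels the iterator
}
// log is [1, "cleanup"]; the value 2 is never produced
```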

Should generators be able to yield after cancellation?

If canceling a generator resumes its suspension point as if it had been a return statement, we have to deal with the fact that it's possible for a generator to yield afterwards. This could happen inside a finally block:

function* f() {
  ...
  try {
    ... yield ...
  } finally {
    ... yield ...
  }
}

But it can also happen in unpredictable ways because a finally block can cancel a return:

function* f() {
  try {
    try {
      ... yield ...
    } finally {
      throw "overridden"
    }
  } catch (ignored) { }
  ... yield ...
}

This gets even more unpredictable when you consider the fact that yield* allows the finally to be hidden in an external generator function, and in fact the error in the finally block could itself be thrown from an external function:

function* f() {
  ...
  try {
    ... yield* delegate() ...
  } catch (ignored) { }
  ... yield ...
}

function* delegate() {
  ...
  try {
    ... yield ...
  } finally {
    ... helper() ...
  }
  ...
}

function helper() {
  ... throw ...
}
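Concretely, if cancellation resumes the suspension point as if it were a return statement, then a generator whose finally yields behaves as follows (the iteration records shown in comments assume those semantics):

```javascript
function* stubborn() {
  try {
    yield 1;
  } finally {
    yield 2; // yields again while the forced return is pending
  }
}

let g = stubborn();
g.next();              // { value: 1, done: false }
let r1 = g.return(99); // finally intercepts: { value: 2, done: false }
let r2 = g.next();     // finally finishes; the pending return completes:
                       // { value: 99, done: true }
```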

This leaves us with a few options for the semantics of yielding after cancellation.

(A) Dynamic error

If we make it a dynamic error for yield to happen after a generator has been externally cancelled, then we break the equivalence to an internal return, and we potentially make it difficult to know how to write correct generator functions under the threat of being cancelled at arbitrary yield points. This might be OK, but we have to articulate the programming methodology that avoids the hazard in generator functions:

  • Don't yield during or after a finally: This is unrealistic since you can't tell if a delegate generator function might internally use finally.
  • Don't return or throw from a finally that ever yields in its try: This is just too complex to learn.
  • Don't return or throw from a finally: This might be the best rule.

(B) Legal but ignored

If we don't make it a dynamic error to yield after cancellation, then cancellation remains equivalent to return. However, this means that the iterator resumes and produces another iteration record { value: v, done: false }. If the return method simply ignores this iteration record and produces undefined, there's no way for the consumer to know that the iterator has continued iterating.

(C) Legal and returned but ignored by for-of

Alternatively, we can simply have the return method return the next iteration record, which will indicate via its done property whether the iterator completed or not. This seems more reasonable. However, it leaves open the question of what the for-of loop should do in this case. For example:

function *foo() {
  try {
    try {
      yield 1;
    } finally {
      throw "overridden";
    }
  } catch (ignored) { }
  yield 2;
  console.log("goodbye");
}

for (let x of foo()) {
  console.log(x);
  if (x === 1)
    break;
}

This runs only one iteration of the loop and prints 1, but never prints 2 and never prints "goodbye". This seems reasonable, but it does mean that the cancellation method cannot guarantee that the iterator actually disposes of itself. And it still requires articulating a methodology as described in option (A): a generator needs to be prepared for the possibility that any yield point could force a return, and finally blocks should not be written in ways that jeopardize that behavior.

(D) Legal and returned and resumes for-of

Alternatively, if the iterator refuses to terminate, the for-of loop could continue executing. This seems completely bonkers, because it means an iterator can turn a break into a continue.

(E) Legal and returned but turned into an error by for-of

This is a slight variation on (A). It preserves the equivalence to a return statement, but in practice mostly ends up with the same errors happening since for-of throws if the iterator refuses to die.

My preference: (E)

Part of my reasoning is that return, throw, and next are really all variations on the same concept for a generator: resume the suspension point with a value, but with one of three different control flow signals. So each of the three methods should have the same signature:

function(any = undefined) -> { value: any, done: boolean }

In fact, for contexts other than for-of, there's no reason to disallow a generator from yielding at arbitrary points. But when a for-of loop decides to exit the loop abruptly, it's a bug if the iterator can't stop. The best thing to do at that point is to fail fast by throwing an error indicating that the loop was unable to properly dispose of the iterator.

Conclusion

To sum up, the changes to make to ES6 generators are:

  • Generators are not both iterables and iterators. Generator objects are only iterators; they don't have an @@iterator property that returns this.
  • for-of loops accept both iterables and iterators. They first try to get @@iterator and failing that assume the object is itself an iterator.
  • Iterators can have an optional return method. The return method takes an optional value (defaults to undefined) and signals to the iterator that it should finish. The iterator is expected to return an iteration record. For generators, the next, throw, and return methods all have the same signature, taking an optional resumption value (defaults to undefined) and returning an iteration record.
  • For-of loops use the return method on abrupt termination. If for-of terminates abruptly, it calls return on the iterator. If the iterator does not return an object whose done property is truthy, the loop throws an exception.
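Putting the pieces together, the proposed loop protocol can be sketched as a hypothetical helper (forOf and its "break" signal are illustrative only, not part of any spec):

```javascript
function forOf(obj, body) {
  // Accept both iterables and bare iterators:
  let method = obj[Symbol.iterator];
  let iter = method !== undefined ? method.call(obj) : obj;
  let normal = false;
  try {
    for (;;) {
      let rec = iter.next();
      if (rec.done) {
        normal = true; // the iterator finished on its own
        return;
      }
      if (body(rec.value) === "break") {
        return; // abrupt completion of the loop
      }
    }
  } finally {
    // On abrupt completion, ask the iterator to shut down (return is optional):
    if (!normal && typeof iter.return === "function") {
      let rec = iter.return();
      if (!rec || !rec.done) {
        throw new TypeError("iterator failed to terminate");
      }
    }
  }
}
```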
benjamn commented Apr 22, 2014

I think I prefer (E) as well.

My one question is where the exception would appear to be thrown from, since that would influence where you'd need to put a try-catch block to recover if you cared.

My preference would be for the break itself to behave like a throw, so that you would have the opportunity to handle it like so:

for (let x of foo()) {
  console.log(x);
  if (x === 1) {
    try {
      break; // Throws an exception if the iterator's cancellation method does not return { done: true }.
    } catch (err) {
      console.log("iterator refused cancelation: " + err.message);
      break; // Is it possible for this break not to bother canceling the iterator again?
    }
  }
}

On the other hand, if we let the exception provide the completion record for the whole loop, there would be no way to break twice, so perhaps that's the simpler thing to do.
