thomaswilburn/re-iterate.md

## re-iterate.md

Re: Iterate

I'm so old I remember when arrays in JavaScript didn't have forEach() or map(), and a lot of libraries would implement their own functional looping constructs, which were better than regular loops because you got a new function scope (remember, we didn't have let or const either). We all had to adjust when native forEach() landed because we were used to jQuery's each(), which had the arguments in the wrong order.
I was reminded of this while doing some scraping last week using Cheerio to load and traverse an HTML page in Node. Cheerio implements a jQuery-like API, which is genuinely pleasant to use, but carries some of that legacy behavior (like each(index, item) argument ordering) with it in ways that are jarring if you haven't used actual jQuery in a while. Fortunately, Cheerio also implements the iterator protocol on its collections, so if you just want the items for a given selector, you can use for ... of loops and not have to think about it.
Despite having written recently about generators and looping, I couldn't think of a time when I made my own iterable collection. However, in the middle of a side project this weekend, I started to implement a forEach() for a class, then stopped and made it iterable instead. It turns out it's pretty easy!
Iterator support basics

An "iterable" object in JavaScript is one that plays nice with for ... of loops, the [...spread] and fn(...rest) syntax, yield*, and destructuring--basically, any place where we consume individual items from a collection. To be considered iterable, the object has to provide a function via the Symbol.iterator property. You can declare this using a regular property assignment, but it's probably easier in modern JavaScript to add it to a class or object using the computed property syntax:
// this works, but is clunky.
var iterable = {};
iterable[Symbol.iterator] = function* () {
  /* ... */
}

// this is more streamlined:
var iterable = {
  *[Symbol.iterator]() {
    /* ... */
  }
}
Technically, the function doesn't have to be a generator if it returns an object that implements a next() method in accordance with the iteration protocol, but let's be real: nobody has time for that, and generators provide it for free.
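For comparison, here is roughly what the non-generator version looks like. This is a minimal sketch using a made-up `countdown` object (not something from the project above): the Symbol.iterator function returns a plain object whose next() method hands back `{ value, done }` pairs.

```javascript
// hypothetical example: an iterable that counts down without using a generator
var countdown = {
  from: 3,
  [Symbol.iterator]() {
    var current = this.from;
    // the returned object is the iterator, and all it needs is next()
    return {
      next() {
        if (current > 0) {
          return { value: current--, done: false };
        }
        // signal that there's nothing left to consume
        return { value: undefined, done: true };
      }
    };
  }
};

console.log([...countdown]); // [3, 2, 1]
```

It works, but all of that bookkeeping is exactly what a generator function handles for you.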
Adding an iterator to a collection

JavaScript collections have become rich enough that it's rare these days to see them re-implemented or wrapped if you can just use a regular array or map instead, or even just subclass Array to get its functionality for free. The one exception I found was Immutable, which makes sense, because the whole point is to protect you from those terribly mutation-friendly JS objects.
But let's say that we have an object that wraps a normal iterable for some reason. For example, we might have a Deck class that contains an array of cards, but also has a bunch of other methods hanging off it. The Deck doesn't just subclass Array, because it's a fundamentally different thing, but it uses an array for storage of the cards inside.
Writing an iterator that lets you for ... of through all the cards from the deck is much shorter than you probably think it is. Here's our Deck class's Symbol.iterator declaration:
*[Symbol.iterator]() {
  yield* this.cards;
}
The yield* syntax delegates to another iterable: it yields each of that iterable's values in turn until it runs out, then picks up where it left off. So for the purposes of most interactions, the Deck is its own object, but loops and spread syntax just go straight to the internal array.
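To see it in context, here's a minimal sketch of what the whole class might look like. The constructor and the draw() method are assumptions for the sake of illustration; only the iterator comes from the snippet above.

```javascript
class Deck {
  constructor(cards) {
    // copy the input so the deck owns its own storage
    this.cards = [...cards];
  }

  // hypothetical "other method": remove and return the top card
  draw() {
    return this.cards.pop();
  }

  // hand iteration off to the internal array
  *[Symbol.iterator]() {
    yield* this.cards;
  }
}

var deck = new Deck(["ace", "king", "seven"]);
for (var card of deck) {
  console.log(card); // logs: ace, king, seven
}

deck.draw();
console.log([...deck]); // ["ace", "king"]
```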
However, let's say that we don't (or can't) just hand over responsibility for iteration to an already-iterable structure. For example, Cheerio (in keeping with its jQuery inspiration) is not an array and doesn't use an array for storage, even if it stores query results in numbered keys and has a length property. In that case, the iterator might look more like this:
*[Symbol.iterator]() {
  // Cheerio items are numbered keys on the object itself
  // so we'll basically just write an old-school loop
  for (var i = 0; i < this.length; i++) {
    yield this[i];
  }
}
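If you want to try that without installing Cheerio, a stand-in is easy to build. The `NumberedCollection` class below is hypothetical, and only imitates the "numbered keys plus a length" shape described above rather than anything Cheerio actually does internally.

```javascript
// hypothetical array-like: items live in numbered keys on the object itself
class NumberedCollection {
  constructor(...items) {
    items.forEach((item, i) => {
      this[i] = item;
    });
    this.length = items.length;
  }

  *[Symbol.iterator]() {
    for (var i = 0; i < this.length; i++) {
      yield this[i];
    }
  }
}

var found = new NumberedCollection("h1", "p", "p");
console.log(Array.isArray(found)); // false
console.log([...found]); // ["h1", "p", "p"]
```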
In Cheerio's case, the underlying collection was just a flat list, so this is a little unimpressive. But it's not hard to imagine wrappers for data structures that would use a more complicated iterator. For example, what if we wanted to be able to loop through every element in a DOM tree, in depth-first order:
class Walkable {
  constructor(rootElement) {
    this.root = rootElement;
  }
  
  *[Symbol.iterator]() {
    yield this.root;
    for (var child of Array.from(this.root.children)) {
      var cwalk = new Walkable(child);
      yield* cwalk;
    }
  }
}

var ul = document.querySelector("ul");
var walker = new Walkable(ul);
for (var element of walker) {
  console.log(element); // logs every element from the UL on down
}
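The same recursive delegation works anywhere you have a tree, not just in the DOM. As a sketch that runs outside the browser, here's a standalone generator over plain objects. The `name` and `children` properties are assumptions made up for this example.

```javascript
// depth-first walk over plain objects with an optional "children" array
function* walk(node) {
  yield node;
  for (var child of node.children || []) {
    // each subtree is walked by a nested call to the same generator
    yield* walk(child);
  }
}

var tree = {
  name: "ul",
  children: [
    { name: "li", children: [{ name: "a" }] },
    { name: "li" }
  ]
};

var names = [...walk(tree)].map(node => node.name);
console.log(names); // ["ul", "li", "a", "li"]
```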
It's worth noting, of course, that an iterable doesn't technically have to return data from a collection. Like any generator, it can just make values up as it goes along. For example, you might have a class that represents an axis on a graph, and its iterator might be used to step through each tick along the way:
class LinearAxis {
  constructor(startValue, endValue) {
    this.start = startValue;
    this.end = endValue;
  }

  *[Symbol.iterator]() {
    var { start, end } = this;
    // getInterval returns a round number that will have an appropriate number of ticks
    // see: d3.scale.nice()
    var step = this.getInterval(start, end);
    for (var i = start; i < end; i += step) {
      yield i;
    }
  }
}

var axis = new LinearAxis(1990, 2022);
for (var tick of axis) {
  console.log(tick); // logs out something like 1990, 2000, 2010, 2020
}
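The getInterval() method isn't shown above. To make the example runnable, here's a sketch with a deliberately crude stand-in that picks the largest power of ten that fits inside the range. That choice is purely an assumption for illustration (it also assumes the end value is larger than the start); a real axis would want something smarter.

```javascript
class LinearAxis {
  constructor(startValue, endValue) {
    this.start = startValue;
    this.end = endValue;
  }

  // hypothetical stand-in: the largest power of ten that fits inside the range
  getInterval(start, end) {
    var magnitude = Math.floor(Math.log10(end - start));
    return Math.pow(10, magnitude);
  }

  *[Symbol.iterator]() {
    var { start, end } = this;
    var step = this.getInterval(start, end);
    for (var i = start; i < end; i += step) {
      yield i;
    }
  }
}

var axis = new LinearAxis(1990, 2022);
console.log([...axis]); // [1990, 2000, 2010, 2020]
```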
This ability to create stateful objects that don't themselves contain data, but still act like they do for the purposes of iteration, is where I think things get really interesting.
Iterators as abstractions

For the hobby project I'm working on, I'm duplicating some of the functionality of spreadsheets: I want to have a sparsely-populated grid of values, and be able to get/set a rectangular subset of that grid. So to start out, I wrote a class for cell addresses, in which you can feed in an Excel-style range reference and get an object that has the numerical bounds of that reference:
var ref = new Reference("A2:C5");

// this returns an object with the following values set
// remember that in Excel, rows/columns are 1-indexed
ref.column == 1
ref.row == 2
ref.columns == 3
ref.rows == 4
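The constructor itself isn't shown here, so the following is only a sketch of how the parsing might work. It assumes uppercase addresses in "A2:C5" form with the top-left cell first, and the static columnNumber() helper is a name invented for this example.

```javascript
class Reference {
  constructor(address = "A1:A1") {
    var match = /^([A-Z]+)(\d+):([A-Z]+)(\d+)$/.exec(address);
    if (!match) {
      throw new Error("Unrecognized reference: " + address);
    }
    var [, startColumn, startRow, endColumn, endRow] = match;
    this.column = Reference.columnNumber(startColumn);
    this.row = Number(startRow);
    // both ends of the range are inclusive, hence the + 1
    this.columns = Reference.columnNumber(endColumn) - this.column + 1;
    this.rows = Number(endRow) - this.row + 1;
  }

  // hypothetical helper: "A" -> 1, "Z" -> 26, "AA" -> 27
  static columnNumber(letters) {
    var number = 0;
    for (var letter of letters) {
      number = number * 26 + (letter.charCodeAt(0) - 64);
    }
    return number;
  }
}

var ref = new Reference("A2:C5");
console.log(ref.column, ref.row, ref.columns, ref.rows); // 1 2 3 4
```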
Reference objects like this don't inherently contain any data. But if I want to use them to copy from one sheet to another, I can implement an iterator that will give me cell addresses across the selection:
*[Symbol.iterator]() {
  // again, remember that Excel is 1-indexed
  for (var y = 1; y <= this.rows; y++) {
    for (var x = 1; x <= this.columns; x++) {
      var column = x + this.column - 1;
      var row = y + this.row - 1;
      yield { column, row, x, y };
    }
  }
}
References are, like our axis earlier, an abstract iterable. They don't themselves contain data, but we can treat them like they do. With this in place, I can iterate through a reference to get both the column/row location within the sheet, as well as the column/row within the selection (x and y):
var ref = new Reference("B2:C3");
for (var cell of ref) {
  console.log(cell);
  /*
  { column: 2, row: 2, x: 1, y: 1 }
  { column: 3, row: 2, x: 2, y: 1 }
  { column: 2, row: 3, x: 1, y: 2 }
  { column: 3, row: 3, x: 2, y: 2 }
  */
}
Okay, that's kind of neat. But where it gets very cool is that since References use the same coordinate system as Sheets in my spreadsheet imitation, I can piggyback on them to provide iterators for the latter:
class Sheet {
  constructor(width, height) {
    this.columns = width;
    this.rows = height;
    this.cells = new Map();
  }

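  // create a Reference that's the same size as our sheet
  selectAll() {
    var r = new Reference();
    // anchor the selection at the top-left cell explicitly, in case
    // an empty Reference doesn't already default to A1
    r.column = 1;
    r.row = 1;
    r.rows = this.rows;
    r.columns = this.columns;
    return r;
  }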

  *[Symbol.iterator]() {
    // iterate through all cells in our range
    for (var { column, row } of this.selectAll()) {
      var value = this.cells.get(column + ":" + row);
      yield { column, row, value };
    }
  }

}
Now I can iterate over a sheet:
// Sheet.paste() is left as an exercise for the reader
var sheet = new Sheet(3, 2);
sheet.paste([1, 1, 2, 3, 5, 8], "A1:C2");

for (var { value } of sheet) {
  console.log(value); // logs: 1, 1, 2, 3, 5, 8
}
That's pretty cool. You could do the same trick with any process that generates/sorts/filters keys for accessing data, giving you a common foundation for iteration across a suite of related classes.
Iteration stunts

Okay, so iterables can be used to access data, and to unwrap it from wrapper objects that we use to generate API facades around native JavaScript collections. We can also use them to loop over data that doesn't technically exist, like spreadsheet ranges and axis ticks. All of this is good and useful. But much like jQuery's each() now feels backwards because we have forEach() and for ... of on arrays, iterables give us new expressive syntax to make some old patterns obsolete.
For example, it used to be common for libraries that implemented their own collections to offer a toArray() method. When I use Cheerio for scraping, I often call selection.toArray() so that I can use the forms of map() and filter() that I'm accustomed to. But, of course, if you implement iteration, you don't really need a special method for converting into an array anymore, because we have the spread syntax.
// old-style, requires writing manual conversion code
var array = selection.toArray();

// new hotness
var array = [...selection];

// combining several collections together
var combined = [...iterableA, ...iterableB];
In fact, since destructuring is also based on iterables, you can use it for common operations like "getting the first unwrapped item." For example, when I work with legacy d3 scripts, I'll often see something like this:
var first = d3.select("li").node();
But since d3 selections implement the iteration protocol, you can just destructure one instead. Switch to selectAll() (select() only ever matches a single element) and you can even get all the remaining elements while you're at it!
var [ first, ...rest ] = d3.selectAll("li");
Ah, you want the last item, instead. No problem.
var [ last ] = [...d3.selectAll("li")].reverse();
Yeah, that one's a little baroque. Fun, though.
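None of this is specific to d3. As a quick sketch, here are the same moves on a built-in Set, which stands in for any iterable collection (the item names are made up):

```javascript
// a Set stands in here for any iterable collection
var items = new Set(["first", "second", "third"]);

var [ first, ...rest ] = items;
console.log(first); // "first"
console.log(rest); // ["second", "third"]

// skip items by leaving holes in the destructuring pattern
var [ , second ] = items;
console.log(second); // "second"

// the last item: spread into a throwaway array and pop it
var last = [...items].pop();
console.log(last); // "third"
```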
Spread syntax doesn't just work inside array literals. It also works when calling variadic functions like Math.max(), which opens up some nicely expressive syntax:
// using our axis example from above
var largestTick = Math.max(...axis);
var smallestTick = Math.min(...axis);
The built-in collections also understand this protocol, so you can use them as tools for manipulating your custom iterables. For example, let's say you have an iterable that might contain repeated values. You can pipe that through a Set to get rid of them:
var duplicated = [...fibonacci];
// 1, 1, 2, 3, 5, 8

var unique = [...new Set(fibonacci)];
// 1, 2, 3, 5, 8
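Here's a runnable sketch of that, assuming the sequence comes from a generator function (the `fibonacci(count)` signature is made up for this example):

```javascript
// hypothetical generator: yields the first `count` Fibonacci numbers
function* fibonacci(count) {
  var [a, b] = [1, 1];
  for (var i = 0; i < count; i++) {
    yield a;
    [a, b] = [b, a + b];
  }
}

var duplicated = [...fibonacci(6)];
console.log(duplicated); // [1, 1, 2, 3, 5, 8]

var unique = [...new Set(fibonacci(6))];
console.log(unique); // [1, 2, 3, 5, 8]
```

Note that the generator function is called twice here. A generator object can only be consumed once, so spreading the same one a second time would give you an empty array. An object with its own Symbol.iterator method doesn't have that problem, since it creates a fresh iterator each time.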
Which reminds me of my favorite trick for creating a lookup table from a keyed iterable:
var data = [
  { name: "alice", data: 1 },
  { name: "bob", data: 2 },
  { name: "eve", data: 3 }
];

var lookup = Object.fromEntries(data.map(row => [row.name, row.data]));
// { alice: 1, bob: 2, eve: 3 }
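Object.fromEntries() accepts any iterable of key/value pairs, not just arrays, and so does the Map constructor. As a sketch, here's the same lookup built from a generator (the `entries()` helper is a name invented for this example):

```javascript
var data = [
  { name: "alice", data: 1 },
  { name: "bob", data: 2 },
  { name: "eve", data: 3 }
];

// hypothetical helper: yields [key, value] pairs one at a time
function* entries(rows) {
  for (var row of rows) {
    yield [row.name, row.data];
  }
}

var lookup = Object.fromEntries(entries(data));
console.log(lookup); // { alice: 1, bob: 2, eve: 3 }

// the Map constructor consumes the same kind of iterable
var map = new Map(entries(data));
console.log(map.get("bob")); // 2
```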
Unfortunately, while the spread operator lets us go from an iterable to an array easily, you do have to implement your own method if you want to go from an array to your own custom iterable. However, if you stick to the protocol, it does make consumption a lot easier. For example, my spreadsheet class can be populated from an array (or any other iterable) with this code:
class Sheet {
  from(iterable) {
    var column = 1;
    var row = 1;
    for (var value of iterable) {
      this.cells.set(column + ":" + row, value);
      column++;
      // wrap around at the end of each row
      if (column > this.columns) {
        row++;
        column = 1;
      }
    }
  }
}
As long as you mimic Array.from() and Object.fromEntries(), it's pretty easy to make sure that your class will be able to pull in data from any other iterable object, and users will have a familiar model for how to interact with it.
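To see that in action, here's a self-contained sketch that combines the constructor and from() method shown earlier and feeds them from a generator instead of an array. The `squares()` generator is made up for the example, and note that nothing here stops the input from running past the last row.

```javascript
class Sheet {
  constructor(width, height) {
    this.columns = width;
    this.rows = height;
    this.cells = new Map();
  }

  from(iterable) {
    var column = 1;
    var row = 1;
    for (var value of iterable) {
      this.cells.set(column + ":" + row, value);
      column++;
      // wrap around at the end of each row
      if (column > this.columns) {
        row++;
        column = 1;
      }
    }
  }
}

// hypothetical source: any iterable will do, including a generator
function* squares(count) {
  for (var i = 1; i <= count; i++) {
    yield i * i;
  }
}

var sheet = new Sheet(3, 2);
sheet.from(squares(5));
console.log(sheet.cells.get("1:1")); // 1
console.log(sheet.cells.get("3:1")); // 9
console.log(sheet.cells.get("2:2")); // 25
```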