Comparing: array method chaining, generator delegations, and transducing

I'm writing this quick post in response to a recent Twitter conversation where claims were made about the merits (or lack thereof) of transducers as they relate to composing list operations (map, filter).

For comparison's sake, at the end of this post you'll find three (actually four!) implementations of a simple list-operation demo.

TL;DR

It's well-trodden territory that the typical array method chaining approach common in JS, like list.map(..).map(..).filter(..), has two specific downsides:

  1. Each method in the chain iterates the entire list, so the list is fully traversed once per method; the longer the chain of composed methods, the more passes over the data. This wastes CPU cycles.

  2. Each method creates a new array, with all the intermediate arrays then being thrown away (GC'd). This wastes memory (and CPU cycles for GC to clean up). Both costs are sketched just below.
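
To see both costs concretely, here's a minimal sketch; the visits counter and spy(..) wrapper are hypothetical instrumentation, not part of the demo code below:

var visits = 0;
var spy = fn => v => (visits++, fn(v));

[0,1,2,3,4,5,6,7,8,9]
	.map(spy(v => v + 1))           // pass #1, intermediate array #1
	.map(spy(v => v * 2))           // pass #2, intermediate array #2
	.filter(spy(v => v % 3 == 0));  // pass #3, final array

visits;  // 30 -- three full traversals of a 10-item list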

One claim in the thread was that transducers weren't "actually useful", apparently dismissing these two downsides. An adjoining claim in reply was that generators/iterators "solve both problems" of array-method-chaining, thereby making transducers unnecessary. Yet another reply supported the idea of generators being sufficient/preferred.

It's true that the generators/iterators approach is more efficient than the base array method chaining approach. It's also nice that this approach can be used lazily, meaning it can work even on an "infinite list" of values.
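
As a minimal sketch of that laziness (using the *map(..) generator shown later in this post, against an infinite source):

function *naturals() {
	for (let i = 0; ; i++) yield i;
}

var it = map(v => v * 2, naturals());

it.next();  // { value: 0, done: false }
it.next();  // { value: 2, done: false }
// only the values actually pulled are ever computed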

It's also true that the transducers approach is less ergonomic/familiar (at least in JS), which makes this approach less attractive to most JS developers.

That said, the claims cited above in the twitter thread miss important inefficiencies in the generators/iterators approach that transducers actually do eliminate. Transducers indeed may be worth the extra cognitive weight required, especially as the "list" of values, and/or the number of composed operations, grows.

Array Method Chaining

Here's a base example (the same as in the full listing at the end of this post):

var f = list => (
	list
	.map(v => v + 1)
	.map(v => v * 2)
	.map(v => v + 3)
	.filter(v => v % 3 == 0)
	.map(v => "v: " + v)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

The chain of .map(..) and .filter(..) calls is an extremely common/familiar style in JS, so-called "fluent style" or simply "method chaining". It was popularized by jQuery some 15 years ago.

But there are two behaviors inherent to this approach: (1) each method call iterates the "whole list", and (2) each method call produces a new list.

In fairness, these behaviors are typically not considered a big concern for performance. There's CPU waste (1, 2) and memory waste (2)... but as usual, context is very important in deciding how much of a performance concern (or not) it should be.

If the list of values being processed is only 10 items in length, the CPU and memory waste are trivial. If the list has 10,000 items or 1,000,000 items, the CPU and memory waste could start to add up.

If the number of operations being composed is a few, even 5 as above, the multiplier on the CPU and memory waste is hardly worth worrying about. But if the chain were 15+ steps long -- completely feasible if you consider the jQuery-style programming that set this precedent! -- the waste starts to add up, especially if the list is medium length or longer.

So, we cannot say objectively that this is an issue that should be addressed, because... as always, it depends. It depends on list length and on how many processing operations you need to compose.

If every usage you have of this pattern is always on short lists, with never more than one or a few processing operations, you can probably safely dismiss the rest of this post.

If not, or you're not sure, or you're curious... continue onward.

Aside: Alternative Style, Same Problems

There's an alternative style that you may occasionally encounter:

var f = pipe(
	map(v => v + 1),
	map(v => v * 2),
	map(v => v + 3),
	filter(v => v % 3 == 0),
	map(v => "v: " + v)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

Note: Please check out the listing below for more of the implementation details here, including how pipe(..), map(..), and filter(..) are written.

This style looks more FP'ish, and indeed embodies some FP principles such as pipe/composition, curried functions, and point-free style.
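
For instance, using the curried map(..) from the listing below, each step is a partially applied function just waiting for its list:

var addOne = map(v => v + 1);  // equivalent to: list => list.map(v => v + 1)

addOne([1,2,3]);  // [ 2, 3, 4 ]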

But I only include this implementation variation to point out: under the covers, the method chaining is still happening. If you don't believe me, go take a look.

In any case, the performance profile (the amount of waste) of this approach will be approximately the same as the previous explicit method chaining style.

OK, with that out of the way, let's look at generators/iterators.

Generator/Iterator Delegations

Disclaimer: This post is not going to teach you anything about generators/iterators. That's a huge topic, and it's been covered in-depth many times before. Just google it. Or check out "The Basics of ES6 Generators" (parts 1 and 2, specifically) by me, and then the fantastic "Why would anyone need JavaScript Generator Functions?" by James Sinclair.

OK, so now we're talking about generators/iterators. Again, here's the illustration of this style of code for our running example:

var f = pipe(
	mapG(v => v + 1),
	mapG(v => v * 2),
	mapG(v => v + 3),
	filterG(v => v % 3 == 0),
	mapG(v => "v: " + v),
	consumeAll
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

Note: Again, please check out the listing below for more of the implementation details here, including how pipe(..), mapG(..), filterG(..), and consumeAll(..) are written.

First, this style looks quite similar to the previous (alternative) array-method-composition style. That's great!

Yet, as was claimed in the original motivating twitter thread, it is true that the generators (and in particular, the yield delegation) allow this example to run differently than the previous array-method-chaining approach.

Here's a quick glimpse at part of the implementation details:

function *map(fn,it){
	for (let v of it) {
		yield fn(v);
	}
}

function *filter(fn,it){
	for (let v of it) {
		if (fn(v)) yield v;
	}
}

Because iterators are lazy, we are not creating a whole new copy of the list for each step in the processing. Each generator yield delegates to the next, so each value sort of pipelines through all the steps and out into the final iterator -- which we then fully consume with the ... spread operator and drop into a single array.

That also means we only iterate the original 10-item list once, rather than 5 times.
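
In other words, stripping away the pipe(..)/curryGen(..) helpers from the listing below, the whole pipeline is equivalent to this direct nesting of the generators above:

var it = map(v => "v: " + v,
	filter(v => v % 3 == 0,
	map(v => v + 3,
	map(v => v * 2,
	map(v => v + 1, [0,1,2,3,4,5,6,7,8,9])))));

[...it];
// [ "v: 9", "v: 15", "v: 21" ]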

Great! The code looks almost the same, but doesn't have the downsides. So... problem solved?!? Eh. Not quite.

While generators/iterators are super powerful, and they indeed provide a graceful looking solution to our list processing, they're not "zero cost" abstractions as might be implied/assumed.

That is, even though we're not iterating the whole list multiple times, and even though we're not creating and throwing away multiple intermediate lists at each step... we're still keeping a context alive (in memory), which grows for each step in the operation.

Here's what I mean: effectively, each value is processed kinda like this: yield .. yield .. yield .. yield .. yield ... That is, we're creating (and keeping alive) a generator instance for each step, and passing values down the pipeline of these generator instances.

Regardless of how familiar you are with the nuances of generators/iterators, it's probably not surprising when I tell you that a yield delegation is a fair bit more expensive than a single function call. There's a lot going on under the covers to make generators and the iterator protocol work.
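
If you want a rough feel for that overhead yourself, here's a crude micro-benchmark sketch; the numbers vary widely by engine and version, so treat it as illustrative only:

function inc(v) { return v + 1; }
function *incG(it) { for (let v of it) yield v + 1; }

var src = [...Array(1e6).keys()];

console.time("plain function calls");
var a = 0;
for (let v of src) a = inc(v);
console.timeEnd("plain function calls");

console.time("generator yields");
var b = 0;
for (let v of incG(src)) b = v;
console.timeEnd("generator yields");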

In my estimation -- disclaimer: I have not actually studied the JS engine code in-depth! -- each yield probably costs about 2-3 function calls, plus a bit more CPU/memory in terms of extra data representation. These underlying function calls are also extremely unlikely to be "proper tail calls". That means they grow the call-stack, even if that's not visible to the JS layer's debugging tools.

Keep in mind, whatever that extra cost is, it's a multiplier on top of the length of the "list" of values, as well as the number of steps. It's a perfect example of an abstraction that hides quite a bit of complexity (and thus cost) very well.

Moreover, once the processing completes, each generator instance (basically, a paused function call) and each iterator instance -- the mechanism that connects each step -- is extra memory that must be thrown away for GC.

The code abstraction of generators/iterators is really nice, but... it is absolutely not the same as a simple function call composition. There is cost to consider here.

So how could we get closer to a "zero cost" abstraction, where it's just a function call (hint: even a proper tail call!) from one step to the next?

The answer is transducers.

Transducing

Disclaimer: Again, this post is not going to teach you anything about transducers. That's a huge topic, and it's been covered in-depth many times before. Just google it. Or check out "Functional-Light JavaScript", Appendix A by me.

Finally, we get to these strangely named transducers. Here again is our running example, using this style:

var f = transduceL(
	compose(
		mapT(v => v + 1),
		mapT(v => v * 2),
		mapT(v => v + 3),
		filterT(v => v % 3 == 0),
		mapT(v => "v: " + v)
	)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

Note: Yet again, please check out the listing below for more of the implementation details here, including how compose(..), mapT(..), filterT(..), and transduceL(..) are written.

Yes, the implementation details are a bit more involved here. But once you have the utilities (e.g., compose(), mapT(), etc.), which are almost always provided by an FP/transducers library, just look at that code snippet as it sits. It's almost the same style and ergonomics, isn't it!?

But what's happening under the covers boils down to something much closer to "zero cost" abstraction.

Let's peek at one part of the implementation details:

function map(mapperFn,combinerFn){
	return (list,v) => combinerFn(list,mapperFn(v));
}

Here's what you should look at: the combinerFn(..) part is how each step of processing connects (composes with) the next step. And what do you know!?!? It's a function call, that's it.

Moreover, it's a proper tail call. That means at least some JS engines (JSC in Safari, as of now) will, in strict mode, actually dispatch each function call without paying the CPU or memory cost of a new stack frame, reusing the existing frame instead.

So each piece of data will be processed sorta like: combinerFn( combinerFn( ... ) ). That's not actually how it is, but it's conceptually similar.
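
To make that concrete, here's our example's composed transducer expanded by hand (using mapT(..), filterT(..), and concatL(..) from the listing below) -- every step-to-step hop is just a function call into the next step's reducer:

var reducer =
	mapT(v => v + 1)(
	mapT(v => v * 2)(
	mapT(v => v + 3)(
	filterT(v => v % 3 == 0)(
	mapT(v => "v: " + v)(
	concatL )))));

[0,1,2,3,4,5,6,7,8,9].reduce(reducer, []);
// [ "v: 9", "v: 15", "v: 21" ]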

That's much less overhead cost from step to step, compared to the generator/iterator mechanism we looked at earlier.

Summary

Is the extra cognitive overhead of transducers worth the savings of processing overhead compared to generators?

I can't give an objective answer to that. But I think it should now be clear that the longer the "list" of values to process, and the more steps each value is processed through, the more these CPU/memory costs add up.

These costs ramp up significantly when you move these ideas from a single discrete list of values (an array) to processing a stream of values coming through over time. Transducers really shine in that context.
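
As a sketch of that flexibility: because a transducer is decoupled from how values arrive, the same composed pipeline can consume values one at a time. Here, stream.onValue(..) is a hypothetical subscription API, and xf stands in for the compose(mapT(..), ..) pipeline from the example above:

var step = xf(concatL);  // xf: the composed transducer from above (assumed name)
var results = [];

// hypothetical "values over time" source
stream.onValue(v => {
	results = step(results, v);
});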

We shouldn't be so quick to dismiss transducers as being academic trivia without practical use in JS.

/* ARRAY METHOD CHAINING */

var f = list => (
	list
	.map(v => v + 1)
	.map(v => v * 2)
	.map(v => v + 3)
	.filter(v => v % 3 == 0)
	.map(v => "v: " + v)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

/* ARRAY METHOD COMPOSITION */

function curry(util) {
	return fn => list => util(fn,list);
}

function pipe(...fns) {
	return v => fns.reduce(
		(v,fn) => fn(v),
		v
	);
}

var map = curry(function map(fn,list){
	return list.map(fn);
});

var filter = curry(function filter(fn,list){
	return list.filter(fn);
});

// ****************************

var f = pipe(
	map(v => v + 1),
	map(v => v * 2),
	map(v => v + 3),
	filter(v => v % 3 == 0),
	map(v => "v: " + v)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

/* GENERATORS/ITERATORS */

function curryGen(gen) {
	return fn => (
		function*(it){
			yield* gen(fn,it);
		}
	);
}

function pipe(...fns) {
	return v => fns.reduce(
		(v,fn) => fn(v),
		v
	);
}

function consumeAll(it) {
	return [...it];
}

var mapG = curryGen(function *map(fn,it){
	for (let v of it) {
		yield fn(v);
	}
});

var filterG = curryGen(function *filter(fn,it){
	for (let v of it) {
		if (fn(v)) yield v;
	}
});

var f = pipe(
	mapG(v => v + 1),
	mapG(v => v * 2),
	mapG(v => v + 3),
	filterG(v => v % 3 == 0),
	mapG(v => "v: " + v),
	consumeAll
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]

/* TRANSDUCERS */

function curry(fn) {
	return x => y => fn(x,y);
}

function compose(...fns) {
	return v => fns.reduceRight(
		(v,fn) => fn(v),
		v
	);
}

function concatL(list,v) {
	return list.concat(v);
}

var mapT = curry(function map(mapperFn,combinerFn){
	return (list,v) => combinerFn(list,mapperFn(v));
});

var filterT = curry(function filter(predicateFn,combinerFn){
	return (list,v) => {
		if (predicateFn(v)) return combinerFn(list,v);
		return list;
	};
});

var transduceL = curry(function reduce(fn,list){
	return list.reduce(fn(concatL),[]);
});

// ****************************

var f = transduceL(
	compose(
		mapT(v => v + 1),
		mapT(v => v * 2),
		mapT(v => v + 3),
		filterT(v => v % 3 == 0),
		mapT(v => "v: " + v)
	)
);

f([0,1,2,3,4,5,6,7,8,9]);
// [ "v: 9", "v: 15", "v: 21" ]