-
-
Save ptaoussanis/e537bd8ffdc943bbbce7 to your computer and use it in GitHub Desktop.
(comment ; Fun with transducers, v2 | |
;; Still haven't found a brief + approachable overview of Clojure 1.7's new | |
;; transducers in the particular way I would have preferred myself - so here goes: | |
;;;; Definitions | |
;; Looking at the `reduce` docstring, we can define a 'reducing-fn' as: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
;; (The `[]` arity is actually optional; it's only used when calling | |
;; `reduce` w/o an init-accumulator). | |
;; We choose to define a 'transducing-fn' as: | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;; If you're familiar with Ring middleware, a transducer is a lot like | |
;; reducing-fn middleware: | |
(fn ring-handler [ring-req]) -> ring-resp | |
(fn ring-middleware [ring-handler]) -> new-ring-handler | |
;; Compare: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;;;; Quick observations: | |
;; 1. A transducer is just a fn. | |
;; 2. It's a lot like reducing-fn middleware, and composes just like middleware. | |
;; 3. This doesn't sound very impressive so far, which makes it easy to miss | |
;; how big of a deal transducers actually are. | |
;; In fact numerous, major benefits fall out of this simple definition. | |
;; Transducers (transducing-fns plus a few utils) give us: | |
;; * Performance: | |
;; * Can reduce w/o incurring any sequence construction costs. | |
;; * Can map composed operations w/o incurring >1 sequence construction cost. | |
;; * Efficient filtering + early termination. | |
;; * All the benefits of 'Reducers' (parallel fork-join, etc.). | |
;; | |
;; * Convenience: | |
;; * Easy composition through standard fn `comp` and threading, etc. | |
;; * Easy construction through single-arity `map`, `filter`, etc. | |
;; * Entirely eliminates need for special/extra core.async channel transform | |
;; fns (`map<`, etc.). | |
;; * Transducer-fns can use internal state to easily build powerful | |
;; operations (e.g. see `dedupe` source). | |
;; * Laziness iff you want it. | |
;; | |
;; * Conceptual: | |
;; * Elegantly & flexibly unifies concepts like: | |
;; * Reducing - (ordered-seq -> accumulated-val). | |
;; * Mapping - (ordered-seq -> new-ordered-seq). | |
;; * 'Reducers' - (unordered-coll -> accumulated-val). | |
;; * Laziness. | |
;; * Process composition. | |
;; * Transducers are just fns defined in a particular way and which can be | |
;; fed to some transducer utils. No voodoo, just a clever idea. | |
;;;; Transducer API | |
;;; The following all return transducing-fns (note absence of any colls): | |
(map f) | |
(filter f) | |
(remove f) | |
(take n) | |
(take-while pred) | |
(drop n) | |
(drop-while pred) | |
(take-nth n) | |
(replace smap) | |
(into to xform from) | |
(partition-by f) | |
(partition-all f1) | |
(keep f) | |
(keep-indexed f) | |
(flatmap f) ; new, like `mapcat` (deprecated) | |
(mapcat f) ; upcoming | |
(dedupe f) ; new | |
(random-sample prob) ; new | |
;; or you can just write your own (recall that transducing-fns are just fns) | |
;;; And these utils consume transducing-fns (note colls for consuming): | |
(into to xform from) | |
(iteration xform coll) ; new | |
(transduce xform f coll) ; new, like `reduce` | |
(transduce xform f init coll) ; new, like `reduce` | |
(sequence xform coll) ; like (lazy) single-coll `map` | |
(sequence xform & colls) ; like (lazy) multi-coll `map` | |
;;;; A minor wrinkle | |
;; Going back to our original definition: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;; I omitted a detail which Rich helped clarify. | |
;; `transduce` _actually_ expects a reducing-fn modified to also accept a new | |
;; `[accumumulation]` arity: | |
(fn transducer-ready-reducing-fn | |
([]) ; Recall that this arity is optional (only needed when no init-accumulator given) | |
([accumulation]) ; <- This is the new arity to support transducing | |
([accumulation next-input]) | |
) | |
;; Clojure 1.7-alpha1's `transduce` _automatically_ adds the extra arity | |
;; given a regular reducing-fn, but later versions will require that you take | |
;; care of this yourself (the extra flexibility is helpful, but something | |
;; outside the scope of this short overview). A utility called `completing` | |
;; is being added to Clojure >1.7-alpha1 which helps wrap a regular reducing-fn | |
;; to give it this extra arity. | |
;;;; Homework | |
;; This is the identity transducing-fn: | |
(defn identity-transducing-fn | |
[reducing-fn] ; A 'completed' reducing fn (i.e. with an `[accumulation]` arity) | |
(fn new-reducing-fn | |
([] (reducing-fn)) ; Only called/necessary when no init-accumulator given | |
([accumulation] (reducing-fn accumulation)) | |
([accumulation new-input] | |
(reducing-fn accumulation new-input)))) | |
(comment | |
(sequence identity-transducing-fn '(1 2 3 4)) ; -> '(1 2 3 4) | |
) | |
;; I found it helpful to compare this to the definition of the standard | |
;; transducing-fns. `(filter pred)` and `(dedupe)` are simple so good starting | |
;; points. Here's `(filter pred)`: | |
(defn filter-transducing-fn [pred] | |
(fn [reducing-fn] ; Like Ring middleware takes a handler | |
(fn new-reducing-fn ; Like Ring middleware returns a new-handler | |
([] (reducing-fn)) | |
([accumulation] (reducing-fn accumulation)) | |
([accumulation input] | |
(if (pred input) ; Only change from identity-transducing-fn | |
(reducing-fn accumulation input) | |
accumulation))))) | |
(comment | |
(sequence (filter-transducing-fn even?) '(1 2 3 4)) ; -> '(2 4) | |
) | |
;; Hope this was useful to someone. Corrections/comments welcome! | |
;; Happy hacking, cheers! :-) | |
;; Peter (https://github.com/ptaoussanis) | |
) |
Hah—that it does!
The way reduce
works is not quite a reducing function in the transducer sense. The one-arg arity of a reducing function is a "finalizer", which "finishes" the collection created by the zero-arg version (e.g. if the zero-arg returns (transient [])
, the one-arg returns (persistent! acc)
). Notice that transducer versions of filter, map, etc all apply the reducing function to the accumulator in their one-arity version.
However before this commit the various functions (and their docs) were inconsistent, and transduce
would call the one-arg arity to finalize regardless of whether init
was explicitly specified or not (a bug).
There are actually three things:
- "reducing step functions" with three arities:
(reduce-step-fn ([] "-> empty-accumulator") ([accumulation] "-> finished-accumulator") ([accumulation next-input] "-> accumulator with next-input added."))
- transducer functions:
(transducer-fn [reduce-step-fn] "-> New reduce-step function")
- reducing functions: the utility functions which actually "run" the reducing-step-functions. They take a source collection and return the accumulated result of a reducing-step function reduced over the source. The confusion comes from the fact that these utility functions can take an initial accumulator, too; but this is the same as replacing the one and two arg arities of the supplied reducing function with
([] init)
and([acc] acc)
.
The rule of thumb is (or should be), that if a reducing function receives an argument to receive the result:
- The zero- and one-arg arities of the reducing-step-function are not used.
- the reducing function must return a collection of the same "type" (i.e. no special finalization).
This is how the line "If coll has only 1 item, it is returned and f is not called." in the documentation for reduce
should be understood: reduce
always has a specified accumulator (either val
or the first item in the source coll
) and never uses the reducing-step-function's zero- and one-arg arities. A reducing function is not the same as a reducing-step function.
@richhickey: Hi Rich, that does help - thank you! And this is awesome stuff, truly. Much, much appreciated.
@ptaoussanis You do need to call reducer-fn on line 77 as @halgari suggests. If you try to apply your identity transducer as listed to e.g. a partition-by transducer, I think you'll see why:
user> (defn identity-transducer [reducer-fn]
(fn new-reducer-fn
;; Remember reducer-fns need 3 arities:
([] (reducer-fn))
([accumulation] accumulation)
([accumulation new-input]
(reducer-fn accumulation new-input))))
#'user/identity-transducer
user> (def t1 (partition-by #(int (/ % 3))))
#'user/t1
user> (sequence t1 (range 10))
([0 1 2] [3 4 5] [6 7 8] [9])
user> (sequence (comp identity-transducer t1) (range 10))
([0 1 2] [3 4 5] [6 7 8])
;; Last collection of [9] is missing!
;; Composition order is important; (comp t1 identity-transducer) won't provoke this problem!
user>
Okay, have just updated to reflect Rich & others' clarification and hopefully fix the confusion around the [accumulation]
-arity reducing-fn.
Thanks to everyone for their input! Please feel free to keep the comments/corrections coming. Cheers :-)
Note that transduce (or any transduction) does not require arity-0 support of the bottom reducing fn unless you use it in a context without an init value.
@richhickey Updated, thanks Rich!
Thanks for writing this. Mind taking a look at my notes on transducers? https://gist.github.com/runexec/06b56a9dbd15e43145b9
What's not clear to me, how does a stateful transducer stack play with fork-join, any comments on that?
@laczoka - I was wondering the same thing, but the volatile!
function within the new transducing functions seems to indicate that these are to be used in a single-threaded context only.
So, as far as I can tell, if you want parallelism, stick with reducers (specifically fold
).
To be quite honest, I'm not sure how transducers are much of a game-changer for me.
I was really excited for them, because when I discovered reducers, they really changed the way I coded. In fact, given that reducers are usually faster than lazy core collections functions, especially on large collections, I refactored my code base to use almost exclusively the reducers functions when operating on collections (plus a few others I've gleaned from various sources such as a reducers version of range
, which is significantly faster than its lazy counterpart), unless I explicitly want laziness.
There are three reasons that stand between me and using transducers:
-
Given cursory benchmarks using Criterium, the difference in performance between reducers and transducers seems to be negligible. (Correct me if I'm wrong.) This is even taking into account the
volatile
s used instead ofatom
s. -
It seems to be the case that transducers are, by definition, for single-threaded use only. Even for this reason alone I would never use them, because parallelization is a must-have for me; it's sped up my code by a factor of at least 2, and oftentimes around 5 (it depends on the number of cores I'm using, the size of the collections operated on, and a number of other factors). In my understanding, that was essentially the reason that the reducers library was created; transducers would undo that change.
-
I appreciate the fact that function composition seems to be central to the transducers paradigm and I do see the elegance in it, but that's because I already use it everywhere. I just want to know - what is the difference between:
(def xform1 (fn->> (r/map inc) (r/filter even?)))
(def result1 (->> [1 2 3 4 5] xform1 (into []))
_(Here (fn->> ...)
is equivalent to #(->> % ...)
)_
and
(def xform2 (comp (map inc) (filter even?))
(def result2 (into [] xform2 [1 2 3 4 5]))
?
I know that transducers can be used lazily via the sequence
function, which is quite useful and which the reducers library doesn't include (although Christophe Grand posted a function to output a lazy sequence from a reducer, which I occasionally use), and they integrate core.async's collection functions into an overall unified core collections abstraction. By comparison, the reducers library isn't integrated all that well into the core library. But then again, I rarely use the core functions anymore anyway unless I want laziness.
In short, I wish transducers, reducers, and the core collections functions were unified under one abstraction to be able to choose among processing the collection lazily (via sequence
, etc.), outputting all elements into a specific type of collection (vector
, hash-map
, etc.), and/or processing a collection in parallel by being able to use an equivalent of reducers/fold
.
awesome guide! maybe include some core.async info, like that chan
and pipeline
have slots for a transducer, and pipeline
can parallelize the xforms?
shouldn't line-68 (into to xform from) be removed from the list of sexps that return a transducer? A very good recap .. thx!
Thanks very much for this page. iteration
on line 81 doesn't seem to exist. I can't find it in 1.7.0, 1.8.0, or 1.9.0-alpha12. Was this something planned that hasn't yet appeared?
into
doesn't return an xform on line 68; https://gist.github.com/ptaoussanis/e537bd8ffdc943bbbce7#file-transducers-clj-L68
clojure/clojure@7d84a9f should clarify some things