(comment ; Fun with transducers, v2
;; Still haven't found a brief + approachable overview of Clojure 1.7's new
;; transducers in the particular way I would have preferred myself, so here goes:

;;;; Definitions

;; Looking at the `reduce` docstring, we can define a 'reducing-fn' as:
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation
;; (The `[]` arity is actually optional; `reduce` only calls it when given
;; neither an init-accumulator nor any inputs.)
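
;; A minimal sketch of a hand-written reducing-fn with both arities from the
;; definition above. `sum-rf` is a hypothetical name used only for illustration.
(defn sum-rf ; hypothetical example reducing-fn
  ([] 0) ; init: `reduce` uses this only for an empty coll with no init-accumulator
  ([accumulation next-input] (+ accumulation next-input))) ; step

(comment
  (reduce sum-rf [1 2 3 4])     ; => 10
  (reduce sum-rf 100 [1 2 3 4]) ; => 110
  (reduce sum-rf [])            ; => 0, via the `[]` arity
  )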
;; We choose to define a 'transducing-fn' as:
(fn transducing-fn [reducing-fn]) -> new-reducing-fn

;; If you're familiar with Ring middleware, a transducer is a lot like
;; reducing-fn middleware:
(fn ring-handler [ring-req]) -> ring-resp
(fn ring-middleware [ring-handler]) -> new-ring-handler

;; Compare:
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation
(fn transducing-fn [reducing-fn]) -> new-reducing-fn
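
;; A minimal sketch of the middleware analogy. `inc-transducing-fn` is a
;; hypothetical name: it takes a reducing-fn and returns a new reducing-fn
;; that increments each input before handing it on, much as Ring middleware
;; wraps a handler. It implements only the two arities defined so far, so it
;; is used here with plain `reduce` (not `transduce`).
(defn inc-transducing-fn [reducing-fn] ; hypothetical example transducing-fn
  (fn new-reducing-fn
    ([] (reducing-fn))
    ([accumulation next-input]
     (reducing-fn accumulation (inc next-input))))) ; transform, then delegate

(comment
  (reduce (inc-transducing-fn +) 0 [1 2 3])     ; => 9
  (reduce (inc-transducing-fn conj) [] [1 2 3]) ; => [2 3 4]
  )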

;;;; Quick observations:
;; 1. A transducer is just a fn.
;; 2. It's a lot like reducing-fn middleware, and composes just like middleware.
;; 3. This doesn't sound very impressive so far, which makes it easy to miss
;;    how big of a deal transducers actually are.
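
;; A minimal sketch of observation 2, assuming Clojure 1.7+ single-arity
;; `map` and `filter`. As with Ring middleware, `comp` nests the wrappers, so
;; the left-most transducing-fn sees each input first. The two var names
;; below are hypothetical.
(def inc-then-evens (comp (map inc) (filter even?))) ; hypothetical: inc, then keep evens
(def evens-then-inc (comp (filter even?) (map inc))) ; hypothetical: keep evens, then inc

(comment
  (into [] inc-then-evens [1 2 3 4 5]) ; => [2 4 6]
  (into [] evens-then-inc [1 2 3 4 5]) ; => [3 5]
  )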

;; In fact, numerous major benefits fall out of this simple definition.
;; Transducers (transducing-fns plus a few utils) give us:
;; * Performance:
;;   * Can reduce w/o incurring any sequence construction costs.
;;   * Can map composed operations w/o incurring >1 sequence construction cost.
;;   * Efficient filtering + early termination.
;;   * Many of the benefits of 'Reducers' (e.g. parallel fork-join via
;;     `clojure.core.reducers/fold`, for stateless transducing-fns).
;;
;; * Convenience:
;;   * Easy composition through standard fn `comp` and threading, etc.
;;   * Easy construction through single-arity `map`, `filter`, etc.
;;   * Entirely eliminates need for special/extra core.async channel transform
;;     fns (`map<`, etc.).
;;   * Transducing-fns can use internal state to easily build powerful
;;     operations (e.g. see `dedupe` source).
;;   * Laziness iff you want it.
;;
;; * Conceptual:
;;   * Elegantly & flexibly unifies concepts like:
;;     * Reducing - (ordered-seq -> accumulated-val).
;;     * Mapping - (ordered-seq -> new-ordered-seq).
;;     * 'Reducers' - (unordered-coll -> accumulated-val).
;;     * Laziness.
;;     * Process composition.
;;   * Transducers are just fns defined in a particular way and which can be
;;     fed to some transducer utils. No voodoo, just a clever idea.
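
;; A minimal sketch of a few of the benefits above, assuming Clojure 1.7+.
;; The `r/fold` line assumes a *stateless* transducing-fn: `fold` splits the
;; work across threads and never calls the `[accumulation]` arity, so
;; stateful ones like `(take n)` or `(dedupe)` aren't safe to use there.
(require '[clojure.core.reducers :as r])

(comment
  ;; Early termination: `(take 3)` stops the reduction, even over an infinite seq.
  (transduce (comp (map inc) (take 3)) + (range)) ; => 6

  ;; One pass, with no intermediate seq between `map` and `filter`.
  (into [] (comp (map inc) (filter even?)) (range 10)) ; => [2 4 6 8 10]

  ;; Laziness iff you want it: `sequence` returns a lazy seq.
  (take 3 (sequence (map inc) (range))) ; => (1 2 3)

  ;; A stateful built-in transducing-fn.
  (into [] (dedupe) [1 1 2 2 2 3 1]) ; => [1 2 3 1]

  ;; Fork-join (assumes a stateless xform): applying `(map inc)` to `+` gives
  ;; a reducing-fn `r/fold` can use; `+` also combines the partial sums.
  (r/fold + ((map inc) +) (vec (range 10000))) ; => 50005000
  )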

;;;; Transducer API

;;; The following all return transducing-fns (note absence of any colls):
(map f)
(filter f)
(remove f)
(take n)
(take-while pred)
(drop n)
(drop-while pred)
(take-nth n)
(replace smap)
(partition-by f)
(partition-all n)
(keep f)
(keep-indexed f)
(flatmap f)          ; new at time of writing, like `mapcat`; not in the 1.7.0 release
(mapcat f)           ; upcoming at time of writing; in the 1.7.0 release
(dedupe)             ; new
(random-sample prob) ; new
;; or you can just write your own (recall that transducing-fns are just fns)
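
;; A quick sketch of a few of the transducing-fns above in action, assuming
;; Clojure 1.7.0+. Each is consumed by `into`, which takes a transducing-fn.
(comment
  (into [] (partition-all 2) [1 2 3 4 5])   ; => [[1 2] [3 4] [5]]
  (into [] (partition-by odd?) [1 3 2 4 5]) ; => [[1 3] [2 4] [5]]
  (into [] (take-nth 2) (range 6))          ; => [0 2 4]
  (into [] (keep-indexed (fn [i x] (when (even? i) x))) [:a :b :c :d]) ; => [:a :c]
  (into [] (replace {:a 1 :b 2}) [:a :b :c]) ; => [1 2 :c]
  (into [] (mapcat (fn [x] [x x])) [1 2])    ; => [1 1 2 2]
  )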

;;; And these utils consume transducing-fns (note colls for consuming):
(into to xform from)
(iteration xform coll)        ; new at time of writing; not in the 1.7.0 release (see `eduction`)
(transduce xform f coll)      ; new, like `reduce`
(transduce xform f init coll) ; new, like `reduce`
(sequence xform coll)         ; like (lazy) single-coll `map`
(sequence xform coll & colls) ; like (lazy) multi-coll `map`
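
;; A minimal sketch of the consuming utils, assuming Clojure 1.7.0+. The same
;; xform value is reused unchanged with each util. `inc-evens-xform` is a
;; hypothetical name.
(def inc-evens-xform (comp (map inc) (filter even?))) ; hypothetical example xform

(comment
  (into [] inc-evens-xform [1 2 3 4 5])         ; => [2 4 6]
  (into #{} inc-evens-xform [1 2 3 4 5])        ; => #{2 4 6}
  (transduce inc-evens-xform + [1 2 3 4 5])     ; => 12 (init taken from (+), i.e. 0)
  (transduce inc-evens-xform + 100 [1 2 3 4 5]) ; => 112
  (sequence inc-evens-xform [1 2 3 4 5])        ; => (2 4 6), lazily
  (sequence (map +) [1 2 3] [10 20 30])         ; => (11 22 33), multi-coll
  )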

;;;; A minor wrinkle

;; Going back to our original definition:
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation
(fn transducing-fn [reducing-fn]) -> new-reducing-fn

;; I omitted a detail which Rich helped clarify.
;; `transduce` _actually_ expects a reducing-fn modified to also accept a new
;; `[accumulation]` arity:
(fn transducer-ready-reducing-fn
  ([]) ; Recall that this arity is optional (only needed when no init-accumulator given)
  ([accumulation]) ; <- This is the new arity to support transducing
  ([accumulation next-input]))

;; Clojure 1.7-alpha1's `transduce` _automatically_ adds the extra arity
;; given a regular reducing-fn, but later versions will require that you take
;; care of this yourself (the extra flexibility is helpful, but something
;; outside the scope of this short overview). A utility called `completing`
;; is being added to Clojure >1.7-alpha1 to wrap a regular reducing-fn and
;; give it this extra arity.
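
;; A minimal sketch, assuming Clojure 1.7.0+ (where `completing` exists and
;; `transduce` calls the `[accumulation]` arity once, after the last input).
;; `vec-rf` and `sum-of-squares-rf` are hypothetical names. `vec-rf` uses its
;; completion arity to turn the transient it builds back into a persistent vector.
(defn vec-rf ; hypothetical transducer-ready reducing-fn
  ([] (transient []))                         ; init, used when no init-accumulator is given
  ([accumulation] (persistent! accumulation)) ; completion: called once at the end
  ([accumulation next-input] (conj! accumulation next-input)))

;; `completing` gives a plain 2-arity fn an `[accumulation]` arity that calls
;; the completion fn you supply (`identity` if you supply none).
(def sum-of-squares-rf ; hypothetical name
  (completing (fn [accumulation next-input] (+ accumulation (* next-input next-input)))))

(comment
  (transduce (map inc) vec-rf [1 2 3])                                        ; => [2 3 4]
  (transduce (filter odd?) sum-of-squares-rf 0 [1 2 3])                       ; => 10
  (transduce (map inc) (completing conj! persistent!) (transient []) [1 2 3]) ; => [2 3 4]
  )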

;;;; Homework

;; This is the identity transducing-fn:
(defn identity-transducing-fn
  [reducing-fn] ; A 'completed' reducing fn (i.e. with an `[accumulation]` arity)
  (fn new-reducing-fn
    ([] (reducing-fn)) ; Only called/necessary when no init-accumulator given
    ([accumulation] (reducing-fn accumulation))
    ([accumulation new-input]
     (reducing-fn accumulation new-input))))

(comment
  (sequence identity-transducing-fn '(1 2 3 4)) ; -> '(1 2 3 4)
  )
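
;; Following the same pattern, here is a minimal sketch of a hand-rolled
;; `(map f)`. `map-transducing-fn` is a hypothetical name, and unlike core
;; `map` it supports only a single input coll.
(defn map-transducing-fn [f] ; hypothetical, single-coll only
  (fn [reducing-fn]
    (fn new-reducing-fn
      ([] (reducing-fn))
      ([accumulation] (reducing-fn accumulation))
      ([accumulation input]
       (reducing-fn accumulation (f input)))))) ; Apply `f` before passing the input on

(comment
  (sequence (map-transducing-fn inc) '(1 2 3 4))   ; -> (2 3 4 5)
  (transduce (map-transducing-fn inc) + [1 2 3 4]) ; -> 14
  )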

;; I found it helpful to compare this to the definition of the standard
;; transducing-fns. `(filter pred)` and `(dedupe)` are simple, so they're good
;; starting points. Here's `(filter pred)`:
(defn filter-transducing-fn [pred]
  (fn [reducing-fn]     ; Like Ring middleware takes a handler
    (fn new-reducing-fn ; Like Ring middleware returns a new-handler
      ([] (reducing-fn))
      ([accumulation] (reducing-fn accumulation))
      ([accumulation input]
       (if (pred input) ; Only change from identity-transducing-fn
         (reducing-fn accumulation input)
         accumulation)))))

(comment
  (sequence (filter-transducing-fn even?) '(1 2 3 4)) ; -> '(2 4)
  )
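
;; And a minimal sketch in the spirit of `(dedupe)`, showing internal state.
;; `dedupe-transducing-fn` is a hypothetical name, not the actual core source.
;; A `volatile!` (Clojure 1.7+) remembers the previous input. A fresh one is
;; created each time a reducing-fn is wrapped, so each reduction gets its own
;; state. The `::none` sentinel marks "no input seen yet".
(defn dedupe-transducing-fn [] ; hypothetical, modelled loosely on `(dedupe)`
  (fn [reducing-fn]
    (let [prev (volatile! ::none)] ; per-reduction state
      (fn new-reducing-fn
        ([] (reducing-fn))
        ([accumulation] (reducing-fn accumulation))
        ([accumulation input]
         (let [prior @prev]
           (vreset! prev input)
           (if (= prior input)
             accumulation ; Drop consecutive duplicates
             (reducing-fn accumulation input))))))))

(comment
  (into [] (dedupe-transducing-fn) [1 1 2 3 3 3 1]) ; -> [1 2 3 1]
  )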

;; Hope this was useful to someone. Corrections/comments welcome!
;; Happy hacking, cheers! :-)
;; Peter (https://github.com/ptaoussanis)
)
@laczoka - I was wondering the same thing, but the `volatile!` function within the new transducing functions seems to indicate that these are to be used in a single-threaded context only. So, as far as I can tell, if you want parallelism, stick with reducers (specifically `fold`).
To be quite honest, I'm not sure how transducers are much of a game-changer for me.
I was really excited for them, because when I discovered reducers, they really changed the way I coded. In fact, given that reducers are usually faster than the lazy core collection functions, especially on large collections, I refactored my code base to use the reducers functions almost exclusively when operating on collections (plus a few others I've gleaned from various sources, such as a reducers version of `range`, which is significantly faster than its lazy counterpart), unless I explicitly want laziness.
There are three reasons that stand between me and using transducers:
- Given cursory benchmarks using Criterium, the difference in performance between reducers and transducers seems to be negligible. (Correct me if I'm wrong.) This is even taking into account the `volatile`s used instead of `atom`s.
- It seems to be the case that transducers are, by definition, for single-threaded use only. Even for this reason alone I would never use them, because parallelization is a must-have for me; it's sped up my code by a factor of at least 2, and oftentimes around 5 (it depends on the number of cores I'm using, the size of the collections operated on, and a number of other factors). In my understanding, that was essentially the reason that the reducers library was created; transducers would undo that change.
- I appreciate the fact that function composition seems to be central to the transducers paradigm, and I do see the elegance in it, but that's because I already use it everywhere. I just want to know: what is the difference between
(def xform1 (fn->> (r/map inc) (r/filter even?)))
(def result1 (->> [1 2 3 4 5] xform1 (into [])))
_(Here `(fn->> ...)` is equivalent to `#(->> % ...)`.)_
and
(def xform2 (comp (map inc) (filter even?)))
(def result2 (into [] xform2 [1 2 3 4 5]))
?
I know that transducers can be used lazily via the `sequence` function, which is quite useful and which the reducers library doesn't include (although Christophe Grand posted a function to output a lazy sequence from a reducer, which I occasionally use), and that they integrate core.async's collection functions into an overall unified core collections abstraction. By comparison, the reducers library isn't integrated all that well into the core library. But then again, I rarely use the core functions anymore anyway unless I want laziness.
In short, I wish transducers, reducers, and the core collections functions were unified under one abstraction, so that I could choose between processing a collection lazily (via `sequence`, etc.), outputting all elements into a specific type of collection (`vector`, `hash-map`, etc.), and/or processing a collection in parallel with an equivalent of `reducers/fold`.
Awesome guide! Maybe include some core.async info, like that `chan` and `pipeline` have slots for a transducer, and that `pipeline` can parallelize the xforms?
Shouldn't line 68, `(into to xform from)`, be removed from the list of sexps that return a transducer? A very good recap, thx!
Thanks very much for this page. `iteration` on line 81 doesn't seem to exist. I can't find it in 1.7.0, 1.8.0, or 1.9.0-alpha12. Was this something planned that hasn't yet appeared?
`into` doesn't return an xform on line 68; https://gist.github.com/ptaoussanis/e537bd8ffdc943bbbce7#file-transducers-clj-L68
What's not clear to me is how a stateful transducer stack plays with fork-join. Any comments on that?