(comment ; Fun with transducers, v2 | |
;; Still haven't found a brief + approachable overview of Clojure 1.7's new | |
;; transducers in the particular way I would have preferred myself - so here goes: | |
;;;; Definitions | |
;; Looking at the `reduce` docstring, we can define a 'reducing-fn' as: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
;; (The `[]` arity is actually optional; it's only used when calling | |
;; `reduce` w/o an init-accumulator). | |
;; We choose to define a 'transducing-fn' as: | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;; If you're familiar with Ring middleware, a transducer is a lot like | |
;; reducing-fn middleware: | |
(fn ring-handler [ring-req]) -> ring-resp | |
(fn ring-middleware [ring-handler]) -> new-ring-handler | |
;; Compare: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;;;; Quick observations: | |
;; 1. A transducer is just a fn. | |
;; 2. It's a lot like reducing-fn middleware, and composes just like middleware. | |
;; 3. This doesn't sound very impressive so far, which makes it easy to miss | |
;; how big of a deal transducers actually are. | |
;; In fact numerous, major benefits fall out of this simple definition. | |
;; Transducers (transducing-fns plus a few utils) give us: | |
;; * Performance: | |
;; * Can reduce w/o incurring any sequence construction costs. | |
;; * Can map composed operations w/o incurring >1 sequence construction cost. | |
;; * Efficient filtering + early termination. | |
;; * All the benefits of 'Reducers' (parallel fork-join, etc.). | |
;; | |
;; * Convenience: | |
;; * Easy composition through standard fn `comp` and threading, etc. | |
;; * Easy construction through single-arity `map`, `filter`, etc. | |
;; * Entirely eliminates need for special/extra core.async channel transform | |
;; fns (`map<`, etc.). | |
;; * Transducer-fns can use internal state to easily build powerful | |
;; operations (e.g. see `dedupe` source). | |
;; * Laziness iff you want it. | |
;; | |
;; * Conceptual: | |
;; * Elegantly & flexibly unifies concepts like: | |
;; * Reducing - (ordered-seq -> accumulated-val). | |
;; * Mapping - (ordered-seq -> new-ordered-seq). | |
;; * 'Reducers' - (unordered-coll -> accumulated-val). | |
;; * Laziness. | |
;; * Process composition. | |
;; * Transducers are just fns defined in a particular way and which can be | |
;; fed to some transducer utils. No voodoo, just a clever idea. | |
;;;; Transducer API | |
;;; The following all return transducing-fns (note absence of any colls): | |
(map f) | |
(filter f) | |
(remove f) | |
(take n) | |
(take-while pred) | |
(drop n) | |
(drop-while pred) | |
(take-nth n) | |
(replace smap) | |
(into to xform from) | |
(partition-by f) | |
(partition-all f1) | |
(keep f) | |
(keep-indexed f) | |
(flatmap f) ; new, like `mapcat` (deprecated) | |
(mapcat f) ; upcoming | |
(dedupe f) ; new | |
(random-sample prob) ; new | |
;; or you can just write your own (recall that transducing-fns are just fns) | |
;;; And these utils consume transducing-fns (note colls for consuming): | |
(into to xform from) | |
(iteration xform coll) ; new | |
(transduce xform f coll) ; new, like `reduce` | |
(transduce xform f init coll) ; new, like `reduce` | |
(sequence xform coll) ; like (lazy) single-coll `map` | |
(sequence xform & colls) ; like (lazy) multi-coll `map` | |
;;;; A minor wrinkle | |
;; Going back to our original definition: | |
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation | |
(fn transducing-fn [reducing-fn]) -> new-reducing-fn | |
;; I omitted a detail which Rich helped clarify. | |
;; `transduce` _actually_ expects a reducing-fn modified to also accept a new | |
;; `[accumumulation]` arity: | |
(fn transducer-ready-reducing-fn | |
([]) ; Recall that this arity is optional (only needed when no init-accumulator given) | |
([accumulation]) ; <- This is the new arity to support transducing | |
([accumulation next-input]) | |
) | |
;; Clojure 1.7-alpha1's `transduce` _automatically_ adds the extra arity | |
;; given a regular reducing-fn, but later versions will require that you take | |
;; care of this yourself (the extra flexibility is helpful, but something | |
;; outside the scope of this short overview). A utility called `completing` | |
;; is being added to Clojure >1.7-alpha1 which helps wrap a regular reducing-fn | |
;; to give it this extra arity. | |
;;;; Homework | |
;; This is the identity transducing-fn: | |
(defn identity-transducing-fn | |
[reducing-fn] ; A 'completed' reducing fn (i.e. with an `[accumulation]` arity) | |
(fn new-reducing-fn | |
([] (reducing-fn)) ; Only called/necessary when no init-accumulator given | |
([accumulation] (reducing-fn accumulation)) | |
([accumulation new-input] | |
(reducing-fn accumulation new-input)))) | |
(comment | |
(sequence identity-transducing-fn '(1 2 3 4)) ; -> '(1 2 3 4) | |
) | |
;; I found it helpful to compare this to the definition of the standard | |
;; transducing-fns. `(filter pred)` and `(dedupe)` are simple so good starting | |
;; points. Here's `(filter pred)`: | |
(defn filter-transducing-fn [pred] | |
(fn [reducing-fn] ; Like Ring middleware takes a handler | |
(fn new-reducing-fn ; Like Ring middleware returns a new-handler | |
([] (reducing-fn)) | |
([accumulation] (reducing-fn accumulation)) | |
([accumulation input] | |
(if (pred input) ; Only change from identity-transducing-fn | |
(reducing-fn accumulation input) | |
accumulation))))) | |
(comment | |
(sequence (filter-transducing-fn even?) '(1 2 3 4)) ; -> '(2 4) | |
) | |
;; Hope this was useful to someone. Corrections/comments welcome! | |
;; Happy hacking, cheers! :-) | |
;; Peter (https://github.com/ptaoussanis) | |
) |
This comment has been minimized.
This comment has been minimized.
@halgari: Hi Timothy, I'm going by the definition of a reducing fn, as per clojure.core's "f should be a function of 2 arguments. If val is not supplied,
returns the result of applying f to the first 2 items in coll, then
applying f to that result and the 3rd item, etc. If coll contains no
items, f must accept no arguments as well, and reduce returns the
result of calling f with no arguments. If coll has only 1 item, it
is returned and f is not called. If val is supplied, returns the
result of applying f to val and the first item in coll, then
applying f to that result and the 2nd item, etc. If coll contains no
items, returns val and f is not called." Specifically: "If coll has only 1 item, it is returned and f is not called." I may be misinterpreting; can you maybe think of an example where the current definition would trip up? Thanks! :-) |
This comment has been minimized.
This comment has been minimized.
Shouldn't it be passed a valid (defn identity-transducer [reducer-fn] reducer-fn) That'd be the most convenient given that transducer composition is supposed to just be ("reverse") function composition anyway—the identity should be the function identity. |
This comment has been minimized.
This comment has been minimized.
Also |
This comment has been minimized.
This comment has been minimized.
Also also |
This comment has been minimized.
This comment has been minimized.
@tel Hi Joseph,
That'd be another way, but maybe less enlightening ;-)
It's listed twice, once as
They're the same logical thing, but I'm guessing Does that make sense? |
This comment has been minimized.
This comment has been minimized.
Ah, I see where mapcat would have caused issue. I'll be a sore loser and suggest my definition of |
This comment has been minimized.
This comment has been minimized.
clojure/clojure@7d84a9f should clarify some things |
This comment has been minimized.
This comment has been minimized.
Hah—that it does! |
This comment has been minimized.
This comment has been minimized.
The way However before this commit the various functions (and their docs) were inconsistent, and There are actually three things:
The rule of thumb is (or should be), that if a reducing function receives an argument to receive the result:
This is how the line "If coll has only 1 item, it is returned and f is not called." in the documentation for |
This comment has been minimized.
This comment has been minimized.
@richhickey: Hi Rich, that does help - thank you! And this is awesome stuff, truly. Much, much appreciated. |
This comment has been minimized.
This comment has been minimized.
@ptaoussanis You do need to call reducer-fn on line 77 as @halgari suggests. If you try to apply your identity transducer as listed to e.g. a partition-by transducer, I think you'll see why: user> (defn identity-transducer [reducer-fn]
(fn new-reducer-fn
;; Remember reducer-fns need 3 arities:
([] (reducer-fn))
([accumulation] accumulation)
([accumulation new-input]
(reducer-fn accumulation new-input))))
#'user/identity-transducer
user> (def t1 (partition-by #(int (/ % 3))))
#'user/t1
user> (sequence t1 (range 10))
([0 1 2] [3 4 5] [6 7 8] [9])
user> (sequence (comp identity-transducer t1) (range 10))
([0 1 2] [3 4 5] [6 7 8])
;; Last collection of [9] is missing!
;; Composition order is important; (comp t1 identity-transducer) won't provoke this problem!
user> |
This comment has been minimized.
This comment has been minimized.
Okay, have just updated to reflect Rich & others' clarification and hopefully fix the confusion around the Thanks to everyone for their input! Please feel free to keep the comments/corrections coming. Cheers :-) |
This comment has been minimized.
This comment has been minimized.
Note that transduce (or any transduction) does not require arity-0 support of the bottom reducing fn unless you use it in a context without an init value. |
This comment has been minimized.
This comment has been minimized.
@richhickey Updated, thanks Rich! |
This comment has been minimized.
This comment has been minimized.
Thanks for writing this. Mind taking a look at my notes on transducers? https://gist.github.com/runexec/06b56a9dbd15e43145b9 |
This comment has been minimized.
This comment has been minimized.
What's not clear to me, how does a stateful transducer stack play with fork-join, any comments on that? |
This comment has been minimized.
This comment has been minimized.
@laczoka - I was wondering the same thing, but the So, as far as I can tell, if you want parallelism, stick with reducers (specifically |
This comment has been minimized.
This comment has been minimized.
To be quite honest, I'm not sure how transducers are much of a game-changer for me. I was really excited for them, because when I discovered reducers, they really changed the way I coded. In fact, given that reducers are usually faster than lazy core collections functions, especially on large collections, I refactored my code base to use almost exclusively the reducers functions when operating on collections (plus a few others I've gleaned from various sources such as a reducers version of There are three reasons that stand between me and using transducers:
and
? I know that transducers can be used lazily via the In short, I wish transducers, reducers, and the core collections functions were unified under one abstraction to be able to choose among processing the collection lazily (via |
This comment has been minimized.
This comment has been minimized.
awesome guide! maybe include some core.async info, like that |
This comment has been minimized.
This comment has been minimized.
shouldn't line-68 (into to xform from) be removed from the list of sexps that return a transducer? A very good recap .. thx! |
This comment has been minimized.
This comment has been minimized.
Thanks very much for this page. |
This comment has been minimized.
This comment has been minimized.
|
This comment has been minimized.
line 77 should probably be calling reducer-fn as well.