@safehammad
Last active August 3, 2023 12:15
The REPL session from London Clojure Dojo on 1st Aug 2023 helping us understand chunking and its potential gotchas.
(ns chunking
  "This is the REPL session a group of us played with at the London Clojure Dojo on
  1st August 2023 with the aim of understanding chunking and its potential gotchas.")
;; Many of Clojure's core functions such as `map`, `filter`, `take` and `drop` produce
;; lazy sequences. The elements of a lazy sequence are only evaluated or "realized" as
;; needed. Although it appears each element is evaluated one at a time, since Clojure 1.1
;; evaluation over chunked sources such as ranges and vectors is "chunked": elements are
;; evaluated up to 32 at a time. This is a performance optimisation based on the fact that
;; Clojure's persistent vectors and hash maps are so-called "tries" with 32 children per node.
;; This demonstrates chunking by introducing a `println` side effect in a mapping function.
;; Rather than printing 40 times for the elements skipped by `drop`, two whole chunks of 32
;; (0 to 63) are printed before any element is produced. A third chunk of 32 (64 to 95) is
;; printed once element 63 has been produced, followed by a final chunk of 4 (96 to 99)
;; once element 95 has been produced, completing the 100.
(drop 40 (map (fn [x] (println "\n; out:" x) x) (range 100)))
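;; A rough sketch, not part of the original session: counting calls with an atom shows the
;; chunk size without relying on printed output. The name `map-calls` is made up for
;; illustration. Asking for only the first element still runs the mapping function on the
;; whole first chunk.
(def map-calls (atom 0))
(first (map (fn [x] (swap! map-calls inc) x) (range 100))) ; 0
@map-calls ; 32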
;; This demonstrates what at first glance is unexpected behaviour. One might expect the
;; `for` to produce a steady stream of elements resulting in similar behaviour to the
;; example above. In fact, in this example the `for` pulls from more than one sequence, each
;; of which is independently chunked, the result being that the body is evaluated in chunks
;; of at most three elements (one per `y`). Depending on the scenario, chunking can lead to
;; unexpected greedy evaluation which can have an *adverse* effect on performance.
(drop 40 (for [x (range 30)
               y (range 3)]
           (do
             (println "\n; out:" {:x x, :y y})
             x)))
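;; The same call-counting idea, sketched for `for` (not part of the original session; the
;; name `for-calls` is made up for illustration): producing the first element evaluates the
;; body three times, i.e. one chunk of `y`.
(def for-calls (atom 0))
(first (for [x (range 30)
             y (range 3)]
         (do
           (swap! for-calls inc)
           x))) ; 0
@for-calls ; 3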
;; Similar code using `mapcat` and `map` rather than `for` (with a longer outer range) to
;; demonstrate that the outcome is a function of chunking rather than how `for` works.
(drop 40 (mapcat
           (fn [x]
             (map #(do (println "\n; out:" {:x x, :y %}) x) (range 3)))
           (range 100)))
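;; A sketch separating the two levels, not part of the original session (the names
;; `outer-calls` and `inner-calls` are made up for illustration): the outer function runs
;; for a whole chunk of the outer range, while the inner function, where the `println`
;; sits in the example above, runs once per chunk of three.
(def outer-calls (atom 0))
(def inner-calls (atom 0))
(first (mapcat
         (fn [x]
           (swap! outer-calls inc)
           (map (fn [_] (swap! inner-calls inc) x) (range 3)))
         (range 100))) ; 0
@outer-calls ; 32
@inner-calls ; 3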
;; We searched for ways to remove the chunking behaviour if necessary and found this
;; suggestion of a function called `seq1` from Michael Fogus, reproduced on Stack Overflow:
;; https://stackoverflow.com/questions/10556421/is-for-not-actually-lazy-in-clojure
;; The ability to control chunking in Clojure has been discussed in the past but rejected.
(defn seq1 [^clojure.lang.ISeq s]
  (reify clojure.lang.ISeq
    (first [_] (.first s))
    (more [_] (seq1 (.more s)))
    (next [_] (let [sn (.next s)] (and sn (seq1 sn))))
    (seq [_] (let [ss (.seq s)] (and ss (seq1 ss))))
    (count [_] (.count s))
    (cons [_ o] (.cons s o))
    (empty [_] (.empty s))
    (equiv [_ o] (.equiv s o))))
;; Using `seq1` produces the now easy to grok output: one `println` per element consumed by
;; the `drop`, followed by interleaved printing and evaluation, one element at a time.
(drop 40 (map (fn [x] (println "\nout:" x) x) (seq1 (range 100))))
;; An even more beautiful lispy function to stop chunking by Stuart Sierra:
;; https://stackoverflow.com/questions/3407876/how-do-i-avoid-clojures-chunking-behavior-for-lazy-seqs-that-i-want-to-short-ci
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
;; The function `unchunk` produces the same result as `seq1`.
(drop 40 (map (fn [x] (println "\nout:" x) x) (unchunk (range 100))))
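;; A sketch to quantify the difference, relying on `unchunk` as defined above (not part of
;; the original session; the helper name `calls-to-reach-40` is made up for illustration).
;; It counts how often the mapping function runs in order to produce the first element
;; after `drop 40`.
(defn calls-to-reach-40 [coll]
  (let [calls (atom 0)]
    (first (drop 40 (map (fn [x] (swap! calls inc) x) coll)))
    @calls))
(calls-to-reach-40 (range 100)) ; 64, two whole chunks of 32
(calls-to-reach-40 (unchunk (range 100))) ; 41, elements 0 to 40 only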
;; Then we took a detour trying to understand how the function `realized?` works.
;; I'll let the results speak for themselves...
(def r (unchunk (range 60)))
(realized? r) ; false
(take 1 (drop 40 r)) ; (40)
(realized? r) ; true
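;; A sketch of a related subtlety, not part of the original session (the name `lazy-pair`
;; is made up for illustration): `realized?` reports only on the lazy sequence it is given,
;; i.e. the head, and says nothing about how much of the tail has been evaluated.
(def lazy-pair (lazy-seq (cons 1 (lazy-seq (cons 2 nil)))))
(realized? lazy-pair) ; false
(first lazy-pair) ; 1
(realized? lazy-pair) ; true
(realized? (rest lazy-pair)) ; false, the tail is a separate unrealized lazy seq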
;; The Clojure docs for `realized?` show the following working, but for us it threw an error:
;; Execution error (ClassCastException).
;; class clojure.lang.LongRange cannot be cast to class clojure.lang.IPending
(realized? (range 5))
;; So we tried `realized?` with a plain `(range)` which worked!
;; Note that it's immediately realized, unlike an unchunked `range`.
(def r2 (range))
(realized? r2) ; true
;; We then looked at the types and learned that not all ranges are the same.
(type (range 5)) ; clojure.lang.LongRange
(type (range)) ; clojure.lang.Iterate
(type (unchunk (range 60))) ; clojure.lang.LazySeq
;; And by looking at the ancestors we find that clojure.lang.LongRange doesn't implement
;; clojure.lang.IPending, whereas clojure.lang.Iterate and clojure.lang.LazySeq do,
;; hence the behaviour of `realized?`.
(ancestors clojure.lang.LongRange)
(ancestors clojure.lang.Iterate)
(ancestors clojure.lang.LazySeq)
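;; A sketch of a guard based on this finding, not part of the original session (the name
;; `safe-realized?` is made up for illustration): check for clojure.lang.IPending before
;; calling `realized?`, and return nil when the question doesn't apply.
(defn safe-realized? [x]
  (when (instance? clojure.lang.IPending x)
    (realized? x)))
(safe-realized? (range 5)) ; nil, LongRange is not an IPending
(safe-realized? (range)) ; true
(safe-realized? (lazy-seq [1 2 3])) ; false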