@enforser
Last active August 1, 2023 18:31
Lazy side effects in Clojure - overcoming chunking
;; Unlike fully lazy languages such as Haskell, many Clojure seqs are chunked.
;; This means that accessing the first element of a sequence can realize a whole chunk (usually 32 elements).
;; You can see this by putting a side effect inside a map.
(first (map (fn [x] (prn x) x) (range 5))) ;; prints 0 through 4, then returns 0
;; The above example returns 0 (as expected), but you'll notice it prints all 5 elements
;; of (range 5).
;; With a longer range, we can see that the evaluation triggered by accessing the first element
;; stops after the first 32 elements (one chunk) have been realized.
(first (map (fn [x] (prn x) x) (range 100))) ;; prints 0 through 31, then returns 0
;; Chunking also affects side effects on mutable state, such as an atom.
(let [n (atom 0)
      coll (map (fn [_] (swap! n inc)) (range 100))]
  (first coll) ;; This access causes swap! to be called on n 32 times.
  @n)
;; => 32
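;; A quick way to check whether a given seq is chunked is the clojure.core predicate chunked-seq?.
;; Seqs over range and vectors are chunked, and map over a chunked input produces chunked output.
;; A plain list is not chunked, so the same side-effecting map over a list realizes only one element.
(chunked-seq? (seq (range 100)))           ;; => true
(chunked-seq? (seq [1 2 3]))               ;; => true
(chunked-seq? (seq (map inc (range 100)))) ;; => true
(chunked-seq? (seq '(1 2 3)))              ;; => false
(let [n (atom 0)
      coll (map (fn [_] (swap! n inc)) '(0 1 2 3 4))]
  (first coll) ;; the list source is unchunked, so swap! runs only once
  @n)
;; => 1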
;; While chunking is generally useful, there are cases where it is disruptive and a
;; purely lazy sequence would be better.
;; For example, you may want to avoid triggering side effects whose results are never used, or
;; avoid the overhead of performing 32 expensive operations when only one is needed at the
;; time.
;; You can get around chunking by defining your sequence as an unchunked lazy-seq from the start.
(defn ->lazy-seq
  "Converts coll into a non-chunked lazy-seq"
  [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (first s) (->lazy-seq (rest s))))))
(->lazy-seq [1 2 3 4])
;; => (1 2 3 4)
;; ->lazy-seq is a simple example of how a collection might be turned into an unchunked lazy
;; sequence, but it does not really address the issue outlined above. It only helps when applied
;; to a source that has no side effects of its own, before the side-effecting step. If coll is
;; already the result of a chunked operation such as map, its side effects are already set up, and
;; a whole chunk of them runs as soon as the first element is grabbed from coll.
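;; A small sketch of where ->lazy-seq does and doesn't help. Applied to the side-effect-free source,
;; before the side-effecting map, it removes chunking: map sees an unchunked seq and realizes one
;; element at a time.
(let [n (atom 0)
      coll (map (fn [x] (swap! n inc) x) (->lazy-seq (range 100)))]
  (first coll)
  @n)
;; => 1
;; Applied after the map, it is too late: ->lazy-seq has to grab the first element of the map's
;; output, which realizes (and runs the side effects of) the whole first chunk.
(let [n (atom 0)
      coll (->lazy-seq (map (fn [x] (swap! n inc) x) (range 100)))]
  (first coll)
  @n)
;; => 32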
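;; My solution is essentially a map function that constructs an unchunked lazy sequence while
;; applying the provided function, so f only runs on an element when that element is realized.
(defn map-lazy-seq
  "Like map, but returns a non-chunked lazy seq: f is only called on an
  element when that element is realized."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (map-lazy-seq f (rest s))))))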
(first (map (fn [x] (prn x) x) (range 3)))
;; 0
;; 1
;; 2
;; => 0
(first (map-lazy-seq (fn [x] (prn x) x) (range 3)))
;; 0
;; => 0
(let [n (atom 0)
      coll (map-lazy-seq (fn [_] (swap! n inc)) (range 100))]
  (first coll) ;; With map-lazy-seq the atom is only incremented once.
  @n)
;; => 1
;; We can see that map-lazy-seq delays each side effect until the corresponding element is actually accessed.
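;; The laziness also survives further transformation. clojure.core/map and filter only work a chunk
;; at a time when their input seq is chunked, so stacking them on top of map-lazy-seq keeps the
;; one-element-at-a-time behavior. A minimal check with map:
(let [n (atom 0)
      coll (->> (range 100)
                (map-lazy-seq (fn [x] (swap! n inc) x))
                (map inc))]
  (first coll) ;; the outer map receives an unchunked seq, so only one swap! runs
  @n)
;; => 1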
;; I should also note that, in general, it is probably preferable to create a sequence of functions
;; and call each one when it is accessed to get its side effect.
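;; A minimal sketch of that alternative: build a seq of zero-argument functions (thunks) instead of
;; a seq of results. The seq itself is still chunked, but realizing a chunk only creates 32 closures.
;; No side effect runs until a thunk is explicitly called.
(let [n (atom 0)
      thunks (map (fn [_] #(swap! n inc)) (range 100))]
  ((first thunks)) ;; realizes a chunk of 32 thunks, but calls only one of them
  @n)
;; => 1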
;; I first ran into this problem while performing batch queries against a database.
;; I needed to read in a batch of data small enough not to use all of my memory, process it, write it,
;; then fetch the next batch. With chunking, 32 batches were fetched at once, causing an out-of-memory
;; error. Implementing the map-lazy-seq solution meant data was only fetched when it was actually
;; accessed, without having to pass around the function that pulls the data.
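;; A rough sketch of that batch pattern. fetch-batch! is hypothetical: it stands in for a real
;; database query, records which offsets were requested, and returns fake row ids. Because
;; map-lazy-seq realizes one element at a time, processing two batches triggers exactly two fetches.
;; With plain map over the same range, realizing the first batch would trigger 32 fetches.
(defn fetch-batch!
  "Hypothetical stand-in for a database query: records offset in the log atom
  and returns a vector of size fake row ids starting at offset."
  [log size offset]
  (swap! log conj offset)
  (vec (range offset (+ offset size))))

(let [log     (atom [])
      batches (map-lazy-seq #(fetch-batch! log 10 %) (range 0 1000 10))]
  (doseq [batch (take 2 batches)]
    ;; process and write the batch here; this sketch just reports its size
    (println "processing" (count batch) "rows"))
  @log)
;; processing 10 rows
;; processing 10 rows
;; => [0 10]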
;; This approach can also help prevent unwanted side effects on mutable state, such as atoms.