Skip to content

Instantly share code, notes, and snippets.

@muhuk
Created December 18, 2015 14:36
Show Gist options
  • Save muhuk/467f884b665107885d05 to your computer and use it in GitHub Desktop.
Save muhuk/467f884b665107885d05 to your computer and use it in GitHub Desktop.
But will it chunk?
(defn split-coll-1 [elem coll]
(loop [coll coll
parts []
part []]
(if-let [e (first coll)]
(if (= e elem)
(recur (rest coll) (conj parts part) [])
(recur (rest coll) parts (conj part e)))
(if (seq part)
(conj parts part)
parts))))
(defn split-coll-2 [elem coll]
(loop [coll coll
parts (transient [])
part (transient [])]
(if-let [e (first coll)]
(if (= e elem)
(recur (rest coll) (conj! parts (persistent! part)) (transient []))
(recur (rest coll) parts (conj! part e)))
(let [part (persistent! part)]
(persistent!
(if (seq part)
(conj! parts part)
parts))))))
(defn split-coll-3 [elem coll]
(loop [coll coll
parts (transient [])
part []]
(if-let [e (first coll)]
(if (= e elem)
(recur (rest coll) (conj! parts part) [])
(recur (rest coll) parts (conj part e)))
(persistent! (conj! parts part)))))
(def coll (apply concat (repeat 100 [:a :b :c :x :x :d :x :e])))
(def coll2 (repeatedly 1000 #(rand-nth [:a :b :c :d :e :x :x])))
(def coll3 (into [] (repeatedly 1000 #(rand-nth [:a :b :c :d :e :x :x]))))
(def coll4 (repeatedly 5000 #(rand-nth [:a :b :c :d :e :x :x])))
(doseq [c [coll coll2 coll3 coll4]
f [split-coll-1 split-coll-2 split-coll-3]]
(time (dotimes [n 1000]
(f :x c))))
@muhuk
Copy link
Author

muhuk commented Dec 18, 2015

Output (in a Clojure REPL):

"Elapsed time: 113.709655 msecs"
"Elapsed time: 150.318221 msecs"
"Elapsed time: 98.630873 msecs"
"Elapsed time: 140.721525 msecs"
"Elapsed time: 166.15507 msecs"
"Elapsed time: 128.038496 msecs"
"Elapsed time: 104.233172 msecs"
"Elapsed time: 137.323314 msecs"
"Elapsed time: 98.44486 msecs"
"Elapsed time: 692.423699 msecs"
"Elapsed time: 819.662113 msecs"
"Elapsed time: 657.230314 msecs"
nil
  • As you can see every 3rd timing is better than the others (split-coll-3).
  • I think the reason for that is the frequency of :x is too (damn) high.
  • Note that coll4 sometimes does better with split-coll-1.
  • I think using transients (in the case of split-coll-3), really does improve performance. But only that much. In other words persistent collections, it seems, are well tuned.
  • If you want moar performance you should consider using an array for the inner collection (part). Mutability IOW.
  • Another thing is the difference between coll2 & coll3 is significant. More so than transient usage. I think perhaps using a chunked-seq is useful there.
  • Also (apply concat many-things) is probably not a good idea.

@ku1ik
Copy link

ku1ik commented Dec 19, 2015

Thanks for analysis! I just checked it under ClojureScript and the numbers are (relatively) similar. Your split-coll-3 pretty much wins in this benchmark. In my case the collection is small (1-5 elements for 95% of time, and probably max 10 elements in all cases). I added 8 element collection to this benchmark and split-coll-1 wins in this case, so I'm going to go with this original implementation.

🍻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment