Created
December 18, 2015 14:36
-
-
Save muhuk/467f884b665107885d05 to your computer and use it in GitHub Desktop.
But will it chunk?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn split-coll-1 [elem coll] | |
(loop [coll coll | |
parts [] | |
part []] | |
(if-let [e (first coll)] | |
(if (= e elem) | |
(recur (rest coll) (conj parts part) []) | |
(recur (rest coll) parts (conj part e))) | |
(if (seq part) | |
(conj parts part) | |
parts)))) | |
(defn split-coll-2 [elem coll] | |
(loop [coll coll | |
parts (transient []) | |
part (transient [])] | |
(if-let [e (first coll)] | |
(if (= e elem) | |
(recur (rest coll) (conj! parts (persistent! part)) (transient [])) | |
(recur (rest coll) parts (conj! part e))) | |
(let [part (persistent! part)] | |
(persistent! | |
(if (seq part) | |
(conj! parts part) | |
parts)))))) | |
(defn split-coll-3 [elem coll] | |
(loop [coll coll | |
parts (transient []) | |
part []] | |
(if-let [e (first coll)] | |
(if (= e elem) | |
(recur (rest coll) (conj! parts part) []) | |
(recur (rest coll) parts (conj part e))) | |
(persistent! (conj! parts part))))) | |
(def coll (apply concat (repeat 100 [:a :b :c :x :x :d :x :e]))) | |
(def coll2 (repeatedly 1000 #(rand-nth [:a :b :c :d :e :x :x]))) | |
(def coll3 (into [] (repeatedly 1000 #(rand-nth [:a :b :c :d :e :x :x])))) | |
(def coll4 (repeatedly 5000 #(rand-nth [:a :b :c :d :e :x :x]))) | |
(doseq [c [coll coll2 coll3 coll4] | |
f [split-coll-1 split-coll-2 split-coll-3]] | |
(time (dotimes [n 1000] | |
(f :x c)))) | |
Thanks for analysis! I just checked it under ClojureScript and the numbers are (relatively) similar. Your split-coll-3
pretty much wins in this benchmark. In my case the collection is small (1-5 elements for 95% of time, and probably max 10 elements in all cases). I added 8 element collection to this benchmark and split-coll-1
wins in this case, so I'm going to go with this original implementation.
🍻
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Output (in a Clojure REPL):
split-coll-3
).:x
is too (damn) high.coll4
sometimes does better withsplit-coll-1
.split-coll-3
), really does improve performance. But only that much. In other words persistent collections, it seems, are well tuned.part
). Mutability IOW.coll2
&coll3
is significant. More so than transient usage. I think perhaps using a chunked-seq is useful there.(apply concat many-things)
is probably not a good idea.