Skip to content

Instantly share code, notes, and snippets.

@bsless
Created October 2, 2021 07:26
Show Gist options
  • Save bsless/0d9863424a00abbd325327cff1ea0a04 to your computer and use it in GitHub Desktop.
Save bsless/0d9863424a00abbd325327cff1ea0a04 to your computer and use it in GitHub Desktop.
Idiomatically refactor Clojure code to improve performance
;; http://johnj.com/from-elegance-to-speed.html
(def times (iterate #(+ % (rand-int 1000)) 0))
(def times-v (into [] (take 1e6) (iterate #(+ % (rand-int 1000)) 0)))
(defn smt-8 [times]
(->> times
(partition 8 1)
(map (juxt identity
(comp (partial apply -)
(juxt last first))))
(filter (comp (partial > 1000) second))
doall))
(cc/quick-bench (smt-8 times-v))
(defn sliding
([^long n]
(sliding n 1))
([^long n ^long step]
(fn [rf]
(let [a (java.util.ArrayDeque. n)]
(fn
([] (rf))
([result]
(let [result (if (.isEmpty a)
result
(let [v (vec (.toArray a))]
;;clear first!
(.clear a)
(unreduced (rf result v))))]
(rf result)))
([result input]
(.add a input)
(if (= n (.size a))
(let [v (clojure.lang.LazilyPersistentVector/createOwning (.toArray a))]
(dotimes [_ step] (.removeFirst a))
(rf result v))
result)))))))
(def smt-xf0
(comp
(sliding 8 1)
(map (juxt identity
(comp (partial apply -)
(juxt last first))))
(filter (comp (partial > 1000) second))))
(cc/quick-bench (doall (sequence smt-xf0 times-v)))
(def smt-xf1
(comp
(sliding 8 1)
(map (fn [v] [v (- (peek v) (nth v 0))]))
(filter (fn [v] (> 1000 (nth v 1))))))
(cc/quick-bench (doall (sequence smt-xf1 times-v)))
(def smt-xf2
(comp
(sliding 8 1)
(keep (fn [v] (let [d (- (peek v) (nth v 0))]
(when (> 1000 d)
[v d]))))))
(cc/quick-bench (doall (sequence smt-xf2 times-v)))
(import 'java.util.ArrayDeque)
(defn sliding*
([^long n]
(sliding* n 1))
([^long n ^long step]
(fn [rf]
(let [a (ArrayDeque. n)]
(fn
([] (rf))
([result]
(let [result (if (.isEmpty a)
result
(let [v (.toArray ^ArrayDeque a)]
;;clear first!
(.clear a)
(unreduced (rf result v))))]
(rf result)))
([result input]
(.add a input)
(if (= n (.size a))
(let [v (.toArray ^ArrayDeque a)]
(dotimes [_ step] (.removeFirst a))
(rf result v))
result)))))))
(def smt-xf3
(comp
(sliding* 8 1)
(keep (fn [^objects arr]
(let [d (- ^long (aget arr (unchecked-dec-int (alength arr))) ^long (aget arr 0))]
(when (> 1000 d)
[arr d]))))))
(do
(cc/quick-bench (smt-8 times-v)) ;; 1.2 sec
(cc/quick-bench (doall (sequence smt-xf0 times-v))) ;; 470 ms
(cc/quick-bench (doall (sequence smt-xf1 times-v))) ;; 54 ms
(cc/quick-bench (doall (sequence smt-xf2 times-v))) ;; 49 ms
(cc/quick-bench (doall (sequence smt-xf3 times-v))));; 30 ms
@GR-John
Copy link

GR-John commented Oct 5, 2021

Speaking of idiomatic, injest has some tricks ;)

(c/quick-bench
 (count
  (->> s
       (partition 8 1)
       (map (juxt identity
                  (comp (partial apply -)
                        (juxt last first))))
       (filter (comp (partial > 1000) second)))))

; Evaluation count : 6 in 6 samples of 1 calls.
;              Execution time mean : 18.776971 sec
;     Execution time std-deviation : 695.064922 ms
;    Execution time lower quantile : 17.849691 sec ( 2.5%)
;    Execution time upper quantile : 19.505612 sec (97.5%)
;                    Overhead used : 2.237442 ns

18.776 seconds. Now using just transducers:

(c/quick-bench
 (count
  (x>> s
       (x/partition 8 1)
       (map (juxt identity
                  (comp (partial apply -)
                        (juxt last first))))
       (filter (comp (partial > 1000) second)))))

; Evaluation count : 6 in 6 samples of 1 calls.
;             Execution time mean : 11.110633 sec
;    Execution time std-deviation : 138.432084 ms
;   Execution time lower quantile : 10.904165 sec ( 2.5%)
;   Execution time upper quantile : 11.258617 sec (97.5%)
;                   Overhead used : 2.237442 ns

11.110 seconds.

To try with parallism, 10 million is too big - we run out of memory. But we can chunk up our work:

(c/quick-bench
 (=>> s
      (x/partition 512)
      (x/partition 2 1)
      (map #(concat (first %) (take 7 (second %))))
      (map (fn [s]
             (x>> s
                  (x/partition 8 1)
                  (map (juxt identity
                             (comp (partial apply -)
                                   (juxt last first))))
                  (filter (comp (partial > 1000) second)))))
      (apply concat)
      count))

; Evaluation count : 6 in 6 samples of 1 calls.
;             Execution time mean : 3.467814 sec
;    Execution time std-deviation : 87.980027 ms
;   Execution time lower quantile : 3.349245 sec ( 2.5%)
;   Execution time upper quantile : 3.539669 sec (97.5%)
;                   Overhead used : 2.237442 ns

3.467 seconds on my 8 core. Not bad for such simple code. I'd argue this looks more idiomatic though:

(c/quick-bench
 (=>> s
      (x/partition 512)
      (x/partition 2 1)
      (map #(concat (first %) (take 7 (second %))))
      (map #(x>> %
                 (x/partition 8 1)
                 (map (fn [x] [x (- (last x) (first x))]))
                 (filter (comp (partial > 1000) second))))
      (apply concat)
      count))

; Evaluation count : 6 in 6 samples of 1 calls.
;             Execution time mean : 2.744263 sec
;    Execution time std-deviation : 26.483688 ms
;   Execution time lower quantile : 2.709099 sec ( 2.5%)
;   Execution time upper quantile : 2.774457 sec (97.5%)
;                   Overhead used : 2.237442 ns

And that's 2.744 seconds - even better.

With your enhancements:

(c/quick-bench
 (=>> s
      (x/partition 512)
      (x/partition 2 1)
      (map #(concat (first %) (take 7 (second %))))
      (map (fn [s]
             (x>> s
                  (x/partition 8 1)
                  (keep #(let [d (- ^long (last %) ^long (first %))]
                           (when (> 1000 d)
                             [% d]))))))
      (mapcat identity)
      count))

; Evaluation count : 6 in 6 samples of 1 calls.
;              Execution time mean : 2.263570 sec
;     Execution time std-deviation : 35.133977 ms
;    Execution time lower quantile : 2.201192 sec ( 2.5%)
;    Execution time upper quantile : 2.302601 sec (97.5%)
;                    Overhead used : 2.237442 ns

I get 2.263 seconds. Not quite java speed, but waaay easier to work with. And whereas the CL version is fine-tuned for for this super-light-weight transformation, if the work got any heavier, we'd likely start pulling ahead, with our idiomatic code.

@GR-John
Copy link

GR-John commented Oct 5, 2021

Sorry, no, with your times-v payload, I get:

(c/quick-bench
 (=>> times-v
      (x/partition 512)
      (x/partition 2 1)
      (map #(concat (first %) (take 7 (second %))))
      (map #(x>> %
                 (x/partition 8 1)
                 (map (fn [x] [x (- (last x) (first x))]))
                 (filter (comp (partial > 1000) second))))
      (apply concat)
      count))

; Evaluation count : 6 in 6 samples of 1 calls.
;             Execution time mean : 264.569217 ms
;    Execution time std-deviation : 5.173518 ms
;   Execution time lower quantile : 260.041933 ms ( 2.5%)
;   Execution time upper quantile : 272.853110 ms (97.5%)
;                   Overhead used : 2.237442 ns

0.264 seconds.

All the other times above reflected a larger payload (by 10 times) I was working against.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment