Skip to content

Instantly share code, notes, and snippets.

@reborg
Last active November 19, 2017 17:06
Show Gist options
  • Save reborg/37c76a61c9a60ddae8de737170e2e675 to your computer and use it in GitHub Desktop.
Save reborg/37c76a61c9a60ddae8de737170e2e675 to your computer and use it in GitHub Desktop.
Explorations around frequencies as a reducing function for transducers, sequential or parallel.
(require '[criterium.core :refer [quick-bench]])
(require '[clojure.core.reducers :as r])
(import '[java.util HashMap Collections Map]
'java.util.concurrent.atomic.AtomicInteger
'java.util.concurrent.ConcurrentHashMap)
(set! *warn-on-reflection* true)
;; 500k maps with the same key. value are overlapping 1/5 of the time.
(def data
(into []
(map hash-map
(repeat :samplevalue)
(concat
(range 1e5)
(range 1e5)
(range 1e5)
(range 1e5)
(range 1e5)))))
;; Standard version. Transformations as transducers, frequencies
;; applied on the results.
(quick-bench
(frequencies
(eduction
(keep :samplevalue)
(map int)
data)))
;; Execution time mean : 236.744558 ms
;; frequencies implementation is used as a reducing function. A transient is updated
;; during reduction, it is made persistent as the last step.
(quick-bench
(transduce
(comp
(keep :samplevalue)
(map int))
(completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
(transient {})
data))
;; Execution time mean : 207.890362 ms
;; Going parallel on a mutable Java concurrent-hashmap takes the time down consistently.
;; Restriction1: transducers (if any) need to be stateless.
;; Restriction2: nil is not allowed, so they are always removed.
;; Bonus: the output is converted into persistent at the end. You can
;; remove that step for an additional speedup (~20ms).
(quick-bench
(let [m (ConcurrentHashMap. (quot (count data) 2) 0.75 (.availableProcessors (Runtime/getRuntime)))
combinef (fn ([] m) ([_ _]))
rf (fn [^Map m k]
(let [^AtomicInteger v (or (.get m k) (.putIfAbsent m k (AtomicInteger. 1)))]
(when v (.incrementAndGet v))
m))
reducef ((comp (keep :samplevalue) (map int)) rf)]
(do (r/fold combinef reducef data)
(into {} m))))
;; Execution time mean : 45.498409 ms
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment