@balinterdi
Created February 1, 2013 21:46
(ns benchmark-assoc.core
  (:require [criterium.core :as crit]))

(defn no-assoc [items]
  (apply hash-map (vec (flatten items))))

(defn assoc-all [items]
  (apply assoc {} (vec (flatten items))))

(defn assoc-each-one [items]
  (reduce (fn [h [k v]] (assoc h k v)) {} (map vec items)))

(defn benchmark-assoc []
  (doseq [n [1e4 1e5 1e6]]
    (let [pairs (take n (iterate #(map inc %) [0 0]))]
      (println (str "--- " n " pairs"))
      (println)
      (println "Initialize with all")
      (crit/bench (no-assoc pairs))
      (println)
      (println "Associng in one go")
      (crit/bench (assoc-all pairs))
      (println)
      (println "Associng one-by-one")
      (crit/bench (assoc-each-one pairs)))))
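As a quick sanity check (not part of the benchmark itself), the three strategies should all build the same map from a seq of [key value] pairs. A minimal, self-contained sketch, restating the gist's definitions with a small hand-picked input:

```clojure
;; The three map-building strategies from the gist, restated so this
;; snippet runs on its own.
(defn no-assoc [items]
  (apply hash-map (vec (flatten items))))

(defn assoc-all [items]
  (apply assoc {} (vec (flatten items))))

(defn assoc-each-one [items]
  (reduce (fn [h [k v]] (assoc h k v)) {} (map vec items)))

;; All three should agree on a tiny input (map equality ignores the
;; concrete map type, so hash-map vs. array-map does not matter here).
(let [pairs [[0 0] [1 1] [2 2]]]
  (assert (= {0 0, 1 1, 2 2}
             (no-assoc pairs)
             (assoc-all pairs)
             (assoc-each-one pairs))))
```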
--- 10000.0 pairs
Initialize with all
Evaluation count : 960 in 60 samples of 16 calls.
Execution time mean : 67.632676 ms
Execution time std-deviation : 2.608085 ms
Execution time lower quantile : 62.879875 ms ( 2.5%)
Execution time upper quantile : 72.020153 ms (97.5%)
Associng in one go
Evaluation count : 960 in 60 samples of 16 calls.
Execution time mean : 73.413894 ms
Execution time std-deviation : 4.324156 ms
Execution time lower quantile : 68.028500 ms ( 2.5%)
Execution time upper quantile : 81.069625 ms (97.5%)
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 43.4800 % Variance is moderately inflated by outliers
Associng one-by-one
Evaluation count : 2400 in 60 samples of 40 calls.
Execution time mean : 24.898720 ms
Execution time std-deviation : 897.388513 us
Execution time lower quantile : 23.186725 ms ( 2.5%)
Execution time upper quantile : 26.579045 ms (97.5%)
--- 100000.0 pairs
Initialize with all
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 711.468392 ms
Execution time std-deviation : 37.029515 ms
Execution time lower quantile : 641.727000 ms ( 2.5%)
Execution time upper quantile : 799.149188 ms (97.5%)
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 38.4618 % Variance is moderately inflated by outliers
Associng in one go
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 773.466508 ms
Execution time std-deviation : 33.222215 ms
Execution time lower quantile : 730.681725 ms ( 2.5%)
Execution time upper quantile : 832.623088 ms (97.5%)
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 28.7296 % Variance is moderately inflated by outliers
Associng one-by-one
Evaluation count : 180 in 60 samples of 3 calls.
Execution time mean : 345.703717 ms
Execution time std-deviation : 48.639714 ms
Execution time lower quantile : 307.542333 ms ( 2.5%)
Execution time upper quantile : 458.668208 ms (97.5%)
Found 4 outliers in 60 samples (6.6667 %)
low-severe 2 (3.3333 %)
low-mild 2 (3.3333 %)
Variance from outliers : 82.4198 % Variance is severely inflated by outliers
@laczoka

laczoka commented Feb 2, 2013

Just a minor comment, as I was also curious what was going on:
I tested this benchmark script under 1.4 and 1.5RC4, and things are MUCH slower on 1.5RC4. It would be interesting to know why...

UPDATE: when I ran https://gist.github.com/4695353 against 1.5RC4, things got much more consistent: about a 1-5% performance loss across all test cases.

@balinterdi
Author

Good point, @laczoka, I ran it on 1.4.

Another possible reason is that @lesbroot and I have different numbers of cores in our machines. I'm not sure whether the underlying code leverages multiple cores, but if it does, that could really cause a difference. I have a 2-core machine. All this really proves my point that microbenchmarks are hard (but fascinating).

@steerio

steerio commented Feb 9, 2013

I cleaned up the code a bit. I factored out flatten and vec, so the benchmarked functions only contain code that we actually want to benchmark. I also added a doall call to realize the flattened list, which implicitly realizes the pairs as well (it does not make that much difference, though, as it only removes a first-call penalty).

After these modifications no-assoc wins the race hands down (4.2ms with 10k pairs).

(defn no-assoc [items]
  (apply hash-map items))

(defn assoc-all [items]
  (apply assoc {} items))

(defn assoc-each-one [items]
  (reduce #(apply assoc % %2) {} items))

(defn benchmark-assoc []
  (doseq [n [1e4 1e5 1e6]]
    (let [pairs (for [i (range n)] [i i])
          flattened (doall (flatten pairs))]
      (println (str "--- " n " pairs"))
      (println)
      (println "Initialize with all")
      (crit/bench (no-assoc flattened))
      (println)
      (println "Associng in one go")
      (crit/bench (assoc-all flattened))
      (println)
      (println "Associng one-by-one")
      (crit/bench (assoc-each-one pairs)))))

I also felt that generating the pairs looks better with a list comprehension. :)
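The cleaned-up variants can be sanity-checked the same way (this check is mine, not from the comment above). Note that here no-assoc and assoc-all take the pre-flattened seq, while assoc-each-one still takes the raw pairs:

```clojure
;; steerio's factored-out variants, restated so this snippet runs on
;; its own.
(defn no-assoc [items]
  (apply hash-map items))

(defn assoc-all [items]
  (apply assoc {} items))

(defn assoc-each-one [items]
  (reduce #(apply assoc % %2) {} items))

;; Build a tiny input the same way the benchmark does, and check that
;; all three variants produce the same map.
(let [pairs (for [i (range 3)] [i i])
      flattened (doall (flatten pairs))]
  (assert (= {0 0, 1 1, 2 2}
             (no-assoc flattened)
             (assoc-all flattened)
             (assoc-each-one pairs))))
```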

@balinterdi
Copy link
Author

@steerio It's nice to have it factored out, and I like your code a lot better, but as far as I can see it should not make a difference as long as I'm only concerned with relative performance. In both the original and your version, no-assoc and assoc-all work on the same data, so the relative numbers should be the same.

They are not. In my case, they are on par, while in yours no-assoc is 2x as fast.

To address the original question (assoc-all vs. assoc-each-one): they seem to be equally performant for both 1e5 and 1e6 pairs.
