A few notes on branch prediction and microbenchmarking Clojure comparisons
There are times when you need to do a large number of comparison operations, for example when implementing a sort or a tree structure. Clojure has provided a helpful core function to compare items that implement the Comparable
interface:
(compare 2 5)
;; => -1
(compare 99 99)
;; => 0
(compare 100 1)
;; => 1
This is handy, as you can see we also have a distinct return value, 0, to indicate equality.
However, not all Comparable
objects provide the same return value semantics. For example, when comparing strings, the result might be any integer value:
(compare "a" "z")
;; => -25
(compare "a" "a")
;; => 0
(compare "foo" "f")
;; => 2
Lets define some simple test data and play around with some example code that compares some randomly sampled data.
(def unit-chars (map (comp str char) (range (int \a) (inc (int \z)))))
;; => ("a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
"n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z")
(defn make-sample
"generate the data for n comparisons"
[n]
(partition 2 (take (* 2 n) (repeatedly #(rand-nth unit-chars)))))
;; => (("p" "q") ("x" "c") ("b" "b") ... )
(def +test-data+ (make-sample 10000))
Given the range of the clojure.core/compare
function, one might quickly write something like the following to consume its results. The cond
branch tests for the result of the comparison and dispatches accordingly.
(defn simple [data]
(time
(count
(reduce (fn [acc [x y]]
(let [comp-value (compare x y)]
(conj acc (cond
(neg? comp-value) :less
(pos? comp-value) :more
true :equal))))
[] data))))
;; => #'user/simple
(simple +test-data+)
;; => 10000
;; "Elapsed time: 44.137666 msecs"
Can we do any better than this? It turns out, yes, with sufficient motivation :)
If we were to normalize the return value of clojure.core/compare
to a known set of constants, we can use the (more restrictive) clojure.core/case
conditional to replace the more general clojure.core/cond
. The result will be simpler object code that is better optimized through compiler branch prediction. So, lets "normalize" the result of the compare
to a more traditional range of -1, 0, 1.
(defn normalize ^long [^long x]
(if (zero? x)
x
(bit-or 1
(bit-shift-right x 63))))
(defn normal-compare ^long [x y]
(normalize (compare x y)))
(defn improved [data]
(time
(count
(reduce (fn [acc [x y]]
(let [comp-value (normal-compare x y)]
(conj acc (case comp-value
-1 :less
1 :more
:equal))))
[] data))))
(improved +test-data+)
;; => 10000
"Elapsed time: 5.064709 msecs"
Informally, this shows a nearly 10x improvement 🤩