danlentz/example.md

## example.md

      
    Raw
  

              example.md
            
          
    Working more efficiently with Clojure compare

A few notes on branch prediction and microbenchmarking Clojure comparisons
The Simple Case

There are times when you need to do a large number of comparison operations, for example when implementing a sort or a tree structure.  Clojure has provided a helpful core function to compare items that implement the Comparable interface:
(compare 2 5)
;; => -1

(compare 99 99)
;; => 0

(compare 100 1)
;; => 1
This is handy, as you can see we also have a distinct return value, 0, to indicate equality.
A Different Case

However, not all Comparable objects provide the same return value semantics.  For example, when comparing strings, the result might be any integer value:
(compare "a" "z")
;; => -25

(compare "a" "a")
;; => 0

(compare "foo" "f")
;; => 2
Some Simple Test Data

Lets define some simple test data and play around with some example code that compares some randomly sampled data.
(def unit-chars (map (comp str char) (range (int \a) (inc (int \z)))))

;; => ("a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" 
       "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z")

(defn make-sample 
  "generate the data for n comparisons"
  [n]
  (partition 2 (take (* 2 n) (repeatedly #(rand-nth  unit-chars)))))

;; => (("p" "q") ("x" "c") ("b" "b") ... )

(def +test-data+ (make-sample 10000))

The Naive Approach

Given the range of the clojure.core/compare function, one might quickly write something like the following to consume its results.  The cond branch tests for the result of the comparison and dispatches accordingly.
(defn simple [data]
        (time
          (count 
            (reduce (fn [acc [x y]]
                (let [comp-value (compare x y)]
                  (conj acc (cond
                              (neg? comp-value) :less
                              (pos? comp-value) :more
                              true              :equal))))
              [] data))))
;; => #'user/simple

(simple +test-data+)

;; => 10000 
;; "Elapsed time: 44.137666 msecs"

The Improved Approach

Can we do any better than this?  It turns out, yes, with sufficient motivation :)
If we were to normalize the return value of clojure.core/compare to a known set of constants, we can use the (more restrictive) clojure.core/case conditional to replace the more general clojure.core/cond.  The result will be simpler object code that is better optimized through compiler branch prediction. So, lets "normalize" the result of the compare to a more traditional range of -1, 0, 1.
 (defn normalize ^long [^long x]
  (if (zero? x)
    x
    (bit-or 1
      (bit-shift-right x 63))))
      
 (defn normal-compare ^long [x y]
   (normalize (compare x y)))

 (defn improved [data]
        (time 
          (count
            (reduce (fn [acc [x y]]
                      (let [comp-value (normal-compare x y)]
                        (conj acc (case comp-value
                                    -1 :less
                                    1 :more
                                    :equal))))
                      [] data))))
 
 (improved +test-data+)
 
;; => 10000
"Elapsed time: 5.064709 msecs"
Result

Informally, this shows a nearly 10x improvement 🤩