@ptaoussanis
Last active August 29, 2015 14:25
Clojure 1.8.0-alpha2 tuple performance
;;
;; LATEST UPDATE: 25 July 2015
;;
;; ****************************************************************
;; ** NB false alarm! My original benchmarks showing large perf  **
;; ** improvements with tuples turned out to be noise,           **
;; ** unfortunately. Current (+more reliable) numbers seem[1] to **
;; ** show no consistent significant advantage using currently   **
;; ** available tuple implementations against real-world code.   **
;; **                                                            **
;; ** Sincere apologies for jumping the gun + publishing dodgy   **
;; ** results! - Peter Taoussanis                                **
;; **                                                            **
;; ** [1] Not a final word; think this'll need more time to      **
;; ** become clear one way or the other.                         **
;; ****************************************************************
;; ** Latest results table on Google Docs: https://goo.gl/khgT83 **
;; ****************************************************************
*clojure-version* ; + -server jvm

(require
  '[criterium.core :as criterium :refer [bench quick-bench]
    :rename {quick-bench qb}])
;; [1] 1.7.0
;; [2] 1.8.0-alpha2
;; [3] 1.8.0-snapshot as of b8607d587
;; [4] mikera's fork at https://github.com/mikera/clojure/tree/clj-1517
(let [v  [1 2 3 4]
      v2 [1 2 3 "foo"]
      x  1
      y  2
      z  3]
  ;;                                  [1]   [2]   [3]   [4]
  (qb (conj v 5))                   ;  59 ;  24 ;  21 ;  22
  (qb (assoc v 1 3))                ;  44 ;  74 ;  64 ;  68
  (qb (assoc {} :v1 v :v2 v :v3 v)) ; 405 ; 491 ; 388 ; 446
  (qb (subvec v 1 3))               ;  27 ;  24 ;  25 ;  26
  (qb [x y z])                      ;  38 ;  22 ;  21 ;  22
  (qb [[[x y z] y z] y z])          ;  91 ;  46 ;  53 ;  49
  (qb (let [[x y z] v]))            ;  22 ; 103 ;  97 ;  97
  (qb (peek v))                     ;  16 ;  24 ;  26 ;  27
  (qb (pop v))                      ;  46 ;  64 ;  65 ;  69
  (qb (seq v))                      ;  35 ;  35 ;  34 ;  31
  (qb (rseq v))                     ;  23 ;  19 ;  18 ;  20
  (qb (count v))                    ;  16 ;  14 ;  15 ;  17
  (qb (reduce (fn [acc in]) nil v)) ;  62 ; 182 ; 184 ; 202
  (qb (mapv inc v))                 ; 361 ; 599 ; 565 ; 559
  (qb (into [0] v))                 ; 468 ; 560 ; 536 ; 783
  (qb (hash [[[x y z] y z] y z]))   ; 697 ; 465 ; 468 ; 454
  (qb (hash [x y z]))               ; 249 ; 238 ; 220 ; 173
  (qb (= v v2))                     ; 265 ; 306 ;  85 ;  82
  )
@ptaoussanis (author)

These tests are pretty arbitrary; I just wanted to get a feel for the general perf impact.
Remember to run the bench a couple of times to eliminate JIT warmup effects.
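As a minimal sketch of that advice (only criterium's `quick-bench` is assumed; the surrounding code is illustrative):

```clojure
(require '[criterium.core :refer [quick-bench]])

(let [v [1 2 3 4]]
  ;; Run the same measurement several times in one JVM session;
  ;; the first run(s) include JIT compilation of the hot paths,
  ;; so trust the later, stabilised numbers.
  (dotimes [_ 3]
    (quick-bench (conj v 5))))
```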

@mikera commented Jul 20, 2015

Hey, I had a go at implementing Zach Tellman's original unrolled vectors in this branch:

https://github.com/mikera/clojure/tree/clj-1517

Any chance you could run your tests on this as a comparison?

EDIT: My quick tests using criterium:

  • 1.8.0-alpha2: 2.43us
  • 1.8.0-clj1517: 1.56us

I.e. Zach's approach seems roughly 50-60% faster than the current alpha2 branch for this test.

@ptaoussanis (author)

Hi @mikera, sorry for the delay in replying - I didn't notice your comment. I'll add your fork next time I run the benchmarks.

First step: I want to try to get more consistent numbers - the current variation's way too high to make useful comparisons. Will update when I've had a chance to investigate!

Cheers :-)

@ptaoussanis (author)

So, I've posted a table of updated results to Google Docs.

A few observations:

  • I wasn't able to detect any significant difference between your clj-1517 branch and the Clojure 1.8.0 snapshot as of clojure/clojure@b8607d5.
  • Getting consistent numbers with such cheap single ops has been tough. My first results looked promising, but on closer inspection they turned out to be mostly noise (?). With the present set of numbers (much, much higher rep counts to get the variation down), it's starting to look to me like none of the tuple implementations actually offers much real benefit (?).

Is anyone aware of a reference set of benchmarks somewhere that motivated the tuples work? It seems likely I'm doing something wrong / missing something obvious here. Ideas welcome!

Otherwise, the next obvious step would be some larger (more expensive) test suites.
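One (hypothetical) way to make such cheap single ops easier to measure in the meantime is to batch many of them into each timed call, amortising the harness and dispatch overhead; this sketch assumes only criterium's `quick-bench`:

```clojure
(require '[criterium.core :refer [quick-bench]])

;; Time 1000 conj/pop pairs per measurement instead of a single
;; ~20 ns op, so per-call benchmark overhead matters far less.
(let [v [1 2 3 4]]
  (quick-bench
    (loop [i 0, acc v]
      (if (< i 1000)
        (recur (inc i) (pop (conj acc i)))
        acc))))
```

The reported mean then needs dividing by the batch size, but the relative comparison between builds stays meaningful.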

@ztellman

Is there a reason we're not using criterium for these tests?

@ztellman

Sorry, we are. I was looking for the tell-tale quick-bench and didn't see that it was aliased.

@mikera commented Jul 22, 2015

Thanks for running these @ptaoussanis. I have to say that the latest results now look pretty surprising... they suggest that some things are actually faster in 1.7.0, which didn't even have tuples.

I suspect something else is going on, especially for reduce.

Once you are down in the 10-20 ns range per operation, things like var indirection and dispatch overhead often dominate performance.
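For illustration, here's a (hypothetical) shape of that var-indirection overhead; `sample-v` and `through-vars` are made-up names, and direct-linking behaviour varies by Clojure version and compiler settings:

```clojure
(def sample-v [1 2 3 4])

;; Top-level call: `peek` and `sample-v` are both resolved through
;; vars, so each call pays a root-binding (volatile) read per var -
;; measurable next to a ~15 ns operation.
(defn through-vars [] (peek sample-v))

;; Hoisting into locals performs the var lookups once, outside the
;; hot path the JIT compiles.
(let [f peek
      v sample-v]
  (f v))
```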

@ptaoussanis (author)

@ztellman, @mikera

Thanks a lot for the reference test, Zach.

I've updated the Google doc results by running https://github.com/ztellman/tuple-benchmark

I've also double-checked that I've got the correct Clojure build at https://github.com/mikera/clojure/tree/clj-1517

Observations:

  • I'm afraid I'm still struggling to reproduce any consistent, significant advantage for any of the tuple implementations?
  • My numbers still jump around way too much for my liking. I ran the tests a dozen times to get the variance to settle down a bit, but I still routinely see cases where I'll run a test and consistently get x, then run it again 10 minutes later and consistently get 2x or 0.5x. These kinds of tiny benchmarks might just inherently be a bad idea?
  • If we do discount differences in the 0.5x-2x range, that basically leaves no change from the 1.7.0 control group.

My environment:
2011 Macbook Air 1.7GHz Core i5 on OSX 10.9.5, Java 1.7.0_04
Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)

I'm guessing that running on stronger hardware will make the results even more difficult to pin down since we'll be heading toward sub-nanosecond times.

Raw dump of my latest https://github.com/ztellman/tuple-benchmark results (same data as used for the table on Google Docs):

*** clojure-version {:major 1, :minor 7, :incremental 0, :qualifier nil}
expression                     mean                           std. dev.
(conj v 5)                     79.674845 ns                   1.852185 ns                  
(assoc v 1 3)                  51.676295 ns                   4.785463 ns                   
(assoc {} :v1 v :v2 v :v3 v)   406.991158 ns                  12.689784 ns                  
(subvec v 1 3)                 33.577234 ns                   0.794528 ns                   
[x y z]                        39.978538 ns                   1.653215 ns               
[[[x y z] y z] y z]            104.289727 ns                  4.830357 ns                   
(let [[x y z] v])              18.222980 ns                   0.516661 ns                   
(peek v)                       12.380225 ns                   1.056053 ns                      
(pop v)                        49.468258 ns                   3.661096 ns                   
(seq v)                        37.629471 ns                   2.886287 ns                   
(rseq v)                       23.641434 ns                   3.021730 ns                   
(count v)                      16.081021 ns                   1.189006 ns                   
(reduce + 0 v)                 102.914020 ns                  8.170103 ns                   
(reduce + v)                   85.225498 ns                   10.946881 ns                  
(mapv inc v)                   348.483248 ns                  12.959969 ns                     
(into [0] v)                   491.365971 ns                  34.585707 ns                
(hash [[[x y z] y z] y z])     578.937288 ns                  58.726314 ns                 
(hash [x y z])                 274.071603 ns                  9.098456 ns                   
(= v v2)                       101.821314 ns                  1.824075 ns                     

*** clojure-version {:major 1, :minor 8, :incremental 0, :qualifier "snapshot1"}
expression                     mean                           std. dev.
(conj v 5)                     24.970220 ns                   1.550786 ns                   
(assoc v 1 3)                  123.809558 ns                  6.611467 ns                   
(assoc {} :v1 v :v2 v :v3 v)   437.247648 ns                  14.715381 ns                                  
(subvec v 1 3)                 45.283807 ns                   2.449585 ns                   
[x y z]                        35.008803 ns                   1.432197 ns                   
[[[x y z] y z] y z]            94.999972 ns                   8.123341 ns                  
(let [[x y z] v])              60.905071 ns                   8.747650 ns                   
(peek v)                       13.805913 ns                   2.705339 ns                   
(pop v)                        104.500128 ns                  10.772506 ns                   
(seq v)                        32.826113 ns                   1.625094 ns                
(rseq v)                       31.345102 ns                   1.299080 ns                   
(count v)                      20.245304 ns                   0.947091 ns                   
(reduce + 0 v)                 213.534805 ns                  11.786278 ns                  
(reduce + v)                   174.435390 ns                  7.104740 ns                   
(mapv inc v)                   496.698079 ns                  20.926726 ns                    
(into [0] v)                   645.231855 ns                  14.801555 ns                  
(hash [[[x y z] y z] y z])     572.492217 ns                  25.759777 ns                  
(hash [x y z])                 150.181880 ns                  8.617818 ns                   
(= v v2)                       73.758903 ns                   6.200224 ns                  

*** clojure-version {:major 1, :minor 8, :incremental 0, :qualifier "clj1517"}
expression                     mean                           std. dev.
(conj v 5)                     36.188986 ns                   0.976548 ns                   
(assoc v 1 3)                  29.863559 ns                   1.235988 ns                   
(assoc {} :v1 v :v2 v :v3 v)   574.408239 ns                  20.240338 ns                   
(subvec v 1 3)                 33.957882 ns                   0.791464 ns                   
[x y z]                        27.953441 ns                   1.437653 ns                   
[[[x y z] y z] y z]            55.601830 ns                   2.363494 ns                   
(let [[x y z] v])              23.025138 ns                   1.091057 ns                   
(peek v)                       14.924695 ns                   0.846271 ns                   
(pop v)                        36.885387 ns                   1.705071 ns                   
(seq v)                        67.948751 ns                   3.312500 ns                   
(rseq v)                       36.485379 ns                   4.274963 ns                   
(count v)                      20.106256 ns                   1.686912 ns                  
(reduce + 0 v)                 78.483411 ns                   3.060571 ns                  
(reduce + v)                   67.396896 ns                   2.753433 ns                  
(mapv inc v)                   215.915015 ns                  9.273243 ns                  
(into [0] v)                   94.283390 ns                   6.095461 ns                 
(hash [[[x y z] y z] y z])     630.288337 ns                  13.606491 ns                  
(hash [x y z])                 236.689946 ns                  6.358934 ns                   
(= v v2)                       128.622776 ns                  6.595519 ns 

Any other ideas?

@ptaoussanis (author)

Zach, I've added your own results from https://gist.github.com/ztellman/3701d965228fb9eda084 to the Google Docs table and am seeing a similar outcome?

I.e. the differences observed basically seem to fall within about the normal range of noise variation from the 1.7.0 control group (at least with the amount of variation that I seem to be getting on my end), no?

Is the notion of running these types of micro benchmarks just a dud, or were my expectations maybe off re: the expected impact of the tuples?

Will try to run some larger benchmarks against a couple of small-vec-heavy systems I have later today.

@ptaoussanis (author)

Okay, so I ran some larger benchmarks on a real system I have on hand that uses plenty of 2- and 3-element vectors.

Clojure 1.8.0-snapshot was consistently a little slower than 1.7.0. Not by much: 5% over a suite of 5 tests.
Clojure 1.8.0 with clj-1517 was statistically indistinguishable from 1.7.0 after JIT warmup.

This is with the same environment, -server mode again, lots of JIT warmup time.

It's looking to me like the JIT maybe ends up basically optimising away the difference between the tuple and vector implementations? It's of course very possible that my results are wrong or that my system is somehow unusual.

@mikera I think you mentioned that you'd run some larger core.matrix benchmarks, yes? Do you maybe have any more info on those handy?
