philandstuff/clojurex-2014.org

## clojurex-2014.org

      
    Raw
  

              clojurex-2014.org
            
          
    Korny Sietsma, Pragmatic Performance Testing

quotes


  “we don’t care about perf”
  “clojure should be fast”
  “we’re using mongo!”
  “clojure might not be fast enough, we should use type hints”

JMeter

classic perf testing approach:


  build it, then test it

monitor all the things


  instrument with graphite

(GET "/slow" [] (t/time! (t/timer "slow-page")
                         (do (Thread/sleep (+ 100 (rand-int 100))) "oy!")))

  “maybe we could use a macro” – no!
  madness. use bidi!

robert/hooke


  use robert/hooke to instrument anything

(add-hook #'function #'hook)
laziness messes with measurement


  (postwalk identity r) to disable laziness recursively

outcomes


  fix 2: don’t iterate over slow things
    
      remove Schema
      just return fields we need
      or maybe paginate
        
          don’t do this prematurely!
        
      
  fix 3: tags rarely change - we can memoize
    
      (what about cache invalidation?)
    
  
  fix 4: queries without indices
  mongoDB handy setting:

// tablescans are now banned!
db.getSiblingDB("admin").runCommand(
    { setParameter: 1, notablescan: 1}
);

  fix 5: unrolled queries
    
      temptation:
    
  
(->> (all-tumblrs db)
     (map (partial all-posts-for-tumble db))
     (map (partial all-pics-for-post db)))

  slow in code, fast in SQL

digging deeper - profiling


  can roll your own profiler
  JVisualVM - free, but limited
  Yourkit - free for big OS projects, or 15-day demo
  Riemann?
  fix 6: RTFM
    
      stencil/render-string not memoized
      use stencil/render-file instead
    
  
  fix 7: more dakka
  fix 8: async work queues
  fix 10: tweak clojure
    
      “clojure high perf programming” by Shantanu Kumar
    
  
browser speed


  whole nothing issue. esp mobile browsers
  hard to test at scale

testing options


  scale “from cheap, fast feedback, flexible” to “accurate, slow, rigid”
  don’t need to go full sophisticated immediately!
  “performance test pyramid” by analogy with test pyramid

Q&A


  simulant
    
      don’t know, sounds like a good idea
    
  
  is memoization a harder problem?
    
      memoization is a cheat
        
          the first person still gets horrible perf
          if things change, the req which gets the change gets horrible
            perf
        
      
  aren’t most of your problems RTFM? aren’t you solving a cultural
    problem with tech?
    
      we shouldn’t have needed JMeter to find stencil slowness
      reading manuals goes out of the window under pressure
    
  
  what do you do when you hit a JVM limit? eg integer parsing <JDK8
    had a global lock
    
      throw up your hands and say screw it
      it’s hard to estimate how much time you should put into perf
        testing
    
  
David Pollak, Dragonmark

intro


  distributed core.async and associated libraries

-
naming things

objects


  have named “slots”
    
      code or data
      no method/field distinction
    
  
  q: how do you send an “object” from Ponyville to Equestria?

yay clojure


  separation of data and schema and behaviour
  immutable data structures: not cyclical
  transit: richer than json, cross-platform serialization

async


  nothing is synchronous
    
      but we fake it
      multiple cores and NUMA
    
  
  but
    
      we humans gloss over it in day-to-day life
      in computer systems, we have to think of everything as
        asynchronous
    
  
actors and channels


  akka: Any => Unit
    
      ie take any value, return no response
      “I’m going to the pub”
    
  
  X => LAFuture[Y]
    
      “bruce, I’d like a hug”
      timeouts
    
  
  Channels > actors
    
      because:
        
          backpressure (paging @RobAshton)
          separation of concerns
            
              fanout
            
          
          actors can be modelled with channels (not supervisors
            though), but not vice versa
        
      
linearization


  script:
    
      Spike: find the book
      Pinky: find Applejack
      S&P: Look for gems
    
  
  when S&P are done, Twilight does stuff with gems, applejack, and
    the book aka the gofor macro
  gofor: for comprehension, go-style

(gofor
 [a (foo channel {:thing val})]
 :let [b (inc a)]
 [c (baz other-chan)
  :timeout 45000]
 (println a b c)
 :error (println &err &at))
sends a message to a channel
simultaneous dispatch
documentation


  you can query services for documentation

dragonmark

lift comet rant


  lift is the best thing for server push around
    
      and has been for 8 yeras
      currently http long poll
      actors server & client side
        
          multiple actors but only one http connection
        
      
      developers don’t have to worry about guts
        
          retries, multiplexing, etc
        
      
plumbing


  the best plumbing “just works”
  focus on taking a shower, rather than source or destination of
    water
  pipes accessible but not in your face
    
      so you can fix them when you need to
      or call a plumber
    
  
  eg Lift Comet
    
      parameters for retry backoff tuning
      how to surface errors, in browser and in server
    
  
why dragonmark?


  isolate logic from transport for testing
    
      separate thinking about domain logic from thinking about REST
        calls
      fewer developer context switches
    
  
  faster dev cycles
  discover services, send messages

herding cattle at mixradio, @neilprosser

intro


  @noahcampbell - “treat your servers like cattle, not pets”
  REFful µservices
  github.com/mixradio/mr-clojure
  legacy java, php
  riak, rabbit, elasticsearch

problems


  continuous delivery since 2010
  snowflake servers
    
      configuration drift
      variation of size, spec, versions
      slow provisioning (2 week lead time)
      slow deployment
    
  
  configuration confusion
    
      database issues & rollbacks
        
          no audit, so hard to know
        
      
  escaped own tin dc, migrated to aws

environments


  dev account, prod

services


  naming scheme: mr-*

mr-klink


  command line swiss army knife
  written in go
    
      JVM dependency would suck :(
      also considered node & python (but rejected)
      go is cross-platform
    
  
  uses our RESTful services


  create app, find images, deploy, list boxes

faraday


  github.com/ptaoussanis/faraday

mr-baker


  bakes machine images

phoenix servers


  throw away old servers and deploy new ones to upgrade
  no upgrades
    
      handled by new bake
      puppet for a few important things
    
  
process


  amazon linux base image
  mixradio base image
  service image | ad-hoc image (testing)
  shells out to packer
  uses Raynes/conch

mr-tyrant


  per-environment configuration
  mixradio/mr-tyrant
  backed by github for config storage
  app is a readonly frontend
  Raynes/tentacles
  app properties
    
      db conn strings
      port numbers
    
  
  deployment parameters
    
      capacity, security groups
    
  
  launch data
    
      ie userdata
    
  
mr-pedantic


  keeping infrastructure in sync
  mr-pedantic
  “puppet for cloud infrastructure”
    
      configuration exists
      gets compared with running env
      running env gets corrected to match config
      idempotent
    
  
  lets us roll out in another region
  backed by github for config storage
  clojure for configuration
    
      clojail to avoid stupidity
    
  
mr-maestro


  deployment orchestration
  mr-maestro
    
      asgard via APIs
    
  
  red/black deployment
    
      bring up new version
      healthcheck
      add to load balancer pool (but keep old version)
      when happy, remove old version
      (if unhappy, can always flip back to old version)
    
  
  asgard was good for finding what we wanted
  maestro can deploy itself
    
      the new servers will receive the messages which will kill the
        old versions
    
  
Nikita Prokopov, DataScript for web dev

datascript is a post-modern db


  someone else’s good idea redone on a more primitive level
  datalog queryable in-memory database
  triple store:
    
      <entity, attribute, value>
    
  
  database as an immutable value
  completely in-memory
  written in cljs, js bindings available

how does it work?

(defn create-conn [schema]
  (atom (empty-db schema)
        :meta {:listeners (atom {})}))

(defrecord DB
    [schema
     eavt aevt avet ; three indices
     max-eid max-t])

(defn with [db datoms]
  (-> db
      (update-in [:eavt into datoms])
      (update-in [:aevt into datoms])))

(defn transact [conn datoms]
  (swap! ;...
   ))

  impl: BTSet: B+ tree (sorted set)
  perf comparable with sorted-set:
    
      slower conj, but faster iterate
    
  
  binary search lookups
  fast range scans
  reverse iteration
  fast first-time creation
  clojurescript, but heavy use of js arrays and APIs
  datascript is lightweight
    
      700 loc btset
      550 loc query engine
      1700 loc total
      1200 loc tests
    
  
  unfortunate to be associated with the word “database”
    
      no networking
      query over memory
    
  
how to use it


  every app has an ad-hoc state
  put everything in a database instead
  non-trivial SPA has complex state
  KV stores do not cut it
  no inherent hierarchy
    
      natural for any data: sparse, irregular, hierarchical, graph
    
  
  faster data retrieval from big datasets
  server sync
  undo/redo
  local caching
  audit

example architecture


  datascript + react
  db update causes full re-render, top-down
    
      immutability makes it fast
    
  
cleanup


  datomic never removes anything
  datascript will clean up old values, to ensure storage is bound

example


  acha-acha.co

whole db prefetch:


  no server fetch on any navigation, all queries and aggregations
    happen on client

project status


  considered alpha

Ephemeral-first data structures, Michał Marczyk


  ephemeral? transient?
  persistent-first, but not always?

basic concepts


  Persistent data structures
    
      immutable
      permit update-like operations
        
          with at-most logarithmic slowdown when compared with mutable
            counterpart
        
      
      “fully persistent data structures”
        
          full branching history remains available
        
      
  transients
    
      mutable, but share structure with persistent data structures
      enforce (<1.7) / demand (>= 1.7?) thread isolation
        
          change in 1.7 removes the enforcement
          still requires a single thread to “own”
          can hand-off via safe publication, so long as single
            ownership remains
        
      
      can be frozen to create a persistent DS
    
  
  ephemeral data structures: another term for mutable data
    structures
  implementation strategies
    
      trees with path copying for efficient PDS ‘updates’
      a notion of ownership for subtrees to support transients
    
  
brief history


  Bagwell (2000), Ideal Hash Trees
    
      mutable Hash Array Mapped Tries
        
          contribution: traditional hash tables have an occasional
            expensive resizing operation
          the unlucky insertion has to wait while the whole hash table
            is copied
          HAMTs avoids this by using a tree structure
          demonstrate comparable performance in best case, while
            avoiding whole-table copy resize ops
        
      
  Hickey (2007), Clojure
    
      persistent hash maps based on HAMTs with path copying
      also vectors based on a modification of the idea behind HAMTs
    
  
  Hickey (2009), Clojure
    
      transient vectors
      transient maps added later by @cgrand
      c.c/transient originally called mutable, persistent! -
        immutable!
        
          thread isolation introduced later
        
      
  Prokopec, Bronson, Bagwell, Odesky (2011)
    
      Cache-Aware Lock-Free Concurrent Hash Tries
      Concurrent Tries with Efficient Non-Blocking Snapshots
      Ctries - lock-free mutable HAMTs with O(1) snapshots
      the snapshots are first-class Ctries independent of originals
    
  
closer look at transients


  key notion: subtree ownership
  distinguished central location stores an ‘edit’ marker
    
      an AtomicReference
    
  
  and so do individual nodes
  the transient instance itself owns nodes with its edit marker
  other nodes might be shared
  updates check edit markers, mutate own nodes in place, copy paths
    otherwise
  new paths get new edit markers, subsequent updates will be in-place
  persistent! invalidates the transient by resetting its edit
    marker to nil

persistent-first approach in a mutable settings


  data.avl vs JDK’s NavigableMaps
    
      NavigableMap: nearest neighbour queries and subsetting
        
          two impls: java util TreeMap & java util SkipListMap
        
      
      JDK impls return subcollections as views
      modifications to the original reflected in the view (and vice
        versa)
      cannot add items to views outside original bounds
      inappropriate when the subcoll is to be passed on for arbitrary
        use
    
  
  however, BSTs (including red-black and AVL trees) support
    join/split/range
  join: merge/concat for pairs of trees with an ordering
    precondition
  split: produce subcolls including keys < and > a given key
  range: extract subcoll bounded by limits
  all in logarithmic time
  in mutable setting, these modify original tree
  intention of NavigableMap is different
    
      that it leaves original untouched
      TreeMap is forced into view-based impl
    
  
  AVL doesn’t have this problem
    
      return new first-class NavigableMaps, leave original untouched
      better than Java’s own NavigableMaps in this respect
    
  
  …unless we add transients
    
      a transient can be stored in a j.u.TreeMap-workalike wrapper
      an independent snapshot can be produced by invalidating the
        transient
      the original wrapper is free to install a new edit marker
        immediately
      the snapshot can be subsetted to produce first-class subcolls
      these can be safely returned
      cost: must store edit markers, a little slower for updates
      no benefit for SortedMap, only NavigableMap
    
  
ctries - overview


  map data structure with support for concurrent updates
  comparable to regular HAMTs in perf of HAMT ops
  snapshotable in O(1) time with little degradation to perf of
    other ops
  snapshots are completely independent first-class ctries
  now impl in pure clojure - ctries.clj (not on github yet)

internals


  ctries: internally similar to clj PersitentHashMap
    
      though different range of node types
    
  
  key structure difference:
    
      indirection nodes
      ensure linearizability of history at a slight perf cost
    
  
  CAS-based in-place updates to tree structure
  “generation” marker used to determine subtree ownership
    
      with transients, establishing ownership happens at every level
        of the tree
      with ctries, can skip levels
      would benefit from a generalized CAS operation
        
          GCAS: original contribution of second ctries paper
          RDCSS: Harris, Fraser, Pratt (2002), A Practical Multi-Word
            Compare-and-Swap Operation
          based on descriptor objects stored at the modified location
          GCAS only creates descriptors upon failure
        
      
blending in with Clojure PDSs


  at first glacne, ctries could support persistent and transient
    APIs simultaneously
  persistent ops would take snapshot and modify snapshot
    
      no go because transients reuse method names from the
        persistent API
    
  
  instead, ctries.clj maps are “transient first”
    
      persistent! creates immutable snapshots that behave like
        persistent maps
      deref creates mutable snapshots in the form of independent
        ctries
      deref also works on immutable snapshots (returns mutable
        independent ctries)
    
  
  persistent ctrie-based maps to be optimized further

Martin Trojer


  https://github.com/martintrojer/devbox

Rob Rees, Trojan horsing clojure with js


  Wisp, a lisp for js
  clojurescript as npm modules
    
      mori
        
          compiled clojurescript
          prefix notation
          includes quite a few clojure standard library functions
          clojure data structures
        
      
      immutablejs
        
          javascript
          prototype extension
        
      
  om
    
      love your library, not your language
      had a huge influence on javascript single-page apps
        
          clearly faster
          clearly fewer bugs
        
      
  omniscient
    
      javascript clone of om
        
          immutablejs/react
          immstruct
            
              implement cursors from om
            
          
-
js developers don’t want to get into clojure


  confident community
  es6
  libraries > framework
  http://slides.com/rrees/trojan-horsing-clojure-with-js

-
Q&A


  do js people really understand the Om influence on the way
    single-page apps are going?
    
      two-way data binding is dying
    
  
  is wisp ready for prod?
    
      you know what you’re shipping
      you’ll have to test the js that comes out of it
    
  
Adrian Mowat, My Componentised Clojure Crusade


  Glasgow-based, Arnold Clark
  stuart sierra component workflow
    
      juxt/jig – superceded by stuart sierra’s work this year
    
  
James Henderson, Flow: learnings from writing a clojurescript DSL


  github: james-henderson
  twitter: jarohen

of models and widgets

               +------------+
   listens     |  Atom      |  updates  
  /------------|            |<--------\ 
  v            +------------+         | 
Widget                              Model
  |            +------------+         ^ 
  \----------->|  Channel   |---------/ 
   events      +------------+  reacts to

a typical model


  core.async

after ClojureX 2013


  lots of boilerplate
  could a library help here?
  led to clidget
    
      ensured DOM was updated every time atom changed
      evolved into Flow
    
  
  2 weeks later, Om was announced
  why would any sane person carry on with Flow?!
    
      couldn’t shake the feeling that things could be simpler
      Om & Reagent introduce a lot of new concepts
    
  
Flow aims


  100% declarative
  minimise number of new concepts
  perform “well enough” to be useful (but don’t compete with React)
  https://github.com/james-henderson/flow
  (also, facebook created a thing called “flow” too. not the same)

hello world

(:require [flow.core :as f :include-macros true])

(defn hello-world []
  (f/root js/document.body
    (f/el
     [:p.message {::f/style {:font-weight :bold
                             :color "#a00"}}
      "Hello world!"])))

;; dynamic
(defn counter-component [!counter]
  (f/el
   [:p "value is " (<< !counter)]))

  the << operator continuously reads from an atom

subcompenents


  important for composable, maintainable code
  the other operator !<<

steps for writing a compiler


  input
  lexing
  parsing
  code synthesis
  output

lexer/parser


  read-string

synthesis - translating the AST


  notions:

DynamicElement :: AppState -> (Element, DynamicElement)

DynamicValue :: AppState -> Value
looking at if

(if <DynamicValue>
  <DynamicElement>
  <DynamicElement>)

(defmethod fc/compile-el-form :if [[_ test then else] opts]
  `(build-if (fn [] ~(fc/compile-value-form test opts))
             (fn [] ~(fc/compile-el-form then opts))
             (fn [] ~(fc/compile-el-form else opts))))
rules of macro club


  don’t write macros
  don’t write macros!
  (unless you know you have to)
  Get out of macro-lang as soon as you can
    
      changing the execution order?
      analysing a form?
    
  
if, cont’d


  what’s in build-if?
    
      cached state (of the current value of the test)
    
  
-
aside: clojurescript is getting quicker


  the bottleneck is becoming me, rather than the clojurescript
    compiler

what’s next for flow?


  stable release
  more tutorials, examples, docs
  next version: ???

what can you do now?


  feedback please!
  give it a go!
  get involved!

Tom Coupland: Supercharging Cyanite


  @tcoupland
  MixRadio
  graphite
    
      graphical frontend for viewing metrics
    
  
  carbon
    
      default data store for graphite
      (technically, whisper, but no matter)
      flat files on disk
      round-robin rollup
    
  
  cyanite from @pyr
    
      elasticsearch/cassandra backend for graphite, replacement for
        whisper
    
  
  core.async primer
    
      channels, take/put, go blocks
        
          thread pool executors
        
      
measurements


  YourKit agent added to cyanite
  start CPU profiling
  culprit: add-path
    
      doing blocking I/O inside go-loop
      don’t block the thread pool
      Single Responsibility Principle
    
  
fix: change topology


  changed topology to read from one input, write to two separate
    output channels (elasticsearch + cassandra)
  but… performs even worse!
    
      less CPU usage, but less metrics/s and less network usage
    
  
fix: extra go blocks to handle communicating with elasticsearch & casandra


  faster, but stil not great
  600 metrics/s
  50k/s traffic in, 160k/s traffic out
  deserves a cup of tea :)
  eventually we find a problem:
    
      AssertionError: No more than 1024 pending puts are allowed on a
        single channel
      channel has in-built buffer
      but there’s an extra “buffer” of pending-puts (and takes)
      which has a hard limit of 1024
      we hit it
      helpfully, the error message suggests “windowed-buffer”
        
          which drops work on the floor
          nope nope nope!
        
      
  (@aphyr from the twitter gallery: LITTLES LAW)

batching


  core.async partition
    
      (<! (partition 1000 chan))
      consumes messages in batches of 1000
    
  
SRP


  add-path fn is doing too much
    
      split into separate CSP processes
      check full path
      break up
      insert
    
  
  now:
    
      2500 metrics/s
      network
        
          400 k/s in
          1500 k/s out
        
      
can we do better?


  are we stressing cyanite enough to see problems?
  soak testing
    
      generate vast quantities of data to stress cyanite
    
  
  use yourkit profiling
    
      observe most of what we’re doing is waiting for elasticsearch
      current elasticsearch client uses clj-http
      others could use httpkit or similar (which use java nio)
    
  
The Joy of Open Source (improving elasticsearch client)


  being open source really pays off here
  we can just look inside the elasticsearch driver
  we can copy-paste it and swap out another http client

what else can we see?


  qbits.hayt.cql – cassandra query builder
  half the system’s time is spent on clojure.string/join
    
      this isn’t so good
    
  
  I’m using the library wrong
    
      it’s for building nice queries
      I’m just doing inserts, I don’t care about queries
      should just do batching myself
    
  
caching


  now
    
      CPU usage halved to 16% (from 30%)
      memory usage down
      2500 metrics/s, same as before
      network
        
          in: 400k/s
          out: 880k/s (down from 1500k/s, due to batching)
        
      
back to YourKit


  all our time is in the format_processor
  this looks good

Back pressure

wrap up


  Single Responsibility Principle
    
      works at all the abstraction levels
      it’s always a good idea
        
          except when it isn’t, like all good ideas
        
      
  open source is brilliant
    
      but RTFM
        
          if their motivation doesn’t align with yours, it might not
            help you
        
      
  core.async
    
      did it deliver on its promise?
      it’s a nice way of breaking things down and following SRP
      check the JIRA before betting the business on it
    
  
  performance tuning
    
      it’s brilliant, do it
      go use YourKit
    
  
@matthiasn, BirdWatch


  BirdWatch, a twitter processing thingy
  joy of clojure
  react
  Om
  transducers
    
      listened to rich’s talk several times
        
          each time learned something new
        
      
  A farewell note to a programming language (namely scala)

the technical part


  twitter streaming API
  clojure stream client
  percolation queries for elasticsearch
    
      sends data only to those clients who are interested via
        websockets
    
  
inspection


  logging, pprinting
    
      bah
      tail -F didn’t work, wanted multiline
    
  
  log to core.async channels with dropping-buffers
    
      can inspect latest state by reading from them
    
  
Thomas Kristiensen, More Open Source Systems, Please


  composition
    
      micro -> macro:
        
          immutability, pure fns, idiomatic clojure
          core.async, component
          prod
        
      
      but we spend more of our time at the high level – where are our
        composition tools here?
    
  
composition experience reports


  wish-list for experience report:
    
      context
        
          I want to know why I need this thing
        
      
      limitations in the group
      values in the group
      alternatives
        
          and why didn’t you choose them?
        
      
      high-level view of component
        
          before the details are presented. won’t under the details
            without this
        
      
      interesting details
    
  
example:bifrost


  goal: archive data from kafka to S3, without any hadoop dependencies
  main app:
    
      nginx
      ruby webapp
      comparison service + mongo
      collector + MySQL
    
  
  all logs sent to kafka
  kafka keeps a sliding window, keeping latest n messages
  behind kafka:
    
      a loader
      sends data to buckets
      “blueshift” - uswitch tool that sends stuff to aws redshift
    
  
limitations


  time
  S3
  part of larger piece of work
  streaming (no longer batching)

alternatives


  secor – has a dependency on hadoop

link


  https://github.com/uswitch/bifrost

let’s do this more!


  30 minute slots – please contact me!
  first meetup jan 2015

Chris Ford, “journey through the looking glass”


  lenses! <3

clojure/core has:


  update-in
  separates the focus of context from the applied function
  the same paths work with get-in and assoc-in
  however, update-in is specialised to one kind of focus

functors


  “functors are structure-preserving maps between categories”
    
      Barr and Wells
    
  
  functors are functions that lift functions into a context.
  functors compose as functions

examples


  sequence functor
    
      (defn fsequence [f] (partial map f))
    
  
  identity functor
    
      (defn fidentity [f] f)
    
  
  constant functor
    
      (defn fconst [_] identity)
    
  
  “in” functor
    
      (defn fin [k f] (fn [x] (update-in x [k] f)))
      ((fin :x inc) {:x 1 :y 1}) => {:x 2 :y 1}
    
  
lenses


  Lenses are functions that lift contextualising functions into a
    context.
  update, put, view can all be represented by one function.
  lenses compose as functions.
  I don’t think my notes can do this talk justice. it’s really
    good, watch the video :)

traversals


  lenses that can have more than one target
  the update part works
  but view needs variadic functors (~ Applicatives) and Monoid
    targets
    
      need a strategy for combining multiple values into one
      problem in clojure: monoid zero value doesn’t know the context
        it’s in
      so how do you deal with traversals with zero targets?
    
  
traversy


  inspired by kmett’s haskell lenses
    
      but uses building blocks from clojure/core
      and recognizes limitations of lack of contextual type
        information
    
  
  a lens in this world is a pair of fns: focus and fmap
  in this world, the traversal laws are more what you’d call
    guidelines…
  https://github.com/ctford/traversy

q:


  what are your thoughts on teh different worlds of haskell and
    clojure?
    
      the social difference is most clear
      clojure is v friendly
      haskell is friendly too, but a bit of a rough exterior
        sometimes
    
  
kyle kingsbury, keynote


  I work at factual
  I do a lot of database stuff

what is it like to be a fold?


  hugs form a monoid!
  ->>, map, filter, reduce

(->> coll
     (map inc)
     (filter odd?)
     (reduce +))

  if you create intermediate sequences for these seqs, you put a
    lot of pressure on the GC
  catamorphisms!
  reducers eliminate the intermediate seqs
  haskell calls this “stream fusion”
  a specific case of deforestation - a fp optimization technique
  reducers are defined for each coll type
  trancducers have no underlying type (to confuse haskell people)
  (mapping inc) – maps over anything
  but still sequential

associativity


  if f is associative, we can fold in either direction
    
      you can have an arborescent fold (tree-like)
      can more easily parallelize
      then bring together the resultant results
    
  
  this is already in reducers
  because Rich invented everything and all we have to do is
    discover it in the core library
    
      eg PersistentQueue - who knew?
    
  
  results preserve order
    
      order requires coordination
      coordination might be hard
    
  
commutativity


  if (= (f a b) (f b a)) we can do stuff in any order we like
  monoids compose wonderfully
  commutative monoids compose even better
    
      they’re great for distributed systems
      see also CRDTs (which are commutative monoids with idempotence)
    
  
tesser


  parallel
  unordered
  stream fusion
  collection independent
  can do this in hadoop
    
      using stateful mappers
      can do reductions in the mappers
    
  
  post-reduce phase
    
      reduce over transient, or native array
      this phase can seal it
    
  
  post-combine phase
    
      similar, after combining
    
  
  tesser uses a map!
    
      rather than multiple arity fns
    
  
{:reduce ...
 :reduce-identity ...
 :combine ...
 :combine-identity...
 ... ...}

  unlike transducers, we defer the build phase til the end
    
      so we can compose inside & out
      allows seq API
    
  
constraints


  identity fns must be pure
    
      transducers doesn’t use pure reduction fns
        
          take sets up an atom for no of elements
          increments each time
          doesn’t have to wrap the accumulator value
        
      
      no-go for tesser
      we need all the state in the accumulator
        
          so we can pass it back and forth over network
        
      
  reducers/combiners must be associative
    
      and also commutative
    
  
  algebird isn’t commutative. should we order?
  must short-circuit (reduced x) values

can be used for


  stats!
    
      mean, variance, covariance, correlation matrices