@tcoupland
Created October 14, 2015 15:30
Cascalog data binning code for making histograms
;; Assumes Cascalog 2.x namespaces; on 1.x the ops namespace is cascalog.ops.
(use 'cascalog.api)
(require '[cascalog.logic.ops :as o])

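;; Counting aggregator: each tuple is initialised to 1 and the partial
;; results are summed, which yields a count per group.
(defn one [& _] 1)

(defparallelagg counter
  :init-var one
  :combine-var +)
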
(defn mn
  "Query yielding the minimum value in data."
  [data]
  (<- [?mn]
      (data :> ?v)
      (o/min ?v :> ?mn)))

(defn mx
  "Query yielding the maximum value in data."
  [data]
  (<- [?mx]
      (data :> ?v)
      (o/max ?v :> ?mx)))

(defn min-max
  "Executes both queries and returns a sequence of their two result sets."
  [data]
  (??- (mn data)
       (mx data)))

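(def num-bins 50)

(defmapfn add-bin
  [lo span d]
  ;; Scale d into [0, num-bins], truncate, and clamp the maximum value into
  ;; the last bin. A zero span (all values equal) maps everything to bin 0
  ;; instead of dividing by zero.
  [(if (zero? span)
     0
     (-> d
         (- lo)
         (/ span)
         (* num-bins)
         (int)
         (min (dec num-bins))))])
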
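(defn bin
  "Query yielding [?bin ?count] over num-bins equal-width bins.
  Calling this runs min-max eagerly to find the range of data."
  [data]
  (let [[[[lo]] [[hi]]] (min-max data)
        span (- hi lo)]
    (<- [?bin ?count]
        (data :> ?d)
        (add-bin lo span ?d :> ?bin)
        (counter ?bin :> ?count))))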
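
;; A minimal plain-Clojure sketch of the arithmetic inside add-bin, runnable
;; at a REPL without Cascalog. bin-index is a hypothetical helper that is not
;; part of the gist; the bin count n is passed in explicitly for illustration.
(defn bin-index
  [lo span n d]
  (-> d
      (- lo)
      (/ span)
      (* n)
      (int)
      (min (dec n))))

(bin-index 0 100 50 0)    ; the minimum lands in bin 0
(bin-index 0 100 50 37)   ; 37/100 * 50 = 18.5, truncated to bin 18
(bin-index 0 100 50 100)  ; the maximum would be bin 50, clamped to bin 49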