@nathanmarz
Created November 9, 2011 05:38
Optimized variance in Cascalog
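;; Assumes Cascalog's defparallelagg, <-, and each are in scope, and that
;; clojure.core/count is excluded from the namespace so it can be redefined.

;; Every tuple contributes 1 to the count.
(defn one [] 1)

;; Division coerced to a float so a Clojure ratio is never emitted.
(defn div [v1 v2]
  (float (/ v1 v2)))

;; Parallel aggregators: partial results are combined with +.
(defparallelagg count
  :init-var #'one
  :combine-var #'+)

(defparallelagg sum-parallel
  :init-var #'identity
  :combine-var #'+)

(def sum (each sum-parallel))

;; mean = sum / count
(def avg
  (<- [!val :> !avg]
      (count !count)
      (sum !val :> !sum)
      (div !sum !count :> !avg)))

;; variance = (sum of squares / count) - mean^2
(def variance
  (<- [!val :> !var]
      (* !val !val :> !squared)
      (sum !squared :> !square-sum)
      (count !count)
      (avg !val :> !mean)
      (* !mean !mean :> !mean-squared)
      (div !square-sum !count :> !i)
      (- !i !mean-squared :> !var)))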
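
The `variance` query above appears to rely on the standard identity that rewrites variance in terms of a sum, a sum of squares, and a count, all of which can be computed with parallel aggregators. A sketch of the derivation follows; the symbols $n$, $x_i$, and $\bar{x}$ are notation introduced here, not names from the gist.

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2
            \;-\; 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i
            \;+\; \bar{x}^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \bar{x}^2,
\qquad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{aligned}
```

In the query, `!square-sum` divided by `!count` is the first term (`!i`) and `!mean-squared` is the second. Two caveats are worth noting: this is the population variance (it divides by $n$, not $n-1$), and because the formula subtracts two potentially large, nearly equal numbers and `div` coerces to a single-precision float, the result can lose precision when the mean is large relative to the spread.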