Skip to content

Instantly share code, notes, and snippets.

@erdos
Last active November 20, 2016 22:16
Show Gist options
  • Save erdos/4fbe8fe9a8ad17abf3f2a9ccde498cb0 to your computer and use it in GitHub Desktop.
Save erdos/4fbe8fe9a8ad17abf3f2a9ccde498cb0 to your computer and use it in GitHub Desktop.
split-by-key returning groups may not cover whole input
(do
;; number of groups
(def n (.availableProcessors (Runtime/getRuntime)))
;; memoized round-robin style group number generator for items
(let [a (atom (cycle (range n)))]
(def get-id-for
(memoize (fn [_] (first (swap! a next))))))
(defn split-by-key
"Distributes all items in n sequences according to f(x)"
[f items]
(for [id (range n)]
(filter (comp #{id} get-id-for f) items)))
;; example: (split-by-key int (range 16)) => ((7 15) (0 8) (1 9) (2 10) (3 11) (4 12) (5 13) (6 14))
;; i am using (future) to realize the lazy groups returned by #'split-by-key
(let [total 10000
xs (range total)
gs (split-by-key int xs)]
(doseq [g gs]
;; do work here.
(future (reduce + g)))
;; i expect the sum to be = total, but it is not constant.
;; that is because an item in xs may go to multiple groups.
(reduce + (map count gs)))
)
@erdos
Copy link
Author

erdos commented Apr 21, 2016

Replacing split-by-key with the following seems to fix the problem:

(defn split-by-key [f items]
    (vals (group-by (comp get-id-for f) items)))

Of course it does not create n groups.

My question is: why is the previous behaviour possible?

@erdos
Copy link
Author

erdos commented Nov 20, 2016

Solution: The bug is in line 8 of the original code. The point is that memoize should never have side effects. Memoization internally is implemented using atom and swap (a compare-and-swap operation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment