Last active
November 20, 2016 22:16
-
-
Save erdos/4fbe8fe9a8ad17abf3f2a9ccde498cb0 to your computer and use it in GitHub Desktop.
split-by-key returning groups may not cover whole input
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(do | |
;; number of groups | |
(def n (.availableProcessors (Runtime/getRuntime))) | |
;; memoized round-robin style group number generator for items | |
(let [a (atom (cycle (range n)))] | |
(def get-id-for | |
(memoize (fn [_] (first (swap! a next)))))) | |
(defn split-by-key | |
"Distributes all items in n sequences according to f(x)" | |
[f items] | |
(for [id (range n)] | |
(filter (comp #{id} get-id-for f) items))) | |
;; example: (split-by-key int (range 16)) => ((7 15) (0 8) (1 9) (2 10) (3 11) (4 12) (5 13) (6 14)) | |
;; i am using (future) to realize the lazy groups returned by #'split-by-key | |
(let [total 10000 | |
xs (range total) | |
gs (split-by-key int xs)] | |
(doseq [g gs] | |
;; do work here. | |
(future (reduce + g))) | |
;; i expect the sum to be = total, but it is not constant. | |
;; that is because an item in xs may go to multiple groups. | |
(reduce + (map count gs))) | |
) |
Solution: The bug is in line 8 of the original code. The point is that memoize
should never have side effects. Memoization internally is implemented using atom
and swap
(a compare-and-swap operation).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Replacing
split-by-key
with the following seems to fix the problem:Of course it does not create
n
groups.My question is: why is the previous behaviour possible?