Created
December 10, 2010 10:10
-
-
Save razamatan/736048 to your computer and use it in GitHub Desktop.
randomize python iterator
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from random import shuffle, randrange | |
def randomize(iterable, bufsize=1000): | |
''' generator that randomizes an iterable. space: O(bufsize). time: O(n). ''' | |
buf = list() | |
for x in iterable: | |
if len(buf) == bufsize: | |
i = randrange(bufsize) | |
yield buf[i] | |
buf[i] = x | |
else: | |
buf.append(x) | |
shuffle(buf) | |
while buf: yield buf.pop() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
didn't like the first version since it could end up deallocating the buffer in the nested buf.pop() depending on the underlying implementation. this version should prevent that behavior and would always have the buf be bufsize if the iterable could fill it up.
i had a version that would preallocate buf = [None] * bufsize and thereby avoid doing the shuffle/pop business at the end:
however, i didn't like how the preallocation behaved since the replacing for loop at the end ends up costing a lot for large buffer sizes, esp. if the iterable was smaller than the buffer. also had memory issues if you deliberately gave it a large bufsize. also, heaven forbid if you had Nones in the iterable and actually wanted them returned. :)
it's important to point out that the bufsize is the measure of entropy in the randomization. if the bufsize is really small, the results won't look random at all.