Skip to content

Instantly share code, notes, and snippets.

@razamatan
Created December 10, 2010 10:10
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save razamatan/736048 to your computer and use it in GitHub Desktop.
Save razamatan/736048 to your computer and use it in GitHub Desktop.
randomize python iterator
from random import shuffle, randrange
def randomize(iterable, bufsize=1000):
''' generator that randomizes an iterable. space: O(bufsize). time: O(n). '''
buf = list()
for x in iterable:
if len(buf) == bufsize:
i = randrange(bufsize)
yield buf[i]
buf[i] = x
else:
buf.append(x)
shuffle(buf)
while buf: yield buf.pop()
@razamatan
Copy link
Author

didn't like the first version since it could end up deallocating the buffer in the nested buf.pop() depending on the underlying implementation. this version should prevent that behavior and would always have the buf be bufsize if the iterable could fill it up.

i had a version that would preallocate buf = [None] * bufsize and thereby avoid doing the shuffle/pop business at the end:

from random import randrange

def randomize(iterable, bufsize=1000):
   ''' generator that randomizes an iterable. space: O(bufsize). time: O(n+bufsize). '''
   buf = [None] * bufsize
   for x in iterable:
      i = randrange(bufsize)
      if buf[i] is not None: yield buf[i]
      buf[i] = x
   for x in buf:
      if x is not None: yield x

however, i didn't like how the preallocation behaved since the replacing for loop at the end ends up costing a lot for large buffer sizes, esp. if the iterable was smaller than the buffer. also had memory issues if you deliberately gave it a large bufsize. also, heaven forbid if you had Nones in the iterable and actually wanted them returned. :)

it's important to point out that the bufsize is the measure of entropy in the randomization. if the bufsize is really small, the results won't look random at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment