Skip to content

Instantly share code, notes, and snippets.

@tantalor
Created July 2, 2013 17:08
Show Gist options
  • Save tantalor/5911137 to your computer and use it in GitHub Desktop.
Save tantalor/5911137 to your computer and use it in GitHub Desktop.
Testing whether reservoir sampling works when you sample the RNG only once. reservoir1: Samples RNG once per stream element. reservoir2: Samples the RNG once per stream.
from random import random as rand
def reservoir1(stream):
i = 1
choice = None
for value in stream:
if rand() * i < 1:
choice = value
i = i + 1
return choice
def reservoir2(stream):
r = rand()
i = 1
choice = None
for value in stream:
if r * i < 1:
choice = value
i = i + 1
return choice
def test(stream, reservoir, n=1000):
histo = {}
for k in xrange(0, n):
choice = reservoir(stream)
histo[choice] = histo.get(choice, 0) + 1
return histo
get_stream = lambda: xrange(0, 8)
print "reservoir1: ", test(get_stream(), reservoir1)
print "reservoir2: ", test(get_stream(), reservoir2)
@tantalor
Copy link
Author

tantalor commented Jul 2, 2013

Seems like reservoir2 does not give a uniform distribution,

reservoir1:  {0: 128, 1: 92, 2: 128, 3: 133, 4: 124, 5: 121, 6: 142, 7: 132}
reservoir2:  {0: 497, 1: 184, 2: 73, 3: 51, 4: 39, 5: 23, 6: 15, 7: 118}

@JonathanHourany
Copy link

JonathanHourany commented May 23, 2018

I'm sure it must be weird getting a comment on this repo 5 years after putting it up, but the reason reservoir2 is not uniform is because of a bug on line 14.

r is being assigned as some fixed floating point number at the start with r = rand() and then it never changes in the for-loop. Calling r at this point only returns the same number over and over. Deleting line 14 and changing line 18 to if rand() * i < 1: fixes the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment