Skip to content

Instantly share code, notes, and snippets.

@ryanwitt
Created November 15, 2012 16:54
Show Gist options
  • Save ryanwitt/4079732 to your computer and use it in GitHub Desktop.
Save ryanwitt/4079732 to your computer and use it in GitHub Desktop.
Can you create an unbiased sample of size k from a large stream in constant memory?
import random
import matplotlib.pyplot as plt
k = 1000
array = []
for n, x in enumerate([range(k)[random.randrange(k)] for x in range(100000)]):
if n < k:
array.append(x)
else:
if random.random() < k/float(n):
array[random.randrange(k)] = x
plt.hist(array)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment