Skip to content

Instantly share code, notes, and snippets.

@DavidRdgz
Last active February 11, 2018 17:00
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save DavidRdgz/5a08eab79c0224a170bb2138cb822b79 to your computer and use it in GitHub Desktop.
Save DavidRdgz/5a08eab79c0224a170bb2138cb822b79 to your computer and use it in GitHub Desktop.
Sampling an infinite stream with reservoir sampling
#!/usr/bin/python
import random
shuf = random.sample(range(100000), 1000)
# Keep bag of k=10 elements
# Prefill the bag with first shuf items
bag = shuf[:10]
for i in shuf[10:]:
k = random.randint(0, 9)
# Update bag in 1/k chance
if k == 1:
# Randomly kick out old bag item
indx = random.randint(0, 9)
bag[indx] = i
print bag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment