Skip to content

Instantly share code, notes, and snippets.

@nastra
Last active December 12, 2015 09:39
Show Gist options
  • Save nastra/4753728 to your computer and use it in GitHub Desktop.
Save nastra/4753728 to your computer and use it in GitHub Desktop.
Implements the random reservoir sampling algorithm of Vitter. Reservoir Sampling is an algorithm designed to take K samples from a stream of samples of unknown size in linear time. This is a solution to the following interview question: "You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describ…
from random import randint
class Sampler:
def __init__(self, maxSamples=30): # maxSamples set to 30 by default for better console output
self.samples = []
self.counter = 0
self.maxSamples = maxSamples
def sample(self, sample):
if(self.counter < self.maxSamples):
self.samples.append(sample)
else:
index = randint(0, self.counter)
if(index < self.maxSamples):
self.samples[index] = sample
self.counter = self.counter + 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment