Last active
December 12, 2015 09:39
-
-
Save nastra/4753728 to your computer and use it in GitHub Desktop.
Implements the random reservoir sampling algorithm of Vitter. Reservoir Sampling is an algorithm designed to take K samples from a stream of samples of unknown size in linear time. This is a solution to the following interview question:
"You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describ…
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from random import randint | |
class Sampler: | |
def __init__(self, maxSamples=30): # maxSamples set to 30 by default for better console output | |
self.samples = [] | |
self.counter = 0 | |
self.maxSamples = maxSamples | |
def sample(self, sample): | |
if(self.counter < self.maxSamples): | |
self.samples.append(sample) | |
else: | |
index = randint(0, self.counter) | |
if(index < self.maxSamples): | |
self.samples[index] = sample | |
self.counter = self.counter + 1 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment