Skip to content

Instantly share code, notes, and snippets.

@pkukielka
Last active December 30, 2015 02:19
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pkukielka/7762102 to your computer and use it in GitHub Desktop.
Save pkukielka/7762102 to your computer and use it in GitHub Desktop.
Simple scala Bloom filter. Uses MurmurHash3 to compute hashes.
import scala.collection.mutable.BitSet
import scala.util.hashing.{MurmurHash3 => MH3}
import scala.math.{abs, log}
class BloomFilter(n: Int, p: Double) {
val m = (-n * log(p) / 0.48).toInt + 1
val k = (0.7 * m / n).toInt
val buckets = new BitSet(m)
def add(elem: String) = hash(elem).forall(buckets += _)
def query(elem: String) = hash(elem).exists(index => buckets.contains(index))
private def hash(elem: String) = {
val h1 = MH3.stringHash(elem)
val h2 = MH3.stringHash(elem, h1)
for (i <- 0 until k) yield abs((h1 + i * h2) % m)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment