@kaja47
Created May 9, 2014 19:47
minhash vs. HyperLogLog
// min-hash
val fs: Vector[Int => Int] // hash functions
items map { it => fs map { f => f(it) } } fold (vectorPairwise(min), initialValue = Vector.fill(infinity))
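// HyperLogLog
// f: Int => (Int, Int) is a single hash function whose result is split into a 14-bit register index (prefix, 2^14 = 16384) and the remaining bits (suffix)
val fs: Vector[Int => Int] = (0 until 16384).toVector map { i => { (it: Int) => val (prefix, suffix) = f(it); if (prefix == i) positionOfFirstOneBit(suffix) else 0 } }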
items map { it => fs map { f => f(it) } } fold (vectorPairwise(max), initialValue = Vector.fill(0))
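The two snippets above share one shape: map every item through a vector of functions, then fold the resulting vectors position by position (min for min-hash, max for HyperLogLog). A minimal runnable sketch of that idea follows. The names `SketchDemo`, `sketch`, `hash`, `minHash` and `hllRegisters` are hypothetical, the seeded MurmurHash3 mix is only an assumed stand-in for the unspecified hash functions, and the 14/18 bit split of a 32-bit hash is an assumption that matches the 16384 registers in the gist.

```scala
import scala.util.hashing.MurmurHash3

object SketchDemo {
  // Shared shape: apply every function to every item, then combine the
  // per-item vectors position by position, starting from `init`.
  def sketch(items: Seq[Int], fs: Vector[Int => Int], init: Int,
             combine: (Int, Int) => Int): Vector[Int] =
    items
      .map(it => fs.map(f => f(it)))
      .foldLeft(Vector.fill(fs.size)(init)) { (acc, v) =>
        acc.zip(v).map { case (a, b) => combine(a, b) }
      }

  // Assumption: a seeded MurmurHash3 mix stands in for the gist's hash functions.
  def hash(seed: Int)(x: Int): Int =
    MurmurHash3.finalizeHash(MurmurHash3.mix(seed, x), 1)

  // min-hash: k independent hash functions, keep the minimum of each.
  def minHash(items: Seq[Int], k: Int): Vector[Int] = {
    val fs: Vector[Int => Int] = Vector.tabulate(k)(i => (x: Int) => hash(i)(x))
    sketch(items, fs, Int.MaxValue, _ min _)
  }

  val p = 14      // assumed number of index bits
  val m = 1 << p  // 16384 registers, as in the gist

  // 1-based position of the first set bit within the (32 - p)-bit suffix,
  // capped at 32 - p + 1 when the suffix is all zeros.
  def positionOfFirstOneBit(suffix: Int): Int =
    math.min(Integer.numberOfLeadingZeros(suffix) + 1, 32 - p + 1)

  // HyperLogLog registers: one "function" per register; each item contributes
  // only to the register selected by its hash prefix, and the maximum is kept.
  def hllRegisters(items: Seq[Int]): Vector[Int] = {
    val fs: Vector[Int => Int] = Vector.tabulate(m) { i => (it: Int) => {
      val h = hash(0)(it)
      val prefix = h >>> (32 - p) // top p bits select the register
      val suffix = h << p         // remaining bits, left-aligned
      if (prefix == i) positionOfFirstOneBit(suffix) else 0
    } }
    sketch(items, fs, 0, _ max _)
  }
}
```

Written this way, HyperLogLog evaluates all 16384 functions for every item, which is deliberately wasteful: it mirrors the gist's formulation to make the parallel with min-hash visible. A practical implementation would update only the single register chosen by the prefix.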