@DmitryBe
Last active January 31, 2017 09:17
Algebird Hyper Log Log
import com.twitter.algebird.HyperLogLogMonoid
//define test data
val data = Seq("aaa", "bbb", "ccc")
//create algebird HLL
val hll = new HyperLogLogMonoid(bits = 10)
//convert data elements to a seq of hlls
val hlls = data.map { str =>
  val bytes = str.getBytes("utf-8")
  hll.create(bytes)
}
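//or, for numeric data (names get an "int" prefix so both variants can live in one script):
val intData = List(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
//create expects an Array[Byte], so each Int is encoded as 4 bytes first
val intHlls = intData.map { i => hll.create(java.nio.ByteBuffer.allocate(4).putInt(i).array()) }
//merge all hlls into a single one using the monoid
val merged = hll.sum(hlls)
//WARN: don't use merged.size - it is a different thing (not the cardinality estimate)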
//get the estimate count from merged hll
println("estimate count: " + hll.sizeOf(merged).estimate)
//or
println("estimate count: " + merged.approximateSize.estimate)
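//a rough sketch (not part of Algebird) of how the `bits` setting relates to accuracy,
//assuming the textbook HyperLogLog standard error of about 1.04 / sqrt(m), with m = 2^bits registers;
//`expectedStdError` is a hypothetical helper name used only for illustration
def expectedStdError(bits: Int): Double = 1.04 / math.sqrt((1L << bits).toDouble)
println("expected relative std error for bits = 10: " + expectedStdError(10))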