I hereby claim:
- I am avibryant on github.
- I am avi (https://keybase.io/avi) on keybase.
- I have a public key whose fingerprint is 9DFC 19D0 8558 8DE8 6BF3 5C19 3879 4907 3690 98AF
To claim this, I am signing this object:
case class SetSizeAggregator(hllBits: Int, maxSetSize: Int = 10)
    extends MonoidAggregator[Array[Byte], Either[HLL, Set[Array[Byte]]], Long] {

  def prepare(in: Array[Byte]) = Right(Set(in))

  def present(sum: Either[HLL, Set[Array[Byte]]]) =
    sum match {
      case Right(set) => set.size.toLong
      case Left(hll)  => hll.approximateSize.estimate
    }

  // the monoid member (which unions small sets and degrades oversized
  // ones to an HLL) is elided in the original snippet
}
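The aggregator above keeps an exact Set until it grows past maxSetSize, then degrades to an approximate HLL sketch. A hedged, toy sketch of that idea in Python (the class and helper names are invented for illustration, and a crude Flajolet-Martin-style max-rank stands in for a real HyperLogLog):

```python
import hashlib

def _hash64(value):
    # stable 64-bit hash of the value's string form (illustrative choice)
    return int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")

class SetSizeCounter:
    """Toy version of the SetSizeAggregator idea: count exactly with a set
    until it passes max_set_size, then degrade to an approximate sketch
    (here a crude Flajolet-Martin-style max-rank, standing in for HLL)."""

    def __init__(self, max_set_size=10):
        self.max_set_size = max_set_size
        self.exact = set()   # the Right(Set(...)) side
        self.rank = None     # the Left(HLL) side, approximated

    @staticmethod
    def _rho(h):
        # 1-based position of the lowest set bit of h
        return (h & -h).bit_length()

    def add(self, value):
        h = _hash64(value)
        if self.rank is None:
            self.exact.add(h)
            if len(self.exact) > self.max_set_size:
                # too big: throw away the exact set, keep only the sketch
                self.rank = max(self._rho(x) for x in self.exact)
                self.exact = None
        else:
            self.rank = max(self.rank, self._rho(h))

    def size(self):
        # exact count while small, rough 2**rank estimate afterwards
        return len(self.exact) if self.rank is None else 2 ** self.rank
```

The point of the Either is that merging stays cheap and bounded: once either side has degraded to a sketch, merges never grow memory again.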
import com.twitter.algebird._

case class Preparer[A, T](prepareFn: A => T) {
  def map[U](fn: T => U) =
    Preparer[A, U](fn.compose(prepareFn))

  def flatMap[U](fn: T => TraversableOnce[U]) =
    FlatPreparer[A, U](fn.compose(prepareFn))

  def aggregate[B, C](aggregator: Aggregator[T, B, C]): Aggregator[A, B, C] =
    // the original snippet is cut off here; fusing the pipeline into the
    // aggregator's prepare step is the natural completion
    aggregator.composePrepare(prepareFn)
}
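The same map-then-aggregate fusion can be sketched in Python (hypothetical names, not Algebird's API): each map composes onto the prepare function instead of materializing an intermediate collection.

```python
from functools import reduce

class PyPreparer:
    """Toy analogue of the Preparer pattern: map steps are fused into the
    aggregation's prepare function rather than building new collections."""

    def __init__(self, prepare_fn=lambda a: a):
        self.prepare_fn = prepare_fn

    def map(self, fn):
        # compose fn after the existing prepare step
        return PyPreparer(lambda a: fn(self.prepare_fn(a)))

    def aggregate(self, reduce_fn):
        # returns a function from an input sequence to its aggregate
        def run(inputs):
            return reduce(reduce_fn, (self.prepare_fn(a) for a in inputs))
        return run

# sum of squared lengths, in one pass, no intermediate lists
agg = PyPreparer().map(len).map(lambda n: n * n).aggregate(lambda x, y: x + y)
agg(["a", "bb", "ccc"])  # 1 + 4 + 9 = 14
```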
function probabilityBBeatsA(aa, ba, ab, bb) {
  var probability = 0.0;
  for (var i = 0; i < ab; i++) {
    var product = Math.log(1 + (aa + ba) / (i + bb));
    var j = 1;
    var start = i + 1;
    [bb, ba, aa, i].forEach(function(steps) {
      var stop = start + steps;
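The truncated loop above is evaluating, in log space, the closed-form probability that one Beta-distributed conversion rate beats another. Since the original code is cut off, here is a hedged Python sketch of the same closed form (Evan Miller's formula, with Beta(alpha_a, beta_a) for A and Beta(alpha_b, beta_b) for B), not a reconstruction of the original gist:

```python
from math import lgamma, log, exp

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    """Probability that a draw from Beta(alpha_b, beta_b) exceeds a draw
    from Beta(alpha_a, beta_a); the sum runs over the integer alpha_b."""
    total = 0.0
    for i in range(int(alpha_b)):
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total
```

With identical priors the result is exactly 0.5, as it should be; working in log space keeps the Beta-function terms from overflowing for realistic counts.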
GFS = HDFS
MapReduce = Hadoop
BigTable = HBase
Protocol Buffers = Thrift or Avro (serialization)
Stubby = Thrift or Avro (RPC)
ColumnIO = Parquet
Dremel = Impala
Chubby = Zookeeper
Omega = Mesos
Borg = Aurora
Recent versions of Cloudera's Impala added NDV, a "number of distinct values" aggregate function that uses the HyperLogLog algorithm to estimate this number, in parallel, in a fixed amount of space.
This can make a really, really big difference: in a large table I tested this on, which had roughly 100M unique values of mycolumn, using NDV(mycolumn) got me an approximate answer in 27 seconds, whereas the exact answer using count(distinct mycolumn) took ... well, I don't know how long, because I got tired of waiting for it after 45 minutes.
It's fun to note, though, that because of another recent addition to Impala's dialect of SQL, the fnv_hash function, you don't actually need to use NDV; instead, you can build HyperLogLog yourself from mathematical primitives.
HyperLogLog hashes each value it sees, and then assigns them to a bucket based on the low order bits of the hash. It's common to use 1024 buckets, so we can get the bucket by using a bitwise & with 1023:
select
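The SQL query is cut off in the source. As an illustration of the bucket-and-estimate steps just described (a Python sketch, not the original Impala query; the hash function and bias-correction constant are standard HyperLogLog choices, not taken from the post):

```python
import hashlib

def hll_estimate(values, buckets=1024):
    """Raw HyperLogLog estimate: bucket each hash by its low-order bits,
    track the max 1-based position of the lowest set bit of the remaining
    bits per bucket, then take a bias-corrected harmonic mean. (The
    small- and large-range corrections from the HLL paper are omitted.)"""
    shift = buckets.bit_length() - 1              # log2(buckets)
    max_rho = [0] * buckets
    for v in values:
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        bucket = h & (buckets - 1)                # low-order bits, e.g. & 1023
        rest = h >> shift                         # the remaining bits
        rho = (rest & -rest).bit_length() if rest else 64
        max_rho[bucket] = max(max_rho[bucket], rho)
    alpha = 0.7213 / (1 + 1.079 / buckets)        # bias correction, m >= 128
    return alpha * buckets * buckets / sum(2.0 ** -r for r in max_rho)
```

Because each bucket only keeps a running max, duplicates change nothing, and the whole state is a fixed 1024-entry array regardless of input size.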
<html>
<head>
  <link rel="stylesheet" href="http://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css" />
  <script src="http://code.jquery.com/jquery-1.9.1.js"></script>
  <script src="http://code.jquery.com/ui/1.10.3/jquery-ui.js"></script>
  <link rel="stylesheet" href="/resources/demos/style.css" />
  <script>
  $(function() {
    $( "#datepicker" ).datepicker();
  });
  </script>
</head>
<body>
  <p>Date: <input type="text" id="datepicker" /></p>
</body>
</html>
scala> val x = null.asInstanceOf[Int]
x: Int = 0

scala> Option(x)
res1: Option[Int] = Some(0)

// here the null is never unboxed: Option.apply's type parameter means the
// boxed null reference is passed straight through, so apply returns None
scala> Option(null.asInstanceOf[Int])
res2: Option[Int] = None
require 'continuation'

module Monad
  def -@
    MonadContext.bind(self)
  end

  def wrap(val)
    raise NotImplementedError
  end

  # ... (the rest of the gist is truncated in the source)
end
Unless noted otherwise, these do the same thing as their equivalent set commands. However, only a small fixed amount of memory is used for each set, no matter how many members you add.