Skip to content

Instantly share code, notes, and snippets.

View johnynek's full-sized avatar

P. Oscar Boykin johnynek

View GitHub Profile
You can swap with only two registers if they point to elements of a group:
a = a + b
b = a - b
a = a - b
proof:
a1 = a + b
b1 = a1 - b = a
a2 = -b1 + a1 = (-a) + a + b
@johnynek
johnynek / gist:8961994
Last active August 29, 2015 13:56
Some Questions with Sketch Monoids

Unifying Sketch Monoids

As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, then using two different commutative monoids to read and write respectively. Finally, the read monoids always have the property that (a + b) <= a, b and the write monoids has the property that (a + b) >= a, b.

##Some questions:

  1. Note how similar CMS and Bloom filters are. The difference: bloom hashes k times onto the same space, CMS hashes k times onto a k orthogonal subspaces. Why the difference? Imagine a fixed space bloom that hashes onto k orthogonal spaces, or an overlapping CMS that hashes onto k * m length space. How do the error asymptotics change?
  2. CMS has many query modes (dot product, etc...) can those generalize to other sketchs (HLL, Bloom)?
  3. What other sketch or non-sketch algorithms can be expressed in this dual mo
// Find the newest asserted at for each combo of subject & property
.groupBy('subject, 'property) {
_.sortedReverseTake[(String, Double)](('asserted_at, 'value) -> 'items , 1)
}
// Extract out that newest asserted at value
.flatten[(String, Double)](('items) -> ('asserted_at, 'value))
.discard('items)
@johnynek
johnynek / 0_reuse_code.js
Created April 2, 2014 22:58
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@johnynek
johnynek / TypedDataCube.md
Last active August 29, 2015 14:04
How to do data cubing in typed scalding?

Suppose you have a key like (page, geo, day) and you want to make rollups/datacube so you can query for all pages, or all geos or all days.

Here is how you do it:

def opts[T](t: T): Seq[Option[T]] = Seq(Some(t), None)

val p: TypedPipe[(String, String, Int)] = ...

p.sumByLocalKeys
@johnynek
johnynek / pomonoids.md
Last active August 29, 2015 14:05
Some notes about partially ordered Monoids

This question on MathOverflow: http://mathoverflow.net/questions/179390/standard-name-for-a-monoid-semigroup-with-ab-a-b?noredirect=1#comment449777_179390

lead me to the definition of a partially ordered monoid, or pomonoid: http://books.google.com/books?id=ZO5Z-zZijDgC&pg=PA248&lpg=PA248&dq=pomonoid+definition&source=bl&ots=lX2PietcDd&sig=k79BcAh0s5Y8OGC12Kclmn4Tkgw&hl=en&sa=X&ei=1Mr8U6HGKYu6ogTL3oGYAQ&ved=0CCYQ6AEwATgK#v=onepage&q=pomonoid%20definition&f=false

Pomonoid: if x <= y, then zx <= zy for all x,y,z in X. Integral Pomonoid if x <= 1 for all x in X.

Lemma: If you have an Integral Pomonoid, then xy <= x. yx <= x

@johnynek
johnynek / abstract_join.scala
Last active August 29, 2015 14:10
Abstracting map/reduce joins.
/**
* @avibryant and I have been interested in extracting as much of scalding out into Algebird,
* so that it is portable across many execution systems, but how to model joins?
*
* In the FP world, Applicative[M] is a typeclass that gives you both Functor[M] (which provides map)
* and in addition join:
*/
trait Functor[M[_]] {
// law: map(map(a)(f))(g) == map(a)(f.andThen(g))
def map[V,U](init: M[T])(fn: T => U): M[U]
@johnynek
johnynek / bug.scala
Last active August 29, 2015 14:11
Bug with dependent types in scala 2.10
trait Path {
type A
}
trait Reader { self =>
def read(p: Path): Option[p.A]
def orElse(that: Reader): Reader = ComposedReader(this, that)
}
@johnynek
johnynek / dependent_type_question.scala
Last active August 29, 2015 14:11
type inference with cases. Why does this fail in scala 2.10.4? Is my logic wrong, or is the compiler not smart enough?
sealed trait Key {
type Inner
}
trait IntKey extends Key {
type Inner = Int
}
trait StringKey extends Key {
type Inner = String
}
@johnynek
johnynek / scanlefttrav1.scala
Created December 22, 2014 20:12
Why doesn't scala give scanLeft on TraversableOnce?
/**
* This won't work because of the "situation" with scala collections. TraversableOnce actually
* needs like 12 methods implemented.
*/
class ScanLeft[A, B](it: TraversableOnce[A], init: B, fn: (B, A) => B) extends TraversableOnce[B] {
override def foreach[U](effect: B => U): Unit = {
var bstate = init
effect(bstate)
it.foreach { a: A =>
bstate = fn(bstate, a)