Skip to content

Instantly share code, notes, and snippets.

View johnynek's full-sized avatar

P. Oscar Boykin johnynek

View GitHub Profile
@johnynek
johnynek / wordcount.scala
Created October 28, 2011 04:51
Wordcount in the scalding DSL for Cascading
package com.twitter.scalding
class WordCount(args : Args) extends Job(args) {
TextLine( args("input") ).read.
flatMap('line -> 'word) { line : String => line.split("\\s+") }.
groupBy('word) { _.size }.
write( Tsv( args("output") ) )
}
@johnynek
johnynek / gist:1641512
Created January 19, 2012 18:05
Example of closures in Python
#!/usr/bin/env python
#closures are a poor man's object (and vice-versa)
# http://okmij.org/ftp/Scheme/oop-in-fp.txt
class StoresStuff:
def __init__(self, member):
self.member = member
def poorMansObject(init_member):
@johnynek
johnynek / gist:3428248
Created August 22, 2012 18:42
Proof of intersection formula
|\intersection_{i=(1..k)} s_i| = -\sum_{b=(0..2^k - 1)} (-1)^{|b|} |\union_i (s_i)^(b_i)|
where s_i^1 = s_i, and s_i^0 = {} (the empty set).
|b| is the number of 1 values in the binary representation.
Proof, base case = k = 2 (which is to say there is just s_1, s_2).
|s_1 n s_2| = -|{}| + |s_1| + |s_2| - |s_1 u s_2|
which is clearly true because |{}| = 0, and the fact that
|s_1| + |s_2| = |s_1 n s_2| + |s_1 u s_2|
@johnynek
johnynek / gist:3588731
Created September 1, 2012 21:57
MemoryGroupBuilder
package com.twitter.scalding
import cascading.tuple.Fields
import cascading.tuple.{Tuple => CTuple, TupleEntry}
class MemoryGroupBuilder(val input : List[TupleEntry],
val prevOutput : TupleEntry = new TupleEntry(),
override val sorting : Option[Fields] = None)
extends ReduceOperations[MemoryGroupBuilder] {
@johnynek
johnynek / problem8.clj
Created September 6, 2012 04:46
Problem 8
(def bignumber "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450")
(def
@johnynek
johnynek / reach2.scala
Created November 1, 2012 20:35
Second followers in scalding
ackage com.twitter.ads.batch.experimental.oscar
import com.twitter.scalding._
import com.twitter.pluck.job._
import com.twitter.pluck.source._
import com.twitter.pluck.source.matrix._
import com.twitter.pluck.mathematics._
import com.twitter.scalding.mathematics.Monoid
/*
@johnynek
johnynek / Graph.scala
Created December 1, 2012 22:17
Sketch of scala graph algorithm API
// run with: scala Graph.scala
// Should print a bunch of crap.
// Take away:
// Simple API can easily express: PageRank, ConnectedComponents, Cosine/Jaccard similarity
// See examples below:
import scala.annotation.tailrec
import scala.collection.parallel.immutable.ParIterable
abstract class Graph[G<:Graph[_,N],N](m: Map[N,(Set[N],Set[N])]) {
@johnynek
johnynek / Monad.scala
Created December 3, 2012 00:25
Short Monad Example in Scala
// Run this with: scala Monad.scala
trait Monad[M[_]] {
// in haskell, called return, but that's a reserved word
// constructs a Monad instance from the given value, e.g. List(1)
def apply[T](v: T): M[T]
// flatMap function, in scala:
def bind[T,U](m: M[T])(fn: (T) => M[U]): M[U]
// Laws these must follow are:
// identities:
// Adds an item to the buffer and returns None, or fills the buffer and returns the sum
@tailrec
final def offer(item: V, acc: Option[V] = None): Option[V] =
if(!queue.offer(item)) {
// Queue is full
val toSum = ListBuffer[V]()
queue.drainTo(toSum.asJava)
// Add everything up and get the new acc:
val newAcc = Monoid.plus(acc, Some(Monoid.sum(toSum)))
// We never actually got the item in:
@johnynek
johnynek / attempt.scala
Created January 7, 2013 02:08
Wrapping Throwable in either.
def attempt[T](t: => T): Either[Throwable, T] = {
try {
Right(t)
}
catch {
case e: Throwable => Left(e)
}
}