@coltfred
coltfred / HFileInputFormat.scala
Created August 14, 2012 19:05 — forked from leifwickland/HFileInputFormat.scala
Allows an HFile to be used as the input to MapReduce.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.io.hfile.{ HFile, HFileScanner }
import org.apache.hadoop.hbase.io.hfile.HFile.Reader
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.mapreduce.{ JobContext, InputSplit, TaskAttemptContext, RecordReader }
import org.apache.hadoop.mapreduce.lib.input.{ FileInputFormat, FileSplit }
/**
* A MapReduce InputFormat for HBase's HFile.
object RefactorPuzzle {
import scala.language.higherKinds
trait Monad[F[_]] {
def point[A](a: => A): F[A]
def flatMap[A, B](ma: F[A])(f: A => F[B]): F[B]
def map[A, B](ma: F[A])(f: A => B): F[B]
}
implicit def optionMonad = new Monad[Option] {
def point[A](a: => A): Option[A] = Some(a)
coltfred / gist:5759793
Created June 11, 2013 19:19
ctags for scala
--langdef=scala
--langmap=scala:.scala
--regex-Scala=/^[ \t]*(final[ \t]*)*(abstract[ \t]*)*(sealed[ \t]*)*(case[ \t]*)*class[ \t]*([a-zA-Z0-9_]+)/\5/c,classes/
--regex-Scala=/^(final[ \t]*)*(case[ \t]*)*[ \t]*object[ \t]*([a-zA-Z0-9_]+)/\3/o,objects/
--regex-Scala=/^[ \t]*(protected[ \t]*)*(sealed[ \t]*)*trait[ \t]*([a-zA-Z0-9_]+)/\3/t,traits/
--regex-Scala=/[ \t]*def[ \t]*([a-zA-Z0-9_=]+)[ \t]*.*[:=]/\1/m,methods/
--regex-Scala=/[ \t]*(final[ \t]*)*val[ \t]*([a-zA-Z0-9_]+)[ \t]*[:=]/\2/V,values/
--regex-Scala=/[ \t]*var[ \t]*([a-zA-Z0-9_]+)[ \t]*[:=]/\1/v,variables/
--regex-Scala=/^[ \t]*type[ \t]*([a-zA-Z0-9_]+)[ \t]*[\[<>=]/\1/T,types/
--regex-Scala=/^[ \t]*import[ \t]*([a-zA-Z0-9_{}., \t=>]+$)/\1/i,includes/
import scalaz._, Scalaz._
val a: List[Int] = List(1)
a.traverse{_.point[Option].toRightDisjunction("fail")}
// Methods returning an Option of Boolean (or, more generally, any Monad of Boolean)
def one: Option[Boolean] = Some(true)
def two: Option[Boolean] = Some(true)
// I'd like to be able to write something like the following.
if (!one && two) {
}
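One way to get close to the condition above, sketched with only the Scala standard library rather than scalaz, is to combine the two values inside the Option with a for-comprehension and collapse to a plain Boolean at the end. The names `OptionBooleanSketch` and `combined`, and the choice to treat `None` as `false`, are assumptions made for illustration and are not part of the original snippet.

```scala
object OptionBooleanSketch {
  def one: Option[Boolean] = Some(true)
  def two: Option[Boolean] = Some(true)

  // Combine the wrapped Booleans inside the Option context.
  // The result is None if either side is None.
  val combined: Option[Boolean] = for {
    a <- one
    b <- two
  } yield !a && b

  def main(args: Array[String]): Unit = {
    // Assumption: a missing value (None) counts as false.
    if (combined.getOrElse(false)) println("condition holds")
    else println("condition does not hold")
  }
}
```

This version stops early when `one` is `None`, but it does not short-circuit on the Boolean value itself: `two` is still evaluated when `!a` is already false. A monadic `&&` that skips the second effect would need a small helper of its own.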
import scalaz._, Scalaz._
implicit val b = Show.shows[Boolean] { b => if (b) "0" else " " }
true.shows // Value is still "true", why?
coltfred / Window.hs
Last active August 29, 2015 14:06 — forked from pchiusano/Window.hs
module Window where
import Data.Monoid
data Window a = Window [a] a [a] deriving (Show,Read)
null :: Window a -> Bool
null (Window [] _ []) = True
null _ = False
coltfred / step1.scala
Last active August 29, 2015 14:11 — forked from danclien/step1.scala
// Implementing functor manually
import scalaz._, Scalaz._, Free.liftF
sealed trait TestF[+A]
case class Foo[A](o: A) extends TestF[A]
case class Bar[A](h: (Int => A)) extends TestF[A]
case class Baz[A](h: (Int => A)) extends TestF[A]
implicit def testFFunctor[B]: Functor[TestF] = new Functor[TestF] {

Git DMZ Flow

I've been asked a few times over the last few months to put together a full write-up of the Git workflow we use at RichRelevance (and used at Precog before that), since I have referenced it in passing quite a few times in tweets and in person. The workflow is appreciably different from GitFlow and its derivatives, and thus it brings with it a different set of tradeoffs and optimizations. To that end, it would probably be helpful to go over exactly which workflow properties I find beneficial or even necessary.

  • Two developers working on independent features must never be blocked by each other
    • No code freeze! Ever! For any reason!
  • A developer must be able to base derivative work on another developer's work, without waiting for any third party
  • Two developers working on inter-dependent features (or even the same feature) must be able to do so without interference from (or interfering with) any other parties
  • Developers must be able to work on multiple features simultaneously, or at lea
coltfred / contrib.sh
Last active August 29, 2015 14:22 — forked from non/contrib.sh
#!/bin/sh
git log --numstat | awk '/^Author: /{author=$0} /^[0-9]+\t[0-9]+/{n = $1 + $2; d[author] += n; t += n} END { for(a in d) { printf("%6d %6.3f%% %s\n", d[a], d[a] * 100 / t, a)}}' | sort -rn
# written less illegibly, it is:
#
# git log --numstat | \
# awk '
# /^Author: /{author=$0}
# /^[0-9]+\t[0-9]+/{n = $1 + $2; d[author] += n; t += n}
# END { for(a in d) { printf("%6d %6.3f%% %s\n", d[a], d[a] * 100 / t, a)}}