Krishnan Raman krishnanraman

@krishnanraman
krishnanraman / gist:4759444
Created February 12, 2013 01:54
PropTest
/////////// INPUT PIPE = props.txt, Columns: date, displaylocation, engagements ///////////
2/1 1 10
2/1 1 20
2/1 2 30
2/1 2 10
2/1 3 20
2/1 4 10
2/2 1 10
2/2 2 20
2/2 2 15
krishnanraman / gist:4759672
Last active October 25, 2016 01:32
Tony Morris class
// And by the way, please feel free to contribute, correct errors, etc!
trait Functor[F[_]] {
def fmap[A, B](f: A => B): F[A] => F[B]
}
object Functor {
val ListFunctor: Functor[List] =
new Functor[List] {
def fmap[A, B](f: A => B) =
krishnanraman / gist:4991798
Created February 20, 2013 01:07
Abelian group
trait Abelian
case class Zn(order:Int, zero:Int) extends Abelian {
def identity = zero
def size = order
def elements = (1 to order).toSeq
def cayley = {
Vector.tabulate(order,order)((x,y)=> {
val idx = math.abs(zero - (x+1)) // difference between x & zero
val timesToShift = if ((x+1)>=zero) idx else (order-idx)
krishnanraman / gist:5207976
Last active December 15, 2015 05:19
Test Pail using Ints - given numbers 1 to 100, create a nested directory structure, where numbers below 50 go into one tree & those above into another. Further, in each tree, we create subdirectories based on number mod 7. So you should see 2+14 = 16 directories after you run this piece of code. They should partition the input space {1..100} exa…
import com.backtype.hadoop.pail.PailStructure
import java.util.{ List => JList }
import scala.collection.JavaConverters._
import com.twitter.scalding._
import com.twitter.scalding.commons.source.{PailSource,CodecPailStructure}
import com.twitter.bijection.{NumericInjections, Injection}
class PailTest2(args : Args) extends Job(args) {
val pipe = IterableSource((1 to 100), "src").read
krishnanraman / gist:5209602
Last active December 15, 2015 05:29
Pail example: writejob is a job that creates Pails; readjob is a job that reads Pails that have already been created.
import com.backtype.hadoop.pail.PailStructure
import java.util.{ List => JList }
import scala.collection.JavaConverters._
import com.twitter.scalding._
import com.twitter.scalding.commons.source.{PailSource,CodecPailStructure}
import com.twitter.bijection.{NumericInjections, Injection}
class PailTest2Write(args : Args) extends Job(args) {
args("io") match {
case "read" => readjob
krishnanraman / gist:5224937
Last active December 15, 2015 07:39
Pail Example
Pail example:
Writejob: Partition numbers [1..100] into two directories - belowfifty & abovefifty.
Further, create 7 subdirectories under each, based on number mod 7.
So a number like 62 would end up in the location "abovefifty/6".
Readjob: Read the subdirectories "belowfifty/3" & "abovefifty/0"
RESULTS:
$ tree pailtest
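The partitioning rule described in this gist can be sketched in plain Scala, without Pail or Hadoop. This is only an illustration of the directory naming, not the gist's actual job: the object `PartitionSketch` and the helper `target` are hypothetical names, and sending 50 itself to `abovefifty` is an assumption, since the description does not say where the boundary value goes.

```scala
// Plain-Scala sketch of the partitioning rule; no Pail or Hadoop involved.
object PartitionSketch {
  // Hypothetical helper: relative directory for a number.
  // Assumption: 50 itself goes to "abovefifty"; the description does not say.
  def target(n: Int): String = {
    val top = if (n < 50) "belowfifty" else "abovefifty"
    s"$top/${n % 7}"
  }

  def main(args: Array[String]): Unit = {
    // Group 1..100 by the leaf directory each number would land in.
    val byDir = (1 to 100).groupBy(target)
    println(target(62))  // abovefifty/6
    println(byDir.size)  // 14 leaf directories
  }
}
```

Every number maps to exactly one leaf directory, so the 14 groups partition the input; the two top-level directories account for the remaining 2 of the 16 directories mentioned in the earlier Pail gist.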
krishnanraman / gist:5258053
Created March 27, 2013 21:13
multiple columns in IterableSource
import com.twitter.scalding._

class HistogramTest(args : Args) extends Job(args) {
  IterableSource(List((1,2),(3,4),(5,6),(7,8)), List('a,'b))
    .read
    .map(('a,'b)->('c)){
      x:(Int,Int) =>
        x._1+x._2
    }.write(Tsv("data/histo_test"))
}
krishnanraman / gist:5259351
Last active December 15, 2015 12:19
Given some data, dump its probability density function (pdf) into a MySQL database so that a histogram can be plotted (the database is accessed via Ruby).
import com.twitter.scalding._
import com.twitter.scalding.mathematics.Histogram
import util.Random
class HistogramTest(args : Args) extends Job(args) {
def cdf2pdf(cdf:Map[Double,Double], keys:List[Double], size:Int):Map[Double,Double] = {
var m = Map[Double,Double]()
keys.foldLeft((m, 0.0d))((a,b) =>{
val myval = cdf(b)*size
krishnanraman / result
Last active December 15, 2015 13:39
Histogram of Normal distribution
import com.twitter.scalding._
import util.Random
// Produce a histogram with 10 bins, of n numbers with Gaussian distribution N(5,1) ~ mean 5, stdev 1.
class HistogramTest(args : Args) extends Job(args) {
def l2t10(x:List[Int]) = Tuple10(x(0), x(1),x(2),x(3), x(4),x(5),x(6), x(7),x(8),x(9))
val tuples = (1 to args("n").toInt).map( x=> Random.nextGaussian + 5)
val bins = 10
krishnanraman / gist:5372968
Created April 12, 2013 15:42
(Lat,Lng) => zipcode conversion. Algorithm: read a static zipcode CSV, find the lat-lng (x',y') nearest to the input lat-lng (x,y), and return the zipcode associated with (x',y').
/*
@author: Krishnan Raman
usage: GeoToZipcode.convert( latlng )
*/
import scala.io.Source
object GeoToZipcode {
lazy val zip = {
val lines = Source.fromFile("zipcodes.csv").getLines
lines.next // skip header
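The preview above is cut off, so the rest of the gist's code is not shown here. The nearest-point lookup that the description outlines might be sketched as follows. Everything in this block is an assumption made for illustration: the `ZipEntry` record, the `nearest` function, and the use of squared Euclidean distance in degree space (a haversine distance would be more accurate over long distances). It works on an in-memory table and does not read `zipcodes.csv`, whose column layout is not visible in the preview.

```scala
object NearestZipSketch {
  // Hypothetical record type; the real CSV layout is not shown in the preview.
  final case class ZipEntry(code: String, lat: Double, lng: Double)

  // Squared Euclidean distance in degree space. This is an assumption that is
  // only reasonable for nearby points.
  private def dist2(lat: Double, lng: Double, e: ZipEntry): Double = {
    val dLat = lat - e.lat
    val dLng = lng - e.lng
    dLat * dLat + dLng * dLng
  }

  // Linear scan for the closest entry; None if the table is empty.
  def nearest(entries: Seq[ZipEntry], lat: Double, lng: Double): Option[String] =
    if (entries.isEmpty) None
    else Some(entries.minBy(e => dist2(lat, lng, e)).code)
}
```

With two placeholder entries (not real zipcodes), `ZipEntry("A", 0.0, 0.0)` and `ZipEntry("B", 10.0, 10.0)`, a call to `NearestZipSketch.nearest(entries, 1.0, 1.0)` returns `Some("A")`. A linear scan is the simplest choice; a spatial index would be worth considering only if the table or the query volume is large.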