Krishnan Raman krishnanraman

@krishnanraman
krishnanraman / Result with Parallelization
Created April 23, 2013 00:33
Compute Pi to n-digit accuracy. This is an expensive job (it takes a few seconds) when n > 10,000. So we create a pipe with the numbers 10,000 to 11,000 and compute Pi for each of them. (That means we compute Pi about 1,000 times: the first result has 10,000 decimals, the next 10,001, and so on up to 11,000.) With actors-based Parallelizer = 1…
11000 Time in ms:113453,11000=>3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132000568127145263560827785771342757789609173637178721468440901224953430146549585371050792279689258923542019956112129021960864034418159813629774771309960518707211349999998372978049951059731732816096318595024459455346908302642522308253344685035261931188171010003137838752886587533208381420617177669147303598253490428755468731159562863882353787593751957781857780532171226806613001927876611195909
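The gist preview above is cut off, so the original job is not visible here. As a rough, self-contained sketch of the same idea, the following computes Pi to n decimals with Machin's formula in `BigInt` arithmetic and runs one computation per n in parallel. It uses standard-library `Future`s rather than the actors-based Parallelizer the description mentions; the names `PiSketch`, `arctanInv`, `piDigits`, and `piForRange` are hypothetical and not taken from the gist.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PiSketch {
  // arctan(1/x) scaled by `unity`, summed from its Taylor series in integer arithmetic.
  def arctanInv(x: Int, unity: BigInt): BigInt = {
    val x2 = BigInt(x) * BigInt(x)
    var term = unity / BigInt(x) // holds unity / x^(2k+1)
    var sum = BigInt(0)
    var k = 0
    while (term != BigInt(0)) {
      val t = term / BigInt(2 * k + 1)
      if (k % 2 == 0) sum += t else sum -= t // alternating series
      term = term / x2
      k += 1
    }
    sum
  }

  // Pi truncated (not rounded) to n decimals, via Machin's formula:
  // pi = 4 * (4 * arctan(1/5) - arctan(1/239)).
  // Ten guard digits absorb the integer-division truncation error.
  def piDigits(n: Int): String = {
    val guard = 10
    val unity = BigInt(10).pow(n + guard)
    val scaled = (arctanInv(5, unity) * 4 - arctanInv(239, unity)) * 4
    val s = (scaled / BigInt(10).pow(guard)).toString
    s.take(1) + "." + s.drop(1)
  }

  // One independent computation per n, scheduled on the global thread pool.
  def piForRange(from: Int, to: Int): Seq[(Int, String)] = {
    val jobs = Future.traverse((from to to).toList)(n => Future((n, piDigits(n))))
    Await.result(jobs, Duration.Inf)
  }
}
```

Calling `PiSketch.piForRange(10000, 11000)` would mirror the range in the description; the timing will depend on the machine and the number of cores.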
krishnanraman / gist:5448953
Created April 24, 2013 01:40
reading a binary file functionally using Iterator
val path = "places.mbb"
@transient val fis = new FileInputStream( path )
@transient val ois = new ObjectInputStream(fis)
def mkRow = (ois.readLong, ois.readFloat, ois.readFloat, ois.readFloat, ois.readFloat)
val empty = (-1L,-1f,-1f,-1f,-1f)
val treeData = Iterator.iterate( mkRow ) ((row:(Long,Float,Float,Float,Float)) => {
try {
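The preview above stops inside the `try` block, so the rest of the original is not shown. A minimal sketch of the same idea (reading fixed-layout rows lazily through an `Iterator`) could look like the following. It makes two assumptions that differ from the gist: it uses `DataInputStream` instead of `ObjectInputStream`, and it ends the iteration with `Iterator.continually` plus `takeWhile` rather than `Iterator.iterate` with a sentinel row. `BinaryRows`, `writeRows`, `readRow`, and `rows` are hypothetical names.

```scala
import java.io.{DataInputStream, DataOutputStream, EOFException, File, FileInputStream, FileOutputStream}

object BinaryRows {
  // Same row shape as in the gist: one Long followed by four Floats.
  type Row = (Long, Float, Float, Float, Float)

  // Write rows in the same field order that `readRow` expects.
  def writeRows(file: File, rows: Seq[Row]): Unit = {
    val out = new DataOutputStream(new FileOutputStream(file))
    try rows.foreach { case (id, a, b, c, d) =>
      out.writeLong(id)
      out.writeFloat(a); out.writeFloat(b); out.writeFloat(c); out.writeFloat(d)
    } finally out.close()
  }

  // Read one row, or None once the stream is exhausted.
  def readRow(in: DataInputStream): Option[Row] =
    try Some((in.readLong(), in.readFloat(), in.readFloat(), in.readFloat(), in.readFloat()))
    catch { case _: EOFException => None }

  // Lazily iterate over rows until end of file; the caller closes the stream.
  def rows(in: DataInputStream): Iterator[Row] =
    Iterator.continually(readRow(in)).takeWhile(_.isDefined).map(_.get)
}
```

Using `Option` as the end-of-stream signal avoids reserving a special row such as `(-1L,-1f,-1f,-1f,-1f)` that could in principle collide with real data.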
krishnanraman / gist:5682312
Created May 31, 2013 00:42
functions as key - example
scala> def revMap[A, B](m: Map[A, B]): Map[B, A] = m.map(kv => (kv._2,kv._1))
scala> def getFlag[A, B](p: A, mymap: Map[A => Boolean, B]): B = mymap.find { x => x._1(p) }.get._2
scala> val m = Map("even" -> ((x: Int) => x%2 == 0), "odd" ->((x: Int) => x%2 == 1))
scala> val rev_m = revMap(m)
scala> getFlag( 3, rev_m)
res5: java.lang.String = odd
scala> getFlag( 2, rev_m)
res6: java.lang.String = even
krishnanraman / n records => 38 services == 38 directories
Last active December 18, 2015 09:29
Scalding job to read a TSV of ~6000 "Sample" records and write them to a pail according to a partition criterion: all Samples with a common service go to the same pail subdirectory. Note that each record goes to its own pailfile, so ~6000 pailfiles will be created.
$ tree pailtest/tfe
pailtest/tfe
├── part-000130.pailfile
├── part-000140.pailfile
├── part-000440.pailfile
├── part-000451.pailfile
├── part-000460.pailfile
├── part-000470.pailfile
├── part-000481.pailfile
├── part-000491.pailfile
krishnanraman / gist:5770687
Last active December 18, 2015 10:48
Pail compaction
By default, Pail sinks each record to a pailfile.
Given a million records, you'd have 1000000 tiny pailfiles instead of a few large pailfiles.
To compact pail:
Define record = List of "stuff"
To avoid OOM errors, we create these lists so as not to exceed 100 "stuff" per list
To do so, in a mapping operation, add a dummy column whose value is a random integer in [0, 100)
Group-By (partitioner, dummy) & fold over the partitioner to build a list
Discard the dummy
Now we build a Bijection List[stuff] <=> Array[Byte], and we are good to go.
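The grouping steps above can be sketched with plain Scala collections standing in for the Scalding pipe. This is an illustration only, not the gist's job: `Sample`, `CompactionSketch`, and `compact` are hypothetical names, and the final Bijection to `Array[Byte]` is left out.

```scala
import scala.util.Random

object CompactionSketch {
  // Hypothetical record type: a partition key (here a service name) plus a payload.
  final case class Sample(service: String, payload: String)

  // Returns one list of records per (partition key, dummy) pair.
  def compact(records: Seq[Sample], buckets: Int, rng: Random): Map[(String, Int), List[Sample]] =
    records
      .map(r => (r.service, rng.nextInt(buckets), r)) // add the dummy column, uniform in [0, buckets)
      .groupBy { case (service, dummy, _) => (service, dummy) } // group by (partitioner, dummy)
      .map { case (key, group) =>
        // fold each group into a list, discarding the dummy column
        key -> group.foldLeft(List.empty[Sample]) { case (acc, (_, _, r)) => r :: acc }
      }
}
```

With a dummy drawn from [0, 100), each partition key yields at most 100 lists, so a pail would hold at most 100 files per partition instead of one per record. How many records end up in each list depends on how many records share that partition key.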
krishnanraman / histogram.png
Last active December 18, 2015 12:49
Scalding + D3.js
Often we use Scalding to run a distributed algorithm that generates tons of data.
For example, imagine a simple Scalding job that will:
-comb through 100 million user requests
-find the (lat,lng) where each request originated
-convert (lat,lng) to a zipcode via reverse geocoding
-visualize the result as a histogram for a bunch of zipcodes
So if you pick, say, 10 zipcodes in some county, I show you how many people hit your website from each zipcode.
The hard problem here isn't the scalding job -
krishnanraman / gist:5854953
Last active December 18, 2015 22:29
intersecting circles on html5 canvas
<canvas id="c" width="800" height="600" class="clear" style="border:1px dotted;float:left"></canvas>
<script>
var w = 800
var h = 600
function draw_grid(ctx, w, h) {
ctx.beginPath();
/* vertical lines */
for (var x = 0.5; x < w; x += 10) {
ctx.moveTo(x, 0);
krishnanraman / gist:5855410
Last active December 18, 2015 22:38
ServiceDependencies: Calculate dependencies between the various services that talk to zipkin. This job produces the data that powers http://bmd-linux:8080/radial.html (check it out, it's pretty cool)
package com.twitter.observability.analytics.jobs
import com.twitter.pluck.job.TwitterDateJob
import com.twitter.scalding._
import com.twitter.observability.analytics._
import com.twitter.zipkin.common.{Service, DependencyLink, Dependencies, Span}
import com.twitter.zipkin.conversions.thrift._
import com.twitter.algebird.Moments
import java.net.InetSocketAddress
def ipsort(list: Seq[String]) = list.map { ip => (ip,ip.split("\\.").map { _.toInt }) }.sortBy { x => (x._2(0),x._2(1),x._2(2),x._2(3)) } .map { x => x._1 }
def ip = util.Random.nextInt(256) // nextInt's bound is exclusive, so 256 covers octets 0..255
scala> val iplist = (1 to 45).map( t=> List(ip,ip,ip,ip).mkString("."))
iplist: scala.collection.immutable.IndexedSeq[String] = Vector(161.207.36.114, 114.131.224.6, 66.19.40.40, 213.224.41.203, 240.83.226.230, 177.140.71.152, 36.59.114.55, 192.78.128.47, 145.195.230.139, 228.152.51.5, 145.247.88.171, 17.75.73.248, 126.208.19.25, 64.119.170.228, 127.191.169.231, 251.139.227.2, 198.31.127.104, 206.151.169.33, 123.182.133.93, 119.114.93.116, 71.153.242.4, 12.90.11.215, 101.26.233.120, 116.231.208.157, 229.80.101.69, 184.247.200.247, 106.12.212.3, 75.214.3.187, 181.222.181.102, 230.132.17.199, 112.222.212.13, 79.246.32.50, 122.134.187.160, 228.61.226.54, 46.155.155.122, 97.55.2.3, 26.211.171.172, 197.242.106.144, 243.14.201.158, 43.15.38.60, 79.156.49.104, 240.14.20.160, 16.231.50.179, 31.197.115.109, 91.247.1.166)
scala>
krishnanraman / Results
Last active December 20, 2015 00:29
A simple Clustering Algo for n-dimension vectors
$ scala Clustering
11 elements in 3 clusters
List(List(4.0), List(3.0), List(2.0), List(1.0))
List(List(22.0), List(21.0), List(20.0))
List(List(10.0), List(11.0), List(12.0), List(9.0))
15 elements in 5 clusters
List(List(40.0, 1.0), List(40.0, 2.0), List(40.0, 3.0))
List(List(20.0, 3.0), List(20.0, 2.0), List(20.0, 1.0))