@krishnanraman
Created July 27, 2014 01:16
unorthodox benchmark
n       Time (real)
===================

Typed ($ time scald.rb --hdfs-local mult.scala --typed true --n XXX)
=====
 250     7.4
 500     8.6
1000    14.1
2000    36.0

Fields ($ time scald.rb --hdfs-local mult.scala --typed false --n XXX)
=====
 250     6.9
 500     6.9
1000     6.9
2000     6.9

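import com.twitter.scalding._
import TDsl._

// Writes n rows of n doubles each, starting from either a TypedPipe or a Fields-based source.
class mult(args: Args) extends Job(args) {

  // Builds one row of `columns` random values: the entry at index `dominant`
  // is drawn from [5, 15), every other entry from [0, 1).
  def mkRow(columns: Int, dominant: Int): Seq[Double] =
    Seq.tabulate[Double](columns)(i => if (i == dominant) 5 + math.random * 10 else math.random)

  val n = args("n").toInt
  val typed = args("typed").toBoolean

  if (typed) {
    // Typed path: start directly from a TypedPipe.
    TypedPipe.from(0 until n)
      .map { i => (i, mkRow(n, i)) }
      .write(TypedTsv[(Int, Seq[Double])]("typed"))
  } else {
    // Fields path: read an IterableSource with field 'a, then convert to a TypedPipe.
    IterableSource((0 until n), 'a)
      .read
      .toTypedPipe[Int]('a)
      .map { i => (i, mkRow(n, i)) }
      .write(TypedTsv[(Int, Seq[Double])]("untyped"))
  }
}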