@paulmillr
Created March 10, 2012 16:03
Why functional programming matters (aka MapReduce for humans)
import com.cloudera.crunch._
import com.cloudera.scrunch._

class ScrunchWordCount {
  def wordCount(inputFile: String, outputFile: String) = {
    val pipeline = new Pipeline[ScrunchWordCount]
    pipeline.read(from.textFile(inputFile))
      .flatMap(_.toLowerCase.split("\\W+"))  // split each line into lowercase words
      .filter(!_.isEmpty())                  // drop empty tokens
      .count                                 // (word, count) pairs
      .write(to.textFile(outputFile))        // Word counts
      .map((word, count) => (word.slice(0, 1), count))  // re-key by first letter
      .groupByKey.combine(v => v.sum).materialize       // total count per first letter
    pipeline.done
  }
}

object ScrunchWordCount {
  def main(args: Array[String]) = {
    // Parenthesized so `new` applies to the class, not to a `ScrunchWordCount.wordCount` type path.
    (new ScrunchWordCount).wordCount(args(0), args(1))
  }
}
@paulmillr

This is the functional analog of the scala-hadoop code, which is written in an imperative style.

The scala-hadoop example has six classes, which are much harder to read and understand than this small 22-LOC file.

Crunch / Scrunch is a simple & efficient FP-style wrapper over Hadoop.

The example was taken from Dean Wampler's talk on FP & big data.
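
To see what each stage of the pipeline computes without a Hadoop cluster, here is a minimal sketch of the same chain on plain in-memory Scala collections. It is not Scrunch code: `LocalWordCount`, `wordCounts`, and `countsByFirstLetter` are hypothetical names introduced only for illustration, and the standard-library `groupBy` stands in for Scrunch's `count` and `groupByKey.combine`.

```scala
object LocalWordCount {
  // Hypothetical helper mirroring flatMap -> filter -> count:
  // splits lines into lowercase words and counts occurrences of each word.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  // Hypothetical helper mirroring map -> groupByKey -> combine(_.sum):
  // re-keys each (word, count) pair by the word's first letter and sums the counts.
  def countsByFirstLetter(counts: Map[String, Int]): Map[String, Int] =
    counts.toSeq // convert to a Seq first so pairs sharing a first letter are not overwritten
      .map { case (word, count) => (word.slice(0, 1), count) }
      .groupBy(_._1)
      .map { case (letter, pairs) => (letter, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("The quick brown fox", "the lazy dog; the big end")
    val counts = wordCounts(lines)
    println(counts)
    println(countsByFirstLetter(counts))
  }
}
```

The shape of the chain is the same as in the gist; the difference is that Scrunch runs those stages as Hadoop jobs over files, while this sketch runs them eagerly in memory.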
