Why functional programming matters (aka MapReduce for humans)
import com.cloudera.crunch._
import com.cloudera.scrunch._

class ScrunchWordCount {
  def wordCount(inputFile: String, outputFile: String) = {
    val pipeline = new Pipeline[ScrunchWordCount]
    pipeline.read(from.textFile(inputFile))
      .flatMap(_.toLowerCase.split("\\W+")) // Split lines into lowercase words
      .filter(!_.isEmpty())
      .count
      .write(to.textFile(outputFile)) // Word counts
      .map((word, count) => (word.slice(0, 1), count)) // Key each count by first letter
      .groupByKey.combine(v => v.sum).materialize // Counts per first letter
    pipeline.done
  }
}

object ScrunchWordCount {
  def main(args: Array[String]) = {
    new ScrunchWordCount().wordCount(args(0), args(1))
  }
}

paulmillr commented Mar 10, 2012

This is the functional analog of the scala-hadoop example, which is written in an imperative style.

The scala-hadoop example has six classes, which are much harder to read and understand than this small 22-LOC file.

Crunch / Scrunch is a simple and efficient FP-style wrapper over Hadoop.

The example was taken from Dean Wampler's talk on FP and big data.
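To make the shape of the pipeline easier to follow without a Hadoop cluster, here is a rough sketch of the same two steps (word counts, then counts per first letter) on plain in-memory Scala collections. It uses only the standard library, not the Scrunch API; `LocalWordCount`, `wordCounts`, and `countsByFirstLetter` are hypothetical names made up for this illustration.

```scala
object LocalWordCount {
  // Step 1: split lines into lowercase words and count each word.
  // Mirrors flatMap / filter / count in the Scrunch pipeline.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  // Step 2: re-key each count by the word's first letter and sum per letter.
  // Mirrors map / groupByKey / combine in the Scrunch pipeline.
  def countsByFirstLetter(counts: Map[String, Int]): Map[String, Int] =
    counts.toSeq
      .map { case (word, count) => (word.slice(0, 1), count) }
      .groupBy(_._1)
      .map { case (letter, pairs) => (letter, pairs.map(_._2).sum) }
}
```

The chain of transformations reads the same in both versions; the difference is that the Scrunch pipeline is planned and run as Hadoop jobs, while this sketch runs eagerly in a single JVM.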
