@johnynek
Created October 28, 2011 04:51
Word count in the Scalding DSL for Cascading
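package com.twitter.scalding

// Counts how often each whitespace-separated word occurs in the input text.
class WordCount(args : Args) extends Job(args) {
  TextLine( args("input") ).read.
    // Split each line into words, emitting one 'word tuple per token.
    flatMap('line -> 'word) { line : String => line.split("\\s+") }.
    // Group the tuples by word and count the tuples in each group.
    groupBy('word) { _.size }.
    // Write the (word, count) pairs as tab-separated values.
    write( Tsv( args("output") ) )
}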
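
For readers new to the DSL, the same split, group, and count steps can be sketched on plain in-memory Scala collections, with no Cascading or Hadoop involved. This is only an illustration of the logic, not part of the gist: `WordCountSketch` and `wordCount` are hypothetical names, and the `filter` step is an added assumption (a line with leading whitespace makes `split("\\s+")` return an empty first token, which the job above would count as a word).

```scala
object WordCountSketch {
  // Hypothetical helper (not part of Scalding): an in-memory analogue of the job's pipeline.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // like flatMap('line -> 'word): one element per token
      .filter(_.nonEmpty)         // assumption: drop empty tokens produced by leading whitespace
      .groupBy(identity)          // like groupBy('word): collect identical words together
      .map { case (word, occurrences) => word -> occurrences.size } // like _.size

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("hello world", "hello scalding"))
    // Print one tab-separated word/count pair per line, similar in shape to the Tsv output.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Running `main` prints one tab-separated word and count per line, sorted by word. The Scalding job expresses the same steps, but over tuples that Cascading can distribute across a cluster.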