@avibryant
Forked from ceteri/ Main.java
Created July 17, 2012 21:50
Cascading for the Impatient, part 3
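```scala
import com.twitter.scalding._

class WordCount(args : Args) extends Job(args) {
  Tsv(args("input"), ('doc_id, 'text))
    // split each line of text into tokens on spaces, brackets, parens, commas and periods
    .flatMapTo('text -> 'token){line : String => line.split("[ \\[\\]\\(\\),.]")}
    // scrub each token: trim whitespace and lowercase it
    .map('token -> 'token){token : String => token.trim.toLowerCase}
    // drop empty tokens produced by adjacent delimiters
    .filter('token){token : String => token.length > 0}
    // count the occurrences of each distinct token
    .groupBy('token){g => g.size}
    .write(Tsv(args("output")))
}
```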
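To see what each pipe operation computes, here is a minimal sketch of the same tokenize, scrub, filter and count steps using plain Scala collections instead of Scalding pipes. It is an in-memory illustration only, not how Scalding executes the job on Hadoop. The object name `WordCountSketch` and the sample documents are hypothetical and do not come from the original gist.

```scala
// Hypothetical in-memory illustration of the Scalding WordCount steps above.
object WordCountSketch {
  // Same delimiter class as the Scalding job: space, [ ] ( ) , and .
  val Delimiters = "[ \\[\\]\\(\\),.]"

  // docs stands in for the (doc_id, text) rows read from the input Tsv.
  def wordCount(docs: Seq[(String, String)]): Map[String, Int] =
    docs
      .flatMap { case (_, text) => text.split(Delimiters).toSeq } // like flatMapTo: doc_id is dropped
      .map(_.trim.toLowerCase)                                     // like map('token -> 'token)
      .filter(_.nonEmpty)                                          // like filter('token)
      .groupBy(identity)                                           // like groupBy('token)
      .map { case (token, occurrences) => token -> occurrences.size } // like g.size

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not the tutorial's data set.
    val docs = Seq(
      ("doc01", "A rain shadow is a dry area."),
      ("doc02", "This sinking, dry air (produces) a rain shadow.")
    )
    // Print tab-separated token/count pairs, sorted by token.
    wordCount(docs).toSeq.sortBy(_._1).foreach { case (token, n) => println(s"$token\t$n") }
  }
}
```

The `filter` step matters because splitting on a delimiter class yields empty strings wherever two delimiters are adjacent (for example ", " or " ("). In the real job, `groupBy` runs as a distributed shuffle and reduce rather than an in-memory `Map`.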