@dcsobral
Created November 25, 2011 13:00
WordCount
// Based on Fantom's word count example here: http://blog.joda.org/2011/11/guide-to-evaluating-fantom.html
// I'm commenting the lines that need changing, and leaving a few of them uncommented,
// as they are the same
// class WordCount {
object WordCount {
  // Void main(Str[] args) {
  def main(args: Array[String]) {
    if (args.size != 1) {
      // echo("usage: Wordcount <file>")
      println("usage: WordCount <file>")
      // Env.cur.exit(-1)
      sys exit -1
    }
    // Set up an empty map to count each word, setting default for each value to zero
    // wordCounts := Str:Int[:] { def = 0 }
    val wordCounts = collection.mutable.Map[String, Int]() withDefaultValue 0
    // Open the file, read each line in order
    // file := Uri(args[0]).toFile
    val file = io.Source fromFile args(0)
    // file.eachLine |line| {
    file.getLines foreach { line =>
      // skip empty lines; in Scala a bare return here would exit main, not just the closure
      // if (line.trim.isEmpty) return
      if (!line.trim.isEmpty) {
        // split and trim on whitespace into words
        // words := line.split
        val words = line.trim split "\\s+"
        // count each one
        // words.each |word| { wordCounts[word] += 1 }
        words foreach { word => wordCounts(word) += 1 }
      }
    }
    // Show each word found, with its count, in alphabetical order
    // wordCounts.keys.sort.each |key| {
    wordCounts.keys.toSeq.sorted foreach { key =>
      // echo("$key ${wordCounts[key]}")
      println(key+" "+wordCounts(key))
    }
  }
}
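One place where the Scala translation cannot simply mirror the Fantom line is the "skip empty lines" step: in Scala, a return written inside a closure passed to foreach returns from the enclosing method, not only from the closure. The following is a minimal sketch of that behaviour; the object and method names (ReturnInClosure, countUntilBlank, countNonBlank) are made up for illustration and are not part of the gist.

```scala
object ReturnInClosure {
  // Hypothetical example: intended to count non-blank lines, but the
  // `return` leaves the whole method at the first blank line.
  def countUntilBlank(lines: List[String]): Int = {
    var n = 0
    lines.foreach { line =>
      if (line.trim.isEmpty) return n // non-local return: exits countUntilBlank
      n += 1
    }
    n
  }

  // Skipping with a guard keeps the loop going past blank lines.
  def countNonBlank(lines: List[String]): Int = {
    var n = 0
    lines.foreach { line =>
      if (!line.trim.isEmpty) n += 1
    }
    n
  }
}
```

Recent Scala 3 compilers may report a deprecation warning for the first method, since non-local returns are being phased out there.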
@iron9light

start from line 25:

file.getLines.filterNot(_.isEmpty).flatMap(_ split """\s+""").groupBy(_).map { case (word, wordList) => (word, wordList.size) }.sortBy(_._1).foreach { case (word, count) => println(word + " " + count) }

@dcsobral
Author

You forgot the trim. Also, "groupBy(_)" will not do what you expect: it expands to "x => coll.groupBy(x)", a function still waiting for its key function, so try "groupBy(x => x)" or "groupBy(identity)" instead. Finally, neither groupBy nor sortBy is a member of Iterator, and converting the whole file into a sequence first would hold every line in memory, which performs badly on large files.
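To make the groupBy point concrete, here is a minimal sketch on an in-memory list. The object name GroupByDemo and the sample words are assumptions for illustration, not taken from the gist.

```scala
object GroupByDemo {
  // Hypothetical sample data.
  val words: List[String] = List("b", "a", "b", "c", "a", "b")

  // groupBy(identity) uses each word as its own key, so equal words end up together.
  val grouped: Map[String, List[String]] = words.groupBy(identity)

  // Equivalent spelling with an explicit lambda.
  val groupedExplicit: Map[String, List[String]] = words.groupBy(x => x)

  // words.groupBy(_) would instead expand to `f => words.groupBy(f)`:
  // a function still waiting for the key function, which groups nothing yet.
  // Spelled out here with a key type chosen for the example:
  val notGrouped: (String => Int) => Map[Int, List[String]] = f => words.groupBy(f)

  // Counting is then the size of each group.
  val counts: Map[String, Int] =
    grouped.map { case (word, occurrences) => (word, occurrences.size) }
}
```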

@kaja47
kaja47 commented Nov 25, 2011

fixed version:

file.getLines.filterNot(_.trim.isEmpty).flatMap(_ split """\s+""").toSeq.groupBy(identity).mapValues(_.size).toSeq.sortBy(_._1).foreach{ case (word, count) => println(word+" "+count) }

The call to toSeq on the Iterator creates a Stream.
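For anyone who wants to try the chained version without a file, here is a sketch of the same pipeline as a function over an Iterator[String]. The names OneLineWordCount and countWords are hypothetical. It trims each line before splitting (so leading whitespace does not produce an empty word), and it uses toList rather than a Stream, which means all words are held in memory, the cost mentioned earlier in the thread.

```scala
object OneLineWordCount {
  // Hypothetical helper: counts the words in the given lines and
  // returns (word, count) pairs in alphabetical order.
  def countWords(lines: Iterator[String]): Seq[(String, Int)] =
    lines
      .map(_.trim)
      .filterNot(_.isEmpty)
      .flatMap(_.split("""\s+"""))
      .toList // Iterator has no groupBy, so materialize the words first
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy(_._1)
}
```

With a file, the call would look like OneLineWordCount.countWords(io.Source.fromFile(args(0)).getLines()), followed by a foreach that prints each pair.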

@iron9light

Thanks, @dcsobral and @kaja47.
I learned two more things about Scala:

  • Predef.identity
  • mapValues

And here is the par version:

file.getLines.filterNot(_.trim.isEmpty).toStream.par.flatMap(_ split """\s+""").groupBy(identity).map{case (word, wordList) => (word, wordList.size)}.toStream.sortBy(_._1).foreach{case (word, count) => println(word + " " + count)}

We love one-liners :-)
