@juanpampliega
Created August 25, 2015 21:56
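// Top-20 word frequencies; assumes `sc` is an existing SparkContext, as in spark-shell.
val docs = sc.textFile("/opt/dataset/don-quijote.txt.gz")  // gzip input is decompressed on read
val lower = docs.map(line => line.toLowerCase)
// Split on whitespace and drop the empty tokens that blank or indented lines produce.
val words = lower.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)
val counts = words.map(word => (word, 1))
val freq = counts.reduceByKey(_ + _)
// Swap to (count, word) so the default tuple ordering ranks by count first.
val invFreq = freq.map(_.swap)
invFreq.top(20).foreach(println)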