@dgadiraju
Last active March 4, 2018 16:29
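// Input file; uncomment the alternative path to use it instead
val inputPath = "/Users/itversity/Research/data/wordcount.txt"
// val inputPath = "/public/randomtextwriter/part-m-00000"
// Output directory; uncomment the alternative path to use it instead
val outputPath = "/Users/itversity/Research/data/wordcount"
// val outputPath = "/user/dgadiraju/wordcount"
// outputPath must not already exist, otherwise saveAsTextFile fails
// Word count: split lines into words, pair each word with 1, sum per word, print 100 results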
sc.textFile(inputPath).
flatMap(_.split(" ")).
map((_, 1)).
reduceByKey(_ + _).
take(100).
foreach(println)
// Alternative: build the (word, 1) pairs inside flatMap instead of a separate map
sc.textFile(inputPath).
flatMap(line => line.split(" ").map(rec => (rec, 1))).
reduceByKey(_ + _).
take(100).
foreach(println)
// Save the counts to outputPath as tab-separated "word<TAB>count" lines
sc.textFile(inputPath).
flatMap(_.split(" ")).
map((_, 1)).
reduceByKey(_ + _).
map(rec => rec._1 + "\t" + rec._2).
saveAsTextFile(outputPath)
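// Local check (a minimal sketch, not part of the original gist): the same word count
// on a small in-memory sample with plain Scala collections, so the logic can be
// verified without Spark. sampleLines and localCounts are hypothetical names.
val sampleLines = Seq("hello world", "hello spark")
val localCounts = sampleLines.
flatMap(_.split(" ")).
groupBy(identity).
map { case (word, occurrences) => (word, occurrences.size) }
// Print each (word, count) pair; a Map has no guaranteed iteration order
localCounts.foreach(println)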