

@borhan-kazimipour
Last active May 14, 2019 06:34
Scala Simple Word Count Example

Local Instructions

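  1. Clone the gist into a local directory: git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount, then cd wordcount.
  2. See the input files: cat *.txt
  3. Make sure the mapper and reducer are executable: chmod +x *.scala
  4. See how the mapper works: cat baa.txt | ./mapper.scala
  5. See how the reducer works: cat baa.txt | ./mapper.scala | ./reducer.scala. This reducer keeps all counts in a Map, so it does not need sorted input; to mimic the sort Hadoop performs between the map and reduce phases, you can run cat baa.txt | ./mapper.scala | sort | ./reducer.scala instead.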
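The local pipeline can also be modelled inside a single Scala program, which makes the three phases (map, sort, reduce) explicit. This is a hypothetical sketch, not part of the gist: `LocalPipelineSketch` and its methods are illustrative names, and the input is just the first two lines of baa.txt.

```scala
// Hypothetical in-process model of: cat *.txt | ./mapper.scala | sort | ./reducer.scala
object LocalPipelineSketch {
  // Map phase: one (word, 1) pair per whitespace-separated token
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(line => line.split("\\s+").toSeq.filter(_.nonEmpty)).map(word => (word, 1))

  // Sort phase: what `sort` (locally) or Hadoop's shuffle does between the scripts
  def shufflePhase(pairs: Seq[(String, Int)]): Seq[(String, Int)] =
    pairs.sortBy(_._1)

  // Reduce phase: add up the 1s emitted for each word
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val input = Seq("Baa, baa, black sheep,", "Have you any wool?") // first lines of baa.txt
    val counts = reducePhase(shufflePhase(mapPhase(input)))
    counts.toSeq.sorted.foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Running `main` prints each distinct token with its count. Note that `Baa,` and `baa,` stay separate, because neither this sketch nor the gist's mapper normalises case or punctuation.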

Hadoop Instructions

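  1. Clone the gist into a local directory: git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount, then cd wordcount.
  2. Create an input directory on HDFS: hadoop fs -mkdir -p /wc/in
  3. Copy the input files into HDFS: hadoop fs -put *.txt /wc/in
  4. Make sure the files were transferred: hadoop fs -ls /wc/in (you can also read their contents with hadoop fs -cat).
  5. Make sure the mapper and reducer scripts are executable: chmod +x *.scala
  6. Make sure the output directory does NOT exist: hadoop fs -ls /wc/out should report that the path does not exist. Hadoop refuses to run a job whose output directory already exists; if it does, remove it with hadoop fs -rm -r /wc/out.
  7. Run the job (adjust the streaming jar path to your Hadoop installation): hadoop jar /home/user/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -files mapper.scala,reducer.scala -mapper mapper.scala -reducer reducer.scala -input /wc/in/* -output /wc/out. The generic -files option ships both scripts to the task nodes and must come before the streaming options; each node also needs scala on its PATH.
  8. Make sure the job ran successfully: hadoop fs -ls /wc/out should list a zero-byte file called _SUCCESS.
  9. Read the output: hadoop fs -cat /wc/out/part-00000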
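The gist's reducer keeps every word in a Map until its input ends. Because Hadoop Streaming sorts the mapper output by key before the reduce phase, a reducer can instead total one word at a time and emit it as soon as the key changes. The sketch below assumes that sorted input; `SortedStreamReducerSketch` and `reduceSorted` are hypothetical names, not part of the gist.

```scala
// Hypothetical reducer for sorted input: Hadoop Streaming delivers the mapper
// output sorted by key, so all lines for one word arrive consecutively.
object SortedStreamReducerSketch {
  // Emits "word\ttotal" each time the key changes; keeps only one word in memory.
  def reduceSorted(lines: Iterator[String])(emit: String => Unit): Unit = {
    var current: Option[String] = None
    var total = 0
    for (line <- lines) {
      line.split("\t") match {
        case Array(word, count) =>
          if (current.contains(word)) total += count.toInt
          else {
            current.foreach(w => emit(s"$w\t$total")) // flush the previous word
            current = Some(word)
            total = count.toInt
          }
        case _ => // skip malformed lines
      }
    }
    current.foreach(w => emit(s"$w\t$total")) // flush the last word
  }

  def main(args: Array[String]): Unit =
    reduceSorted(scala.io.Source.stdin.getLines())(println)
}
```

The trade-off is that this version is only correct on sorted input: fed unsorted lines, it would print the same word more than once. In exchange, its memory use does not grow with the number of distinct words.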
Input Files

As I was going to St. Ives,
I met a man with seven wives,
Each wife had seven sacks,
Each sack had seven cats,
Each cat had seven kits:
Kits, cats, sacks, and wives,
How many were there going to St. Ives.

Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.
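mapper.scala

```scala
#!/usr/bin/env scala
// mapper.scala: prints "<word>\t1" for every whitespace-separated token on stdin
for (line <- scala.io.Source.stdin.getLines()) {
  line.split("\\s+")
    .filter(_.nonEmpty) // drop the empty token produced by leading whitespace
    .foreach(word => println(s"$word\t1"))
}
```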
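The mapper only splits lines into tokens, so each token keeps its case and punctuation: `Baa,` and `baa,` are counted as different words. If that is not wanted, a variant mapper could normalise each token first. This is a hypothetical sketch, not part of the gist; `NormalizingMapperSketch` and `tokenize` are illustrative names, and it assumes that lowercasing plus stripping leading and trailing ASCII punctuation (Java's `\p{Punct}` regex class) is the normalisation you want.

```scala
import java.util.Locale

// Hypothetical normalising mapper: "Baa," and "baa," both become "baa".
object NormalizingMapperSketch {
  // Lowercase each token and strip leading/trailing ASCII punctuation.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").toSeq
      .map(_.toLowerCase(Locale.ROOT).replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""))
      .filter(_.nonEmpty)

  // Same output layout as mapper.scala: "<word>\t1" per token.
  def main(args: Array[String]): Unit =
    for (line <- scala.io.Source.stdin.getLines(); word <- tokenize(line))
      println(s"$word\t1")
}
```

On baa.txt this variant would lead to counts of 2 for baa, 2 for yes and 3 for one, where the original mapper keeps `Baa,`/`baa,`, `Yes,`/`yes,` and `One`/`one` apart.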
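reducer.scala

```scala
#!/usr/bin/env scala
// reducer.scala: reads "<word>\t<count>" lines from stdin and prints the total per word
var wordCount = Map.empty[String, Int]
for (ln <- scala.io.Source.stdin.getLines()) {
  ln.split("\t") match {
    case Array(word, count) =>
      wordCount += word -> (wordCount.getOrElse(word, 0) + count.toInt)
    case _ => // skip malformed lines
  }
}
// print "<word>\t<total>" so the output uses the same layout as the mapper
wordCount.foreach { case (word, total) => println(s"$word\t$total") }
```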
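Summing is associative and commutative, so adding up partial totals gives the same result as adding up every 1 at once. That property is what would let a summing reducer double as a combiner (Hadoop Streaming accepts a -combiner command for per-mapper pre-aggregation), as long as it prints word<TAB>count lines in the mapper's layout. The sketch below checks the property on two made-up mapper outputs; `CombinerSketch` and its methods are hypothetical.

```scala
// Hypothetical check: combining per-mapper partial counts gives the same totals
// as reducing all mapper output at once.
object CombinerSketch {
  // Sum the counts in "word\tcount" lines, skipping malformed ones.
  def sumCounts(lines: Seq[String]): Map[String, Int] =
    lines.map(_.split("\t")).collect { case Array(w, c) => (w, c.toInt) }
      .groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  // Render a count map back into the mapper's "word\tcount" layout.
  def render(counts: Map[String, Int]): Seq[String] =
    counts.toSeq.sorted.map { case (w, n) => s"$w\t$n" }

  def main(args: Array[String]): Unit = {
    val split1 = Seq("baa\t1", "black\t1", "baa\t1") // output of one mapper
    val split2 = Seq("baa\t1", "sheep\t1")           // output of another mapper
    val direct = sumCounts(split1 ++ split2)
    val combined = sumCounts(render(sumCounts(split1)) ++ render(sumCounts(split2)))
    println(direct == combined) // prints true
  }
}
```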