@majidalfifi
Last active July 17, 2018 14:33
# Day 3:
## Task1: Apply the word count we did yesterday
curl -O https://raw.githubusercontent.com/rizaumami/quran-epub/master/Source/quran-simple.txt
Print the 10 most frequently used words in the Quran.
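
One possible way to do this locally is a plain shell pipeline. This is only a sketch and may differ from the word count written yesterday: the function name `top_words` is made up for illustration, words are split on whitespace only, and punctuation or comment lines in the file are not filtered out.

```shell
#!/usr/bin/env bash
# top_words FILE [N]: print the N most frequent words in FILE (default 10).
# Hypothetical helper for illustration, not part of the workshop code.
top_words() {
  local file="$1" n="${2:-10}"
  # LC_ALL=C forces byte-wise comparison, so distinct UTF-8 words are
  # never merged by locale collation rules.
  LC_ALL=C tr -s '[:space:]' '\n' < "$file" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | LC_ALL=C sort -k1,1nr \
    | head -n "$n"
}

# Only runs if the file has already been downloaded with curl.
if [ -f quran-simple.txt ]; then
  top_words quran-simple.txt 10
fi
```

Each output line is a count followed by the word, with the most frequent word first.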
## Task2: Submit Hadoop Job
Load quran-simple.txt into HDFS.
Submit the word count job.
Check the running job on the UI.
Investigate the results.
git clone https://github.com/majidalfifi/hadoopworkshop.git
cd hadoopworkshop/
mvn clean package
echo -e "this is a line\nthis is another line\nand one more" | hadoop fs -put - file1.txt
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.WordCount file1.txt results
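
The same steps could be applied to quran-simple.txt from Task1. The sketch below assumes the jar built above and that quran-simple.txt is in the current directory. The output directory name `quran_results`, the `run` helper, and the `DRY_RUN` variable are made up for illustration and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# run CMD...: print the command, then execute it unless DRY_RUN=1.
# Hypothetical helper so the steps can be previewed without a cluster.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

JAR=target/mapreduce-helloworld-1.0-SNAPSHOT.jar
INPUT=quran-simple.txt
OUTPUT=quran_results   # assumed name; should not exist in HDFS yet

submit_wordcount() {
  # 1. Load the input into the user's HDFS home directory (-f overwrites).
  run hadoop fs -put -f "$INPUT" "$INPUT"
  # 2. Submit the word count job. While it runs, `yarn application -list`
  #    in another terminal, or the ResourceManager web UI, shows it.
  run yarn jar "$JAR" edu.kfupm.hadoop.WordCount "$INPUT" "$OUTPUT"
  # 3. Investigate the results: list the output files, then print them.
  run hadoop fs -ls "$OUTPUT"
  run hadoop fs -cat "$OUTPUT/part-*"
}

# Preview the commands only; call submit_wordcount without DRY_RUN=1
# on a machine with a running cluster to execute them.
DRY_RUN=1 submit_wordcount
```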
Extra: sort the word count results
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.Sort results results_sorted
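
What the Sort job does is not described here. For comparison, the word count output can also be ordered locally. The sketch below assumes each output line is `word<TAB>count` (tab is the default separator for Hadoop text output, but the workshop job may differ). The function name `sort_counts` is made up for illustration.

```shell
#!/usr/bin/env bash
# sort_counts: read "word<TAB>count" lines on stdin and print them with the
# highest count first; ties are broken by the word in byte order.
# Hypothetical helper for illustration.
sort_counts() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Small self-contained demonstration on made-up counts.
printf 'this\t2\nand\t1\nline\t2\n' | sort_counts

# With a cluster available, the job output could be piped through it:
#   hadoop fs -cat 'results/part-*' | sort_counts | head -n 10
```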
## Task3: Run Top Tweeters
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.TopTweeters /tmp/tweets-sample.json results
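
Task2 already wrote to `results`, and MapReduce jobs that use the standard file output formats refuse to start when the output directory exists. Assuming TopTweeters follows that convention, the old output may need to be removed first. A minimal sketch follows. The function name `clean_output` and the `HADOOP` override variable are made up for illustration.

```shell
#!/usr/bin/env bash
# clean_output DIR: delete a previous job's output directory in HDFS.
# HADOOP can be set to another command (for example "echo") to preview
# the call; this override is hypothetical and only for illustration.
clean_output() {
  # -r: remove recursively, -f: do not fail if DIR does not exist
  ${HADOOP:-hadoop} fs -rm -r -f "$1"
}

# Usage (needs a running cluster):
#   clean_output results
#   yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.TopTweeters /tmp/tweets-sample.json results
#   hadoop fs -cat 'results/part-*' | head -n 10
```

Writing to a fresh directory name instead avoids the deletion and keeps the Task2 output around.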