# Day 3
## Task1: Apply the word count we did yesterday
curl -O https://raw.githubusercontent.com/rizaumami/quran-epub/master/Source/quran-simple.txt
Print the 10 most frequently used words in the Quran.
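If yesterday's word count is not at hand, the same result can be approximated locally with standard Unix tools. This is only a sketch: the function name `top_words` is made up for illustration, it splits on whitespace only, and it does not strip any punctuation or metadata the file may contain.

```shell
# top_words FILE [N]: print the N most frequent whitespace-separated words
# in FILE (default N=10), each preceded by its count.
# "top_words" is a hypothetical helper name, not part of the workshop repo.
top_words() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        grep -v '^$' |                # drop empty lines
        LC_ALL=C sort |               # byte-wise sort so equal words are adjacent
        LC_ALL=C uniq -c |            # count each distinct word
        sort -rn |                    # highest count first
        head -n "${2:-10}"            # keep the top N
}
```

Usage: `top_words quran-simple.txt` prints the top 10; `top_words quran-simple.txt 25` prints the top 25.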
## Task2: Submit a Hadoop job
Load quran-simple.txt into HDFS.
Submit the word count job.
Check the running job in the UI.
Investigate the results.
git clone https://github.com/majidalfifi/hadoopworkshop.git
cd hadoopworkshop/
mvn clean package
echo -e "this is a line\nthis is another line\nand one more" | hadoop fs -put - file1.txt
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.WordCount file1.txt results
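The same steps can be applied to quran-simple.txt. The sketch below wraps them in one function. The function name `run_wordcount`, the `HADOOP` and `YARN` override variables, and the output directory name `quran_results` are assumptions for illustration, as is the `part-*` naming of the output files (the usual MapReduce default). It has to be run from the hadoopworkshop/ directory after `mvn clean package`.

```shell
# Path to the jar built by "mvn clean package" (from the commands above).
JAR=target/mapreduce-helloworld-1.0-SNAPSHOT.jar
# Hypothetical overrides so other binaries can be substituted if needed.
HADOOP=${HADOOP:-hadoop}
YARN=${YARN:-yarn}

# run_wordcount INPUT OUTPUT
# Uploads the local file INPUT to HDFS, runs the WordCount job on it,
# then lists the output directory and shows the first lines of the result.
run_wordcount() {
    input=$1
    output=$2
    $HADOOP fs -put -f "$input" "$input" &&      # -f overwrites an earlier upload
    $YARN jar "$JAR" edu.kfupm.hadoop.WordCount "$input" "$output" &&
    $HADOOP fs -ls "$output" &&
    $HADOOP fs -cat "$output/part-*" | head -n 20  # glob is quoted so HDFS expands it
}
```

Usage: `run_wordcount quran-simple.txt quran_results`. While the job runs, `yarn application -list` in another terminal shows it. The ResourceManager web UI is usually served on port 8088, though your cluster may be configured differently.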
Extra: sort the word counts
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.Sort results results_sorted
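For a quick check without a second job, the word counts can also be sorted on the client side. This sketch assumes the WordCount output consists of `word<TAB>count` lines, which is the default text output format but has not been verified for this job. The function name `top_counts` is hypothetical.

```shell
# top_counts [N]: read "word<TAB>count" lines on stdin and print the
# N lines with the highest counts (default N=10).
# "top_counts" is a hypothetical helper name, not part of the workshop repo.
top_counts() {
    sort -t "$(printf '\t')" -k2,2nr |  # numeric, descending, on the count column
        head -n "${1:-10}"
}
```

Usage: `hadoop fs -cat "results/part-*" | top_counts`.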
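## Task3: Run Top Tweeters
The `results` directory from Task2 already exists, and Hadoop jobs normally refuse to write to an existing output directory. Unless the job clears it itself, remove it first with `hadoop fs -rm -r results` or pick a different output path.
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT.jar edu.kfupm.hadoop.TopTweeters /tmp/tweets-sample.json results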