Last active July 18, 2018 13:44
Day 4:
Task1: Word Count MapReduce in Python using Hadoop Streaming
cd hadoopworkshop
git pull
echo -e "this is a line\nthis is another line\nand one more"
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py | sort -k1,1
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py | sort -k1,1 | python src/main/python/reducer.py
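The mapper.py and reducer.py scripts themselves are not shown in these notes. As a rough sketch of what the pipeline above computes, assuming the mapper emits one (word, 1) pair per whitespace-separated token and the reducer sums consecutive pairs with the same key, the three stages can be imitated in a single Python file. The function names here are illustrative, not taken from the workshop repository:

```python
from itertools import groupby


def mapper(lines):
    """Emit one (word, 1) pair per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts of each run of identical keys.

    The input must already be sorted by key, which is what
    `sort -k1,1` (or the Hadoop shuffle) guarantees.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Same three lines as the echo command above.
sample = ["this is a line", "this is another line", "and one more"]

# sorted() stands in for `sort -k1,1` between the two scripts.
shuffled = sorted(mapper(sample), key=lambda pair: pair[0])
counts = dict(reducer(shuffled))

for word, count in sorted(counts.items()):
    print(f"{word}\t{count}")
```

A real streaming script would typically read lines from sys.stdin and print tab-separated key/value lines instead of passing Python tuples, since Hadoop Streaming only exchanges text between the stages.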
Now run it on Hadoop:
yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-streaming*.jar \
    -file src/main/python/mapper.py \
    -file src/main/python/reducer.py \
    -mapper src/main/python/mapper.py \
    -reducer src/main/python/reducer.py \
    -input file1.txt \
    -output results66
Task2: Run the top tweeters job
mvn clean compile assembly:single
hdfs dfs -cp /tmp/tweets-sample.json .
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT-jar-with-dependencies.jar edu.kfupm.hadoop.TopTweeters tweets-sample.json results
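The TopTweeters Java class is not reproduced in these notes. To make the idea concrete, here is a small Python sketch of the counting logic such a job presumably performs. It assumes each input line is one tweet as a JSON object with the author under `user.screen_name` (the layout of Twitter's standard tweet objects); that layout, the function name, and the sample records are assumptions for illustration only:

```python
import json
from collections import Counter


def top_tweeters(lines, n=3):
    """Count tweets per author and return the n most active authors."""
    counts = Counter()
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the run
        if not isinstance(tweet, dict):
            continue
        # Assumed layout: {"user": {"screen_name": "..."}, ...}
        name = (tweet.get("user") or {}).get("screen_name")
        if name:
            counts[name] += 1
    return counts.most_common(n)


# Hypothetical records, not taken from tweets-sample.json.
sample = [
    '{"user": {"screen_name": "alice"}, "text": "hello"}',
    '{"user": {"screen_name": "bob"}, "text": "hi"}',
    '{"user": {"screen_name": "alice"}, "text": "again"}',
    "not json",
]
result = top_tweeters(sample, n=2)
print(result)
```

In MapReduce terms, the loop body corresponds to the map step (emit the author with a count of 1) and the Counter to the reduce step (sum per author).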
Task3: Write a job to find the most mentioned users:
1. copy TopTweeters.java to TopMentions.java
2. make the appropriate changes :)
mvn clean compile assembly:single
hdfs dfs -rm -r -f results   # the job fails if the output directory from Task2 already exists
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT-jar-with-dependencies.jar edu.kfupm.hadoop.TopMentions /tmp/tweets-sample.json results
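The main change from counting tweeters is what the map step emits: one key per mentioned user rather than one key per author. The Python sketch below illustrates that difference only; it is not the Java solution. It assumes mentions are listed under `entities.user_mentions[].screen_name`, as in Twitter's standard tweet objects, and the function names and sample records are hypothetical:

```python
import json
from collections import Counter


def mentioned_users(tweet):
    """Yield the screen name of every user mentioned in one tweet."""
    # Assumed layout: {"entities": {"user_mentions": [{"screen_name": "..."}]}}
    entities = tweet.get("entities") or {}
    for mention in entities.get("user_mentions") or []:
        name = mention.get("screen_name")
        if name:
            yield name


def top_mentions(lines, n=3):
    """Count mentions per user and return the n most mentioned users."""
    counts = Counter()
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if isinstance(tweet, dict):
            counts.update(mentioned_users(tweet))
    return counts.most_common(n)


# Hypothetical records, not taken from tweets-sample.json.
sample = [
    '{"user": {"screen_name": "alice"}, "entities": {"user_mentions": [{"screen_name": "bob"}]}}',
    '{"user": {"screen_name": "dave"}, "entities": {"user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}]}}',
    '{"user": {"screen_name": "bob"}, "entities": {"user_mentions": []}}',
]
result = top_mentions(sample, n=2)
print(result)
```

Note that a single tweet can contribute several keys here, whereas in the top tweeters job each tweet contributes exactly one.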