@majidalfifi
Last active July 18, 2018 13:44
Day 4:
Task 1: Word count MapReduce in Python using Hadoop Streaming
cd hadoopworkshop
git pull
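# Print the sample input
echo -e "this is a line\nthis is another line\nand one more"
# Run only the mapper to inspect its key/value output
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py
# Sort on the first field (the key), as Hadoop's shuffle phase does
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py | sort -k1,1
# Feed the sorted records to the reducer to get the per-word counts
echo -e "this is a line\nthis is another line\nand one more" | python src/main/python/mapper.py | sort -k1,1 | python src/main/python/reducer.py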
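The repository's mapper.py and reducer.py are not reproduced here, so the following is a hypothetical, minimal Python equivalent of the local pipeline above. It assumes the mapper emits one `word<TAB>1` line per word and that the reducer relies on its input being sorted by key; the names `mapper`, `reducer` and `run_pipeline` are illustrative only.

```python
from itertools import groupby


def mapper(lines):
    """Map step (assumed format): emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of consecutive records that share a key.

    Only correct if the input is sorted by key, which `sort -k1,1` (locally)
    or the shuffle phase (on Hadoop) guarantees.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_pipeline(lines):
    """Local equivalent of: mapper.py | sort -k1,1 | reducer.py"""
    mapped = list(mapper(lines))
    mapped.sort(key=lambda record: record.split("\t", 1)[0])  # sort on the key field only
    return list(reducer(mapped))


if __name__ == "__main__":
    text = "this is a line\nthis is another line\nand one more"
    for record in run_pipeline(text.splitlines()):
        print(record)
```

In a real streaming script, the mapper would read `sys.stdin` and print each record (e.g. `for r in mapper(sys.stdin): print(r)`), and the reducer would do the same with its sorted input.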
Now run it on Hadoop:
yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-streaming*.jar \
-file src/main/python/mapper.py \
-file src/main/python/reducer.py \
-mapper src/main/python/mapper.py \
-reducer src/main/python/reducer.py \
-input file1.txt \
-output results66
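On the cluster, `sort -k1,1` is replaced by the shuffle: Hadoop Streaming treats the text before the first tab as the key, the default HashPartitioner assigns each key to one reducer, and each reducer sees its keys in sorted order. The sketch below simulates that step under stated assumptions: `zlib.crc32` is a stand-in chosen only because it is deterministic (Hadoop hashes keys differently), and `num_reducers`, `partition` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key. Stand-in for HashPartitioner; crc32 is NOT Hadoop's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def record_key(record: str) -> str:
    """Streaming's default: the key is the text before the first tab."""
    return record.split("\t", 1)[0]


def shuffle(map_output, num_reducers):
    """Route each 'key<TAB>value' record to one reducer, then sort each reducer's input by key."""
    buckets = defaultdict(list)
    for record in map_output:
        buckets[partition(record_key(record), num_reducers)].append(record)
    return {reducer_id: sorted(records, key=record_key) for reducer_id, records in buckets.items()}


if __name__ == "__main__":
    map_output = ["this\t1", "is\t1", "a\t1", "line\t1", "this\t1", "is\t1", "another\t1", "line\t1"]
    for reducer_id, records in sorted(shuffle(map_output, num_reducers=2).items()):
        print(reducer_id, records)
```

Because every record with a given key lands on the same reducer, reducer.py can sum the counts for a word without seeing any other reducer's input.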
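Task 2: Run the TopTweeters job
# Build a JAR that bundles the job and its dependencies
mvn clean compile assembly:single
# Copy the sample tweets from /tmp on HDFS into your HDFS home directory
hdfs dfs -cp /tmp/tweets-sample.json .
# Run the job: input tweets-sample.json, output directory results
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT-jar-with-dependencies.jar edu.kfupm.hadoop.TopTweeters tweets-sample.json results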
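TopTweeters.java is not shown here. As a rough single-machine illustration of what a "top tweeters" job computes, the Python sketch below counts tweets per author and keeps the n largest totals. It assumes tweets-sample.json holds one tweet JSON object per line with a `user.screen_name` field (the Twitter API v1.1 layout); both that layout and the function name `top_tweeters` are assumptions, not taken from the Java code.

```python
import json
import sys
from collections import Counter


def top_tweeters(json_lines, n=10):
    """Return the n (screen_name, tweet_count) pairs with the most tweets.

    Assumes one tweet JSON object per line with a user.screen_name field.
    """
    counts = Counter()
    for line in json_lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(tweet, dict):
            continue  # skip valid JSON that is not a tweet object
        name = (tweet.get("user") or {}).get("screen_name")
        if name:
            counts[name] += 1  # "map" emits (author, 1); the Counter does the "reduce" sum
    return counts.most_common(n)  # final top-N selection


if __name__ == "__main__":
    # Hypothetical usage: python top_tweeters.py < tweets-sample.json
    for name, count in top_tweeters(sys.stdin):
        print(f"{name}\t{count}")
```

In MapReduce terms, the per-tweet lookup is the map step, the Counter plays the role of the reducer's sum, and `most_common(n)` is the top-N step that a distributed job has to perform after the counts are complete.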
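Task 3: Write a job to find the most-mentioned users:
1. Copy TopTweeters.java to TopMentions.java.
2. Make the appropriate changes: rename the class to TopMentions and count the users mentioned in each tweet rather than the tweet's author.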
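# Rebuild the JAR so it includes TopMentions
mvn clean compile assembly:single
# Use a fresh output directory: MapReduce jobs normally refuse to write into one that already exists (e.g. results from Task 2)
yarn jar target/mapreduce-helloworld-1.0-SNAPSHOT-jar-with-dependencies.jar edu.kfupm.hadoop.TopMentions /tmp/tweets-sample.json results-mentions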
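For Task 3, the main change is in what the map step emits. The Python sketch below (not Java, and not the workshop's solution) emits one record per user mentioned in a tweet, read from `entities.user_mentions[].screen_name` as in the Twitter API v1.1 tweet object. That field layout, the choice to lowercase names (Twitter screen names are case-insensitive), and the names `mentioned_users`, `mention_mapper` and `top_mentions` are assumptions to check against the sample data.

```python
import json
import sys
from collections import Counter


def mentioned_users(tweet):
    """Yield the screen names in entities.user_mentions (assumed v1.1 layout)."""
    entities = tweet.get("entities") or {}
    for mention in entities.get("user_mentions") or []:
        name = mention.get("screen_name")
        if name:
            yield name.lower()  # design choice: @Bob and @bob count as the same user


def mention_mapper(json_lines):
    """Map step: emit 'screen_name<TAB>1' once per mention, not once per tweet."""
    for line in json_lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if isinstance(tweet, dict):
            for name in mentioned_users(tweet):
                yield f"{name}\t1"


def top_mentions(json_lines, n=10):
    """Reduce + top-N: sum the 1s per user and keep the n largest totals."""
    counts = Counter()
    for record in mention_mapper(json_lines):
        name, one = record.split("\t", 1)
        counts[name] += int(one)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical usage: python top_mentions.py < tweets-sample.json
    for name, count in top_mentions(sys.stdin):
        print(f"{name}\t{count}")
```

A tweet that mentions three users contributes three records, whereas a per-author count contributes one record per tweet; the summing and top-N steps stay the same.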