@mh-github
Last active August 29, 2015 14:07
Commands and editing for the five steps of the Hadoop DIY tutorial by Prithwis Mukherjee ( @prithwis ) at Ref [1].
The five steps
--------------
1. Install Hadoop 2.2 in single-node cluster mode on a machine running Ubuntu
2. Compile and run the standard WordCount example in Java
3. Compile and run another, non-WordCount, program in Java
4. Use the Hadoop streaming utility to run a WordCount program written in Python, as an example of a non-Java application
5. Compile and run a Java program that solves a small but representative predictive analytics problem
Note :
------
a) Lines starting with --> are commands I ran at the prompt or edits I made inside a file.
b) Step 1 is from reference [2] below
References :
------------
[1] http://thoughtshoppe.blogspot.in/2014/05/getting-started-with-mapreduce-and.html
[2] http://www.ercoppa.org/Linux-Install-Hadoop-220-on-Ubuntu-Linux-1304-Single-Node-Cluster.htm
Step 1 : Hadoop installation
----------------------------
--> sudo apt-get install openssh-server
--> ssh-keygen -t rsa -P ""
Press Enter
--> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
(Optional) Disable SSH login from remote addresses by setting in /etc/ssh/sshd_config:
ListenAddress 127.0.0.1
--> ssh localhost
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-34-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Wed Aug 27 18:10:55 2014 from localhost
--> exit
Go to the Hadoop website and download Hadoop 2.2.0 (hadoop-2.2.0.tar.gz)
--> cd Downloads
--> tar xvf hadoop-2.2.0.tar.gz
--> mv hadoop-2.2.0 ~/hadoop
--> mkdir -p ~/hadoop/data/namenode
--> mkdir -p ~/hadoop/data/datanode
--> ~/hadoop/etc/hadoop/hadoop-env.sh (after the comment "The java implementation to use."):
---- export JAVA_HOME="`dirname $(readlink /etc/alternatives/java)`/../"
---- export HADOOP_COMMON_LIB_NATIVE_DIR="$HOME/hadoop/lib"
---- export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HOME/hadoop/lib"
(note: ~ is not expanded inside double quotes, so $HOME is used here)
--> ~/hadoop/etc/hadoop/core-site.xml (inside <configuration> tag):
---- <property>
---- <name>fs.default.name</name>
---- <value>hdfs://localhost:9000</value>
---- </property>
--> ~/hadoop/etc/hadoop/hdfs-site.xml (inside <configuration> tag):
---- <property>
---- <name>dfs.replication</name>
---- <value>1</value>
---- </property>
---- <property>
---- <name>dfs.namenode.name.dir</name>
---- <value>${user.home}/hadoop/data/namenode</value>
---- </property>
---- <property>
---- <name>dfs.datanode.data.dir</name>
---- <value>${user.home}/hadoop/data/datanode</value>
---- </property>
--> ~/hadoop/etc/hadoop/yarn-site.xml (inside <configuration> tag):
---- <property>
---- <name>yarn.nodemanager.aux-services</name>
---- <value>mapreduce_shuffle</value>
---- </property>
---- <property>
---- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
---- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
---- </property>
--> cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
--> ~/hadoop/etc/hadoop/mapred-site.xml (inside <configuration> tag):
---- <property>
---- <name>mapreduce.framework.name</name>
---- <value>yarn</value>
---- </property>
--> echo 'export PATH=$PATH:$HOME/hadoop/bin:$HOME/hadoop/sbin' >> ~/.bashrc
--> source ~/.bashrc
--> hdfs namenode -format
--> start-dfs.sh && start-yarn.sh
--> jps
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
14/08/29 13:51:01 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/08/29 13:51:01 INFO fs.TestDFSIO: Date & time: Fri Aug 29 13:51:01 IST 2014
14/08/29 13:51:01 INFO fs.TestDFSIO: Number of files: 20
14/08/29 13:51:01 INFO fs.TestDFSIO: Total MBytes processed: 200.0
14/08/29 13:51:01 INFO fs.TestDFSIO: Throughput mb/sec: 2.8908835985719037
14/08/29 13:51:01 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.428131580352783
14/08/29 13:51:01 INFO fs.TestDFSIO: IO rate std deviation: 1.655113127097678
14/08/29 13:51:01 INFO fs.TestDFSIO: Test exec time sec: 217.695
14/08/29 13:51:01 INFO fs.TestDFSIO:
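[note : the reported throughput (2.89 mb/sec) is higher than what you get by dividing 200 MB by the 217.695 sec run time. As far as I understand TestDFSIO, its throughput figure divides the data by the summed I/O time of the individual map tasks, while the exec time also includes job setup and scheduling. The sketch below only redoes that arithmetic with the numbers from the report; the function names are made up for illustration.]

```shell
# Figures copied from the TestDFSIO write report above.
total_mb=200.0
exec_sec=217.695
throughput=2.8908835985719037

# Wall-clock rate: total data divided by the elapsed time of the whole job.
wall_clock_rate() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# Summed task I/O time implied by the reported throughput.
# Assumption: throughput = total MB / summed per-task I/O seconds.
implied_io_seconds() {
  awk -v mb="$1" -v t="$2" 'BEGIN { printf "%.1f\n", mb / t }'
}

wall_clock_rate "$total_mb" "$exec_sec"
implied_io_seconds "$total_mb" "$throughput"
```

On a small single-node setup most of the exec time is job overhead rather than disk I/O, so the two rates differ a lot.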
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
14/08/29 13:52:20 INFO fs.TestDFSIO: TestDFSIO.1.7
14/08/29 13:52:20 INFO fs.TestDFSIO: nrFiles = 1
14/08/29 13:52:20 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
14/08/29 13:52:20 INFO fs.TestDFSIO: bufferSize = 1000000
14/08/29 13:52:20 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/08/29 13:52:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 13:52:23 INFO fs.TestDFSIO: Cleaning up test files
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 48.305 seconds
Estimated value of Pi is 3.60000000000000000000
--> stop-dfs.sh && stop-yarn.sh
Step 2
------
--> cd ~/Code/java/BookText
--> mkdir -p WC-classes
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes WordMapper.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes SumReducer.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes WordCount.java
/home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
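[note : the same long classpath is repeated in every javac call above. A possible shortcut is to build it once in a variable. This is only a sketch: HADOOP_HOME and the hadoop_cp function are names I am assuming here, and the jar versions are the 2.2.0 ones used in these notes.]

```shell
# Assumption: Hadoop 2.2.0 is unpacked at ~/hadoop, as in Step 1.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"

# hadoop_cp: print the compile classpath used in these notes
# for a given Hadoop home directory.
hadoop_cp() {
  local h="$1"
  printf '%s:%s:%s\n' \
    "$h/share/hadoop/common/hadoop-common-2.2.0.jar" \
    "$h/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar" \
    "$h/share/hadoop/common/lib/commons-cli-1.2.jar"
}

HADOOP_CP="$(hadoop_cp "$HADOOP_HOME")"
echo "$HADOOP_CP"

# The compile step could then look like this (not run here; it needs
# the JDK and the three source files in the current directory):
#   mkdir -p WC-classes
#   javac -cp "$HADOOP_CP" -d WC-classes WordMapper.java SumReducer.java WordCount.java
```

Passing all three sources to one javac call lets the compiler resolve the classes against each other. Another option is javac -cp "$(hadoop classpath)", which asks the installed Hadoop for its full classpath.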
--> jar -cvf WordCount.jar -C WC-classes/ .
added manifest
adding: WordCount.class(in = 1694) (out= 854)(deflated 49%)
adding: WordMapper.class(in = 1681) (out= 733)(deflated 56%)
adding: SumReducer.class(in = 1690) (out= 712)(deflated 57%)
--> ls
SumReducer.java WC-classes WC-input WordCount.jar WordCount.java WordMapper.java
--> ls WC-input
davinci.txt The-Outline-Of-Science.txt Ulysses.txt
--> hdfs namenode -format
--> start-dfs.sh && start-yarn.sh
--> jps
10893 DataNode
11442 NodeManager
11292 ResourceManager
10742 NameNode
11484 Jps
11119 SecondaryNameNode
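[note : all five daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) should show up in the jps listing before any job is submitted. A small check like the one below can report which ones are missing. It is a sketch; missing_daemons is a helper name I made up, not part of Hadoop.]

```shell
# Daemons expected on a single-node setup, named as jps prints them above.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# missing_daemons: read jps-style output ("<pid> <name>" per line) on stdin
# and print every expected daemon name that does not appear in it.
missing_daemons() {
  local running d
  running="$(awk '{ print $2 }')"
  for d in $EXPECTED_DAEMONS; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode"
    printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
  done
}

# Usage on a live machine (needs the JDK's jps on the PATH):
#   jps | missing_daemons
```

No output means everything is up. If a name is printed, the corresponding log file under ~/hadoop/logs is the first place to look.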
--> hdfs dfs -rm -r data/WC-input
--> hdfs dfs -rm -r data/WC-output
--> hdfs dfs -mkdir -p data/WC-input
(mh-note: in hdfs, not os fs, directory created as /user/mahboob/data/WC-input)
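[note : the rule behind this is that a relative HDFS path is resolved against the user's HDFS home directory, /user/<username>, while a path starting with / is taken as is. The function below only illustrates that rule locally and does not talk to HDFS; hdfs_abs_path is a name I made up.]

```shell
# hdfs_abs_path: show how an HDFS path argument is interpreted.
# Absolute paths are kept; relative ones are placed under /user/<user>.
hdfs_abs_path() {
  local path="$1" user="${2:-$USER}"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '/user/%s/%s\n' "$user" "$path" ;;
  esac
}

hdfs_abs_path data/WC-input mahboob   # prints /user/mahboob/data/WC-input
```

So "hdfs dfs -ls data/WC-input" and "hdfs dfs -ls /user/mahboob/data/WC-input" list the same directory for this user.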
--> hdfs dfs -copyFromLocal WC-input/* data/WC-input
--> hdfs dfs -ls data/WC-input
14/09/26 13:09:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 mahboob supergroup 674570 2014-09-26 12:57 data/WC-input/The-Outline-Of-Science.txt
-rw-r--r-- 1 mahboob supergroup 1573150 2014-09-26 12:57 data/WC-input/Ulysses.txt
-rw-r--r-- 1 mahboob supergroup 1423803 2014-09-26 12:57 data/WC-input/davinci.txt
--> hadoop jar WordCount.jar WordCount data/WC-input data/WC-output
URLs:
http://localhost:8088/cluster
http://localhost:50070/dfshealth.jsp
--> stop-dfs.sh && stop-yarn.sh
Step 3
------
--> pwd
/home/mahboob/Code/java/hadoop/marketratings
--> ls
marketratings.csv MarketRatings.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d classes MarketRatings.java
--> jar -cvf MarketRatings.jar -C classes/ .
--> start-dfs.sh && start-yarn.sh
--> jps
--> hdfs dfs -mkdir -p data/MR-input
--> hdfs dfs -copyFromLocal marketratings.csv data/MR-input
--> hdfs dfs -ls data/MR-input
14/09/26 23:52:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 mahboob supergroup 1951777 2014-09-26 23:51 data/MR-input/marketratings.csv
--> hadoop jar MarketRatings.jar MarketRatings data/MR-input data/MR-output
--> stop-dfs.sh && stop-yarn.sh
Step 4
------
--> pwd
/home/mahboob/Code/Python/hadoop
--> ls
mapper.py reducer.py
--> start-dfs.sh && start-yarn.sh
--> jps
--> hdfs dfs -ls
--> hdfs dfs -ls data
--> hdfs dfs -rm -r data/WCpy-output
--> ls
mapper.py mapper.py.first reducer.py reducer.py.first
[my note : the first version of mapper.py and reducer.py on Michael Noll's site ran the job successfully but generated an empty output file, so I copied the second version]
--> hadoop jar ~/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -file ./mapper.py -mapper ./mapper.py -file ./reducer.py -reducer ./reducer.py -input data/WC-input/* -output data/WCpy-output
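[note : a streaming job is essentially input -> mapper -> sort by key -> reducer, with everything passed as lines on stdin/stdout. That pipeline can be tried locally before submitting the job, which may help catch a problem like the empty output file mentioned above. In the sketch below, wc_mapper and wc_reducer are hypothetical stand-ins written in shell/awk, not the actual mapper.py and reducer.py.]

```shell
# Mapper stand-in: emit "<word><TAB>1" for every whitespace-separated word.
wc_mapper() {
  awk '{ for (i = 1; i <= NF; i++) printf "%s\t1\n", $i }'
}

# Reducer stand-in: input arrives sorted by key, so equal keys are adjacent;
# sum the counts of each run of equal keys and print one line per key.
wc_reducer() {
  awk -F '\t' '
    NR > 1 && $1 != key { printf "%s\t%d\n", key, sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) printf "%s\t%d\n", key, sum }
  '
}

# Same shape as the streaming job: input -> mapper -> sort -> reducer.
printf 'to be or not\nto be\n' | wc_mapper | LC_ALL=C sort | wc_reducer
```

With the real scripts the equivalent local check would be something like: cat some-input.txt | ./mapper.py | sort | ./reducer.py (some-input.txt being any local text file).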
Step 5
------
--> pwd
/home/mahboob/Code/java/hadoop/linearregression
--> ls
Participant.java Projection.java ProjectionMapper.java ProjectionReducer.java
--> mkdir -p REG-classes
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d REG-classes Participant.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d REG-classes ProjectionMapper.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:REG-classes -d REG-classes ProjectionReducer.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:REG-classes -d REG-classes Projection.java
/home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
Note: Projection.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning
--> jar -cvf Projection.jar -C REG-classes/ .
--> start-dfs.sh && start-yarn.sh
--> jps
4884 Jps
4105 NameNode
4834 NodeManager
4482 SecondaryNameNode
4259 DataNode
4685 ResourceManager
--> hdfs dfs -mkdir -p data/REG-input
--> ls
Participant.java Projection.jar Projection.java ProjectionMapper.java ProjectionReducer.java REG-classes RegScore.txt
--> hdfs dfs -copyFromLocal RegScore.txt data/REG-input
--> hdfs dfs -ls data/REG-input
14/09/29 17:35:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 mahboob supergroup 114 2014-09-29 17:35 data/REG-input/RegScore.txt
--> hadoop jar Projection.jar com.rukbysoft.examples.regressionMR.Projection data/REG-input data/REG-output