Commands and editing for the five steps of the Hadoop DIY tutorial by Prithwis Mukherjee ( @prithwis ) at Ref [1].
The five steps
--------------
1. Install Hadoop 2.2 in single-machine cluster mode on a machine running Ubuntu
2. Compile and run the standard WordCount example in Java
3. Compile and run another, non-WordCount, program in Java
4. Use the Hadoop streaming utility to run a WordCount program written in Python, as an example of a non-Java application
5. Compile and run a Java program that solves a small but representative predictive-analytics problem
Note :
------
a) Lines starting with --> are the commands I ran at the prompt or edits I made inside a file.
b) Step 1 is from reference [2] below.
References :
------------
[1] http://thoughtshoppe.blogspot.in/2014/05/getting-started-with-mapreduce-and.html
[2] http://www.ercoppa.org/Linux-Install-Hadoop-220-on-Ubuntu-Linux-1304-Single-Node-Cluster.htm
Step 1 : Hadoop installation
----------------------------
--> sudo apt-get install openssh-server
--> ssh-keygen -t rsa -P ""
Press Enter
--> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
(Optional) Disable SSH login from remote addresses by setting in /etc/ssh/sshd_config:
ListenAddress 127.0.0.1
--> ssh localhost
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-34-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Wed Aug 27 18:10:55 2014 from localhost
--> exit
Go to the Hadoop web site and download Hadoop 2.2.0
--> cd Downloads
--> tar xvf hadoop-2.2.0.tar.gz
--> mv hadoop-2.2.0 ~/hadoop
--> mkdir -p ~/hadoop/data/namenode
--> mkdir -p ~/hadoop/data/datanode
--> ~/hadoop/etc/hadoop/hadoop-env.sh (after the comment "The java implementation to use."):
---- export JAVA_HOME="`dirname $(readlink /etc/alternatives/java)`/../"
---- export HADOOP_COMMON_LIB_NATIVE_DIR="~/hadoop/lib"
---- export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=~/hadoop/lib"
--> ~/hadoop/etc/hadoop/core-site.xml (inside <configuration> tag):
---- <property>
---- <name>fs.default.name</name>
---- <value>hdfs://localhost:9000</value>
---- </property>
--> ~/hadoop/etc/hadoop/hdfs-site.xml (inside <configuration> tag):
---- <property>
---- <name>dfs.replication</name>
---- <value>1</value>
---- </property>
---- <property>
---- <name>dfs.namenode.name.dir</name>
---- <value>${user.home}/hadoop/data/namenode</value>
---- </property>
---- <property>
---- <name>dfs.datanode.data.dir</name>
---- <value>${user.home}/hadoop/data/datanode</value>
---- </property>
--> ~/hadoop/etc/hadoop/yarn-site.xml (inside <configuration> tag):
---- <property>
---- <name>yarn.nodemanager.aux-services</name>
---- <value>mapreduce_shuffle</value>
---- </property>
---- <property>
---- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
---- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
---- </property>
--> cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
--> insert (inside <configuration> tag):
---- <property>
---- <name>mapreduce.framework.name</name>
---- <value>yarn</value>
---- </property>
--> echo "export PATH=$PATH:~/hadoop/bin:~/hadoop/sbin" >> ~/.bashrc
--> source ~/.bashrc
--> hdfs namenode -format
--> start-dfs.sh && start-yarn.sh
--> jps
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
14/08/29 13:51:01 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/08/29 13:51:01 INFO fs.TestDFSIO: Date & time: Fri Aug 29 13:51:01 IST 2014
14/08/29 13:51:01 INFO fs.TestDFSIO: Number of files: 20
14/08/29 13:51:01 INFO fs.TestDFSIO: Total MBytes processed: 200.0
14/08/29 13:51:01 INFO fs.TestDFSIO: Throughput mb/sec: 2.8908835985719037
14/08/29 13:51:01 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.428131580352783
14/08/29 13:51:01 INFO fs.TestDFSIO: IO rate std deviation: 1.655113127097678
14/08/29 13:51:01 INFO fs.TestDFSIO: Test exec time sec: 217.695
14/08/29 13:51:01 INFO fs.TestDFSIO:
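The benchmark reports two different rates. A minimal sketch of why such summary numbers can differ, assuming (as the naming suggests) that "Throughput" divides total data by total task time while "Average IO rate" is the plain mean of each file's own rate; the per-file measurements below are hypothetical:

```python
# Hedged sketch: why TestDFSIO's two summary rates differ.
# Assume each map task reports (megabytes_written, seconds_taken).
tasks = [(10, 3.0), (10, 4.0), (10, 2.5)]  # hypothetical per-file measurements

# "Throughput mb/sec": total data divided by total task time.
throughput = sum(mb for mb, _ in tasks) / sum(s for _, s in tasks)

# "Average IO rate mb/sec": arithmetic mean of each file's own rate,
# so fast files pull it above the aggregate throughput.
avg_io_rate = sum(mb / s for mb, s in tasks) / len(tasks)

print(round(throughput, 3), round(avg_io_rate, 3))
```

This matches the run above, where the average IO rate (3.43 MB/s) sits above the aggregate throughput (2.89 MB/s).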
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
14/08/29 13:52:20 INFO fs.TestDFSIO: TestDFSIO.1.7
14/08/29 13:52:20 INFO fs.TestDFSIO: nrFiles = 1
14/08/29 13:52:20 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
14/08/29 13:52:20 INFO fs.TestDFSIO: bufferSize = 1000000
14/08/29 13:52:20 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/08/29 13:52:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 13:52:23 INFO fs.TestDFSIO: Cleaning up test files
--> hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 48.305 seconds
Estimated value of Pi is 3.60000000000000000000
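The pi example estimates Pi by dropping sample points in a square and counting how many land inside the inscribed circle; with only 2 maps of 5 samples each the estimate is necessarily coarse, hence 3.6. A minimal local sketch of the same idea, using pseudo-random points (the Hadoop example actually uses a Halton quasi-random sequence):

```python
import random

def estimate_pi(num_maps, samples_per_map, seed=42):
    """Monte Carlo pi: 4 * (fraction of random points inside the unit circle).
    The Hadoop example uses a Halton quasi-random sequence instead of a PRNG."""
    rng = random.Random(seed)
    inside = 0
    total = num_maps * samples_per_map
    for _ in range(total):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4.0 * inside / total

print(estimate_pi(2, 5))       # only 10 darts: a very coarse estimate
print(estimate_pi(100, 1000))  # more samples tighten the estimate toward 3.14
```

In the real job each "map" throws its share of darts independently and the reducer merges the inside/outside counts, which is why adding maps or samples improves the estimate.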
--> stop-dfs.sh && stop-yarn.sh
Step 2
------
--> cd ~/Code/java/BookText
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes WordMapper.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes SumReducer.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d WC-classes WordCount.java
/home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
--> jar -cvf WordCount.jar -C WC-classes/ .
added manifest
adding: WordCount.class(in = 1694) (out= 854)(deflated 49%)
adding: WordMapper.class(in = 1681) (out= 733)(deflated 56%)
adding: SumReducer.class(in = 1690) (out= 712)(deflated 57%)
--> ls
SumReducer.java WC-classes WC-input WordCount.jar WordCount.java WordMapper.java
--> ls WC-input
davinci.txt The-Outline-Of-Science.txt Ulysses.txt
--> hdfs namenode -format
--> start-dfs.sh && start-yarn.sh
--> jps
10893 DataNode
11442 NodeManager
11292 ResourceManager
10742 NameNode
11484 Jps
11119 SecondaryNameNode
--> hdfs dfs -rm -r data/WC-input
--> hdfs dfs -rm -r data/WC-output
--> hdfs dfs -mkdir -p data/WC-input
(mh-note: in HDFS, not the OS file system; the directory is created as /user/mahboob/data/WC-input)
--> hdfs dfs -copyFromLocal WC-input/* data/WC-input
--> hdfs dfs -ls data/WC-input
14/09/26 13:09:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 mahboob supergroup 674570 2014-09-26 12:57 data/WC-input/The-Outline-Of-Science.txt
-rw-r--r-- 1 mahboob supergroup 1573150 2014-09-26 12:57 data/WC-input/Ulysses.txt
-rw-r--r-- 1 mahboob supergroup 1423803 2014-09-26 12:57 data/WC-input/davinci.txt
--> hadoop jar WordCount.jar WordCount data/WC-input data/WC-output
URLs:
http://localhost:8088/cluster
http://localhost:50070/dfshealth.jsp
--> stop-dfs.sh && stop-yarn.sh
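The job above wires WordMapper and SumReducer together through Hadoop's sort-and-shuffle. Assuming the standard WordCount structure (the mapper emits (word, 1) pairs and the reducer sums the values per word; the Java sources themselves are not shown here), the data flow can be simulated locally:

```python
from collections import defaultdict

# Local simulation of map -> shuffle -> reduce for WordCount.
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1  # emit (word, 1) for each token

def shuffle(pairs):
    # Hadoop groups all values for the same key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def sum_reducer(key, values):
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [kv for line in lines for kv in word_mapper(line)]
counts = dict(sum_reducer(k, vs) for k, vs in shuffle(pairs).items())
print(counts["the"], counts["fox"])  # prints: 3 2
```

The cluster does exactly this, except the mapper runs once per input split and the shuffle moves data between machines.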
Step 3
------
--> pwd
/home/mahboob/Code/java/hadoop/marketratings
--> ls
marketratings.csv MarketRatings.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d classes MarketRatings.java
--> jar -cvf MarketRatings.jar -C classes/ .
--> start-dfs.sh && start-yarn.sh
--> jps
--> hdfs dfs -mkdir -p data/MR-input
--> hdfs dfs -copyFromLocal marketratings.csv data/MR-input
--> hdfs dfs -ls data/MR-input
14/09/26 23:52:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 mahboob supergroup 1951777 2014-09-26 23:51 data/MR-input/marketratings.csv
--> hadoop jar MarketRatings.jar MarketRatings data/MR-input data/MR-output
--> stop-dfs.sh && stop-yarn.sh
Step 4
------
--> pwd
/home/mahboob/Code/Python/hadoop
--> ls
mapper.py reducer.py
--> start-dfs.sh && start-yarn.sh
--> jps
--> hdfs dfs -ls
--> hdfs dfs -ls data
--> hdfs dfs -rm -r data/WCpy-output
--> ls
mapper.py mapper.py.first reducer.py reducer.py.first
[my note: the first version of mapper.py and reducer.py on Michael Noll's site ran the job successfully but generated an empty output file, so I copied the second version]
--> hadoop jar ~/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -file ./mapper.py -mapper ./mapper.py -file ./reducer.py -reducer ./reducer.py -input data/WC-input/* -output data/WCpy-output
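The streaming job runs mapper.py and reducer.py as ordinary processes that read stdin and write tab-separated lines to stdout, with Hadoop sorting the mapper output by key in between. A hedged sketch in the style of Michael Noll's scripts (not the exact files used here, which are on his site):

```python
import sys
from itertools import groupby

# Streaming wordcount sketch: the mapper emits "word\t1" lines, the
# framework sorts them by key, and the reducer sums runs of equal words.
def mapper(lines):
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(sorted_lines):
    def key_of(line):
        return line.split("\t", 1)[0]
    for word, group in groupby(sorted_lines, key=key_of):
        total = sum(int(line.split("\t", 1)[1]) for line in group)
        yield "%s\t%d" % (word, total)

if __name__ == "__main__":
    # In a real streaming job each half runs as its own script over sys.stdin;
    # here the halves are chained locally to check the logic.
    for out in reducer(sorted(mapper(["to be or not to be"]))):
        print(out)
```

Note the reducer relies on its input arriving sorted by key; that is the contract Hadoop streaming provides, and why running the reducer on unsorted mapper output (a common first mistake) produces wrong or empty results.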
Step 5
------
--> pwd
/home/mahboob/Code/java/hadoop/linearregression
--> ls
Participant.java Projection.java ProjectionMapper.java ProjectionReducer.java
--> mkdir -p REG-classes
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d REG-classes Participant.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar: -d REG-classes ProjectionMapper.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:REG-classes -d REG-classes ProjectionReducer.java
--> javac -cp /home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/mahboob/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:REG-classes -d REG-classes Projection.java
/home/mahboob/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
Note: Projection.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning
--> jar -cvf Projection.jar -C REG-classes/ .
--> start-dfs.sh && start-yarn.sh
--> jps
4884 Jps
4105 NameNode
4834 NodeManager
4482 SecondaryNameNode
4259 DataNode
4685 ResourceManager
--> hdfs dfs -mkdir -p data/REG-input
--> ls
Participant.java Projection.jar Projection.java ProjectionMapper.java ProjectionReducer.java REG-classes RegScore.txt
--> hdfs dfs -copyFromLocal RegScore.txt data/REG-input
--> hdfs dfs -ls data/REG-input
14/09/29 17:35:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 mahboob supergroup 114 2014-09-29 17:35 data/REG-input/RegScore.txt
--> hadoop jar Projection.jar com.rukbysoft.examples.regressionMR.Projection data/REG-input data/REG-output
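The internals of Projection.java are not shown above, so the following is only an illustrative assumption of how a MapReduce linear-regression job can be structured: mappers emit partial sums (n, Sx, Sy, Sxx, Sxy) per input split, a reducer adds them, and the driver solves least squares y = a + b*x from the combined sums. The (x, y) pairs are hypothetical:

```python
# Hedged sketch of distributable simple linear regression via partial sums.
def partial_sums(points):
    # What a mapper could emit for its split: count and four running sums.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    # What a reducer does: element-wise addition of partial sums.
    return tuple(u + v for u, v in zip(a, b))

def solve(sums):
    # Closed-form least squares for y = a + b*x from the combined sums.
    n, sx, sy, sxx, sxy = sums
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

split1 = [(1, 3), (2, 5)]  # hypothetical (x, y) records from one split
split2 = [(3, 7), (4, 9)]  # and from another
a, b = solve(combine(partial_sums(split1), partial_sums(split2)))
print(a, b)  # exact fit y = 1 + 2x -> 1.0 2.0
```

This decomposition is what makes the problem MapReduce-friendly: the sums are associative, so splits can be processed independently and merged in any order.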