Hadoop 2.6.0 Multi Node Installation and Test Run in Ubuntu 14.04
1. Follow https://gist.github.com/piyasde/17d4f7bc97c0f0820d40 for the single node setup on every machine in the cluster.
2. Create a user group for Hadoop users (example commands below).
3. Create the same hadoop user on every machine where the Hadoop setup will be done.
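For example, on Ubuntu (the group name hadoop and the user name hduser below are only placeholders - use the same names you chose in the single node setup):
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser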
4. Select the machine which will be the master node.
On that machine, edit /etc/hosts and map each node's IP to a hostname (master, slave, ...) instead of localhost.
vi /etc/hosts
#127.0.0.1 localhost
#127.0.1.1 ubupc1
192.13.171.58 master
192.113.171.43 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
5. The same changes are to be made in the /etc/hosts file of every slave node. The entries can be verified as shown below.
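To confirm that the /etc/hosts entries work, ping every node by hostname from every machine:
$ ping -c 2 master
$ ping -c 2 slave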
6. Set up passwordless SSH from the master to the slave machines and vice versa.
On the master machine run
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave
On slave
$ ssh-copy-id -i ~/.ssh/id_rsa.pub master
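If ~/.ssh/id_rsa.pub does not exist yet, generate a key pair first, then verify that the login really is passwordless (the key path below is the default one):
$ ssh-keygen -t rsa -P ""
$ ssh slave   // from master, should not ask for a password
$ ssh master  // from slave, should not ask for a password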
7. On the master and slave machines, add master and slave (hostname or IP) to the slaves file; every slave node of the cluster must be listed there.
## An entry for every slave node is required in the slaves file.
vi <<Your hadoop installation directory>>/etc/hadoop/slaves
master
slave
#### Open etc/hadoop/*-site.xml (all machines)
#### core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
#### mapred-site.xml (for Hadoop 2.x)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
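Note: a fresh Hadoop 2.6.0 installation ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, copy it from the template before editing:
cp <<Your hadoop installation directory>>/etc/hadoop/mapred-site.xml.template <<Your hadoop installation directory>>/etc/hadoop/mapred-site.xml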
#### yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
#### In hdfs-site.xml we set dfs.replication to 2, as there are only 2 datanodes in this cluster
#### hdfs-site.xml for master
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/<<hadoop installation directory or any directory we want>>/hadoop_data/hdfs/namenode</value>
</property>
#### hdfs-site.xml for slave
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/<<hadoop installation directory or any directory we want>>/hadoop_data/hdfs/datanode</value>
</property>
#### Point to remember -
These directories must be made accessible to the hadoop user with the chown command, and file read/write permissions must be set with the chmod command, for example as shown below.
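For example, assuming the hadoop user is hduser in group hadoop (placeholders - use whatever user and group you created in steps 2 and 3):
$ sudo chown -R hduser:hadoop <<hadoop installation directory or any directory we want>>/hadoop_data
$ sudo chmod -R 750 <<hadoop installation directory or any directory we want>>/hadoop_data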
8. Delete any files left in the hdtmp folder on all machines.
9. On the master run
hadoop namenode -format
10. On the master
start-dfs.sh
Confirm using jps:
jps
799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
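On each slave machine, jps should show at least a DataNode at this point, for example (the PIDs are only illustrative):
jps
3621 DataNode
3755 Jps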
11. On the master
start-yarn.sh // hadoop 2.x
// This starts the ResourceManager on the master and one NodeManager on every node listed in the slaves file.
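Running jps again on the master should now also list the YARN daemons, roughly like this (PIDs are only illustrative):
jps
799 NameNode
14880 DataNode
14977 SecondaryNameNode
15501 ResourceManager
15623 NodeManager
15880 Jps
Each slave should additionally show a NodeManager alongside its DataNode.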
12. Some troubleshooting tips -
A> Check the log files.
B> If we find any exception in the logs, go to hdtmp/dfs and manually delete all the files under the data and name folders.
C> Stop all the processes using stop-dfs.sh and stop-yarn.sh.
D> Format the namenode again using - $ bin/hadoop namenode -format
E> Execute start-dfs.sh and start-yarn.sh, which are in the sbin folder of the hadoop installation directory.
F> If everything goes fine, we should be able to see the datanodes listed at
http://<<IP of the namenode machine>>:50070 (a command-line check is shown after this list).
G> If we get an error like 'operation not permitted', we should check the permissions of the Hadoop folders for the hadoop user on the respective machine.
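To verify the same thing from the command line (point F above), the dfsadmin report can be run from the master; it should list both datanodes as live:
$ bin/hdfs dfsadmin -report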
To test a map-reduce job, follow these steps -
$ bin/hadoop fs -mkdir /hdinput // create a directory within hdfs
$ bin/hdfs dfs -copyFromLocal etc/hadoop/*.xml /hdinput
// The files copied into hdfs can be browsed at http://localhost:50070/explorer.html#/hdinput
// Run the job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep /hdinput hdlatestoutput 'dfs[a-z.]+'
// Copy the output to a local directory
$ bin/hdfs dfs -copyToLocal /user/<<your username>>/hdlatestoutput <<your expected directory>>/output
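Alternatively, the result can be viewed directly in HDFS without copying it to the local filesystem:
$ bin/hdfs dfs -cat /user/<<your username>>/hdlatestoutput/part-r-00000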
13. The desired result will be -
_SUCCESS
part-r-00000
part-r-00000 will contain the result output, which is -
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir