Running Hadoop on Ubuntu Linux (Single-Node Cluster):
1. Prerequisites
Java
User for hadoop
SSH
IPv6
1.1. Set Up and Check Java:
# Install OpenJDK 7 (the command below installs OpenJDK rather than Sun/Oracle Java)
$ sudo apt-get install openjdk-7-jdk
# The full JDK will be placed in /usr/lib/jvm/java-7-openjdk-i386
# After installation, check whether Java is installed correctly
user@ubuntu:~$ java -version
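# The output should look similar to the following (shown only as an illustration; the exact version strings depend on your Ubuntu release):
java version "1.7.0_21"
OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-1ubuntu1)
OpenJDK Client VM (build 23.7-b01, mixed mode, sharing)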
1.2. Add and Check a Dedicated Hadoop System User:
# Add a group named hadoop
$ sudo addgroup hadoop
# Add user hduser in group hadoop
$ sudo adduser --ingroup hadoop hduser
# switch to the newly added user
$ su - hduser
1.3. Set Up SSH:
# Hadoop requires SSH access to manage its nodes
# For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.
# If SSH is not running or not configured to allow public-key authentication, follow the steps in http://ubuntuguide.org/wiki/Ubuntu_Quantal_Remote_Access#Remote_Access
# switch user to hduser
user@ubuntu:~$ su - hduser
# generate ssh key for hduser with empty password
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$
# change directory to .ssh
$ cd ~/.ssh
# enable SSH access to your local machine with the newly created key
hduser@ubuntu:~/.ssh$ cat id_rsa.pub >> authorized_keys
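# A common precaution (optional, not part of the original steps): sshd may refuse keys in an authorized_keys file whose permissions are too open, so tighten them:
$ chmod 600 ~/.ssh/authorized_keys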
# install the SSH server if it is not already present
$ sudo apt-get install openssh-server
# If this step fails with the error "hduser is not in sudoers file. This incident will be reported.", do the following from your original user account (which has sudo rights):
$ cd /etc
$ sudo gedit sudoers
# In that file, add the following line below "# User privilege specification":
hduser ALL=(ALL) ALL
# Save the file and exit.
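# Note: a syntax error in /etc/sudoers can lock you out of sudo entirely. visudo validates the file before saving, so running it is a safer way to make the same change:
$ sudo visudo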
# The final step is to test the SSH setup by connecting to your local machine with the hduser user.
# This step is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file
# If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$
# If the SSH connection fails, these general tips might help:
# - Enable debugging with ssh -vvv localhost and investigate the error in detail.
# - Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.
1.4. Disabling IPv6:
#You can also disable IPv6 only for Hadoop by adding the following line to conf/hadoop-env.sh
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
# Or: disable IPv6 system-wide (note that the sysctl settings below affect the whole machine, not just the hadoop user)
# open the sysctl.conf file in the pico editor
$ sudo pico /etc/sysctl.conf
# add the following lines to the end of the file
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
# reboot the system
$ sudo reboot
# check whether IPv6 is disabled
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
# a value of 0 means IPv6 is enabled; 1 means it is disabled
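# Alternatively, the sysctl settings can be applied immediately without a reboot (standard sysctl behavior):
$ sudo sysctl -p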
2. Hadoop
Installation
Configuration
Start & Stop
Run MapReduce
2.1. Installation:
# Download Hadoop from the Apache Download Mirrors
# change to local directory
$ cd /usr/local
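# If you have not downloaded Hadoop yet, you can fetch the tarball here as well; the URL below points at the Apache archive and is only an example (any mirror carrying hadoop-1.0.3 works):
$ sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz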
# extract the Hadoop tarball into /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
# rename the extracted directory to hadoop
$ sudo mv hadoop-1.0.3 hadoop
# change the ownership to hduser
$ sudo chown -R hduser:hadoop hadoop
2.2.1. Configure ~/.bashrc:
# switch to hduser
$ su - hduser
# open ~/.bashrc in the pico editor
$ pico ~/.bashrc
#Add to the end of the ~/.bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
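# Reload ~/.bashrc so the changes take effect in the current shell (or simply open a new terminal), then confirm that the hadoop binary is on the PATH:
$ source ~/.bashrc
$ hadoop version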
2.2.2. Set the JAVA_HOME Environment Variable:
# find the path where Java is installed
$ whereis java
$ ls -l /usr/bin/java
$ ls -l /etc/alternatives/java
# open the hadoop-env.sh file to set the JAVA_HOME environment variable
$ pico /usr/local/hadoop/conf/hadoop-env.sh
Change
#export JAVA_HOME=/usr/lib/j2sdk1.5-sun
To
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
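# An optional sanity check: the java binary under the path set above should exist and run:
$ /usr/lib/jvm/java-7-openjdk-i386/bin/java -version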
2.2.3. Create Directory and Set Ownership and Permissions:
# make directory and change permission to hduser
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
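# Optionally verify the result; the directory should now be owned by hduser:hadoop with mode 750 (drwxr-x---):
$ ls -ld /app/hadoop/tmp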
2.2.4. Configuration: conf/*-site.xml
# Add the following snippets between the <configuration> ... </configuration> tags in the conf/core-site.xml file.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
# In file conf/mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
# In file conf/hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
2.2.5. Formatting HDFS via the NameNode:
# To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
The output will look like this (the sample below was captured on an older Hadoop release, so details such as the version number will differ):
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$
2.3.1. Starting the Single-Node Cluster:
# To start a NameNode, DataNode, JobTracker and TaskTracker on your machine, run this command
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
The output will look like this:
hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$
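# You can verify that the expected Hadoop daemons are running with jps, the JVM process lister that ships with the JDK (the PIDs below are illustrative only):
hduser@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
# Hadoop 1.x also exposes web interfaces: the NameNode at http://localhost:50070/ and the JobTracker at http://localhost:50030/.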
2.3.2. Stopping the Single-Node Cluster:
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
Output:
hduser@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hduser@ubuntu:/usr/local/hadoop$
2.4. Running a MapReduce Job:
# copy a test file (or directory) from the local filesystem into HDFS
$ ./hadoop dfs -put /path-of-os-filesystem/test /test
# list the uploaded data
$ ./hadoop dfs -ls /test
# run the wordcount example; note that it takes an input path and an output path (the jar version matches the hadoop-1.0.3 release installed above)
$ ./hadoop jar ../hadoop-examples-1.0.3.jar wordcount /test /test/out
# recursively list the job output
$ ./hadoop dfs -lsr /test/out
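# Once the job finishes you can inspect the result directly in HDFS; a quick sketch, assuming the wordcount example's default output naming (part-r-00000):
$ ./hadoop dfs -cat /test/out/part-r-00000
# ...or copy the whole output directory back to the local filesystem:
$ ./hadoop dfs -get /test/out /tmp/test-out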