@jerluc
Last active December 21, 2015 23:19
Hadoop CDH3 setup

What you need

By now, you should have the following prerequisites installed and configured on your machine:

  • Ubuntu 12.04
  • Java (Oracle JDK 6; ensure $JAVA_HOME is set)
  • Git
  • Apache Maven
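The checklist above can be confirmed from a terminal before continuing. The following is a minimal sketch and not part of the original setup: the helper name `missing_commands` is made up for illustration, and it only confirms that the tools are on the PATH and that `JAVA_HOME` is set, not that the versions match the ones listed.

```shell
# Hypothetical helper (not part of the original guide): print each of the
# given commands that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        # `command -v` exits non-zero when the command cannot be found
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check the tools listed above; `mvn` is the Apache Maven launcher.
missing=$(missing_commands java git mvn)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "java, git and mvn were found on the PATH"
fi

# The guide also expects JAVA_HOME to be set.
if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
else
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

If anything is reported as missing, install it before running the Hadoop install script.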

What you get

After successful installation, you should be set up to run:

  • Hadoop (local-mode + pseudo-distributed)

Installing Hadoop

Copy the shell commands below into a file named ~/hadoop-install.sh. For the sake of consistency, this project uses Cloudera's CDH3 distribution of Hadoop:

#!/bin/sh
# Download the Cloudera CDH3 Debian package descriptor
wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
dpkg -i ./cdh3-repository_1.0_all.deb

# Add the Squeeze main source to the apt source listing
touch /etc/apt/sources.list
echo '
deb http://ftp.us.debian.org/debian squeeze main' >> /etc/apt/sources.list

# Add the CDH3 sources to the apt source listing
touch /etc/apt/sources.list.d/cloudera.list
echo '
deb http://archive.cloudera.com/debian maverick-cdh3 contrib' >> /etc/apt/sources.list.d/cloudera.list
echo '
deb-src http://archive.cloudera.com/debian maverick-cdh3 contrib' >> /etc/apt/sources.list.d/cloudera.list

# Update the apt caches
apt-get -y --force-yes update

# Install libzip1
apt-get -y --force-yes install libzip1

# Install Hadoop Core/Native
apt-get -y --force-yes install hadoop-0.20 hadoop-0.20-native

# Install Hadoop daemons
apt-get -y --force-yes install hadoop-0.20-namenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker

Once copied, run the following command to make the shell script executable (because it uses sudo, this command will prompt you for your password):

sudo chmod 775 ~/hadoop-install.sh

Next, execute the shell script as root to install Hadoop:

sudo ~/hadoop-install.sh

You will then notice a whole flurry of things going on as packages are installed and dependencies are checked, etc. This is normal :)
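Once the script finishes, it can be worth confirming that every package it requested is actually installed. The snippet below is a sketch under the assumption that the package names are exactly those used in the install script; the helper name `is_installed` is hypothetical, and the check relies on the status string that `dpkg-query` reports for installed packages.

```shell
# Hypothetical helper: succeed only if dpkg reports the package as installed.
is_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

# Package names taken from the install script above
for pkg in hadoop-0.20 hadoop-0.20-native hadoop-0.20-namenode \
           hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker; do
    if is_installed "$pkg"; then
        echo "ok       $pkg"
    else
        echo "MISSING  $pkg"
    fi
done
```

Running `hadoop version` afterwards should also print the installed Hadoop release.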

Verifying your Hadoop installation

Once you have run the above commands to install Hadoop, you can verify that the installation was successful by running one of the example MapReduce jobs that ship with Hadoop.

For testing your installation, we will create sample input files, and then run a MapReduce job that performs a distributed form of the Unix grep command. Copy the following commands into your terminal:

# Testing Hadoop installation
mkdir input
echo 'hello
world' > input/file.txt
echo 'hi
world' > input/file2.txt
hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output '^h' 

If the installation is correct and the job succeeds, you should find a file named output/part-00000 in the current directory with the contents:

2	h
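To see where the count of 2 comes from, the same computation can be approximated locally with standard Unix tools. This is only an illustrative sketch and does not use Hadoop: the example job extracts every substring that matches the regular expression and counts the occurrences of each distinct match, so the two lines starting with "h" (hello, hi) yield the match "h" twice. The function name `local_grep_count` is hypothetical, and `grep -o` assumes GNU or BSD grep.

```shell
# Hypothetical helper: emulate the example job's output locally.
# Usage: local_grep_count REGEX DIR
# Assumes the matched strings contain no whitespace.
local_grep_count() {
    # -o prints only the matched text, -h omits file names
    grep -oh -- "$1" "$2"/* | sort | uniq -c | sort -rn |
        awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Recreate the sample input from above in a scratch directory
workdir=$(mktemp -d)
printf 'hello\nworld\n' > "$workdir/file.txt"
printf 'hi\nworld\n' > "$workdir/file2.txt"

# Prints the count, a tab, and the matched text, like the Hadoop output above
local_grep_count '^h' "$workdir"
rm -rf "$workdir"
```

If the Hadoop job's output differs from this local result, the installation or the input files are the first things to check.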