CDH5 Cluster Setup

These are the steps I followed to set up a CentOS 6.5 VM and install CDH5 and CM5 on it. All of these commands should be run on a single node; if running on a cluster, that node will serve as the master node.

Caveats:

  • This uses the embedded PostgreSQL database for the services, which is a TERRIBLE idea if the environment is anything but a short-lived POC.
  • This was done for a 4-node POC cluster where a single instance was the dedicated master: all the CM management and Hadoop master daemons ran on it, and the 3 remaining nodes were data nodes.

Steps to follow

  1. Create new local CentOS 6.5 Image based on this ISO.

  2. Disable SELinux on all nodes!

sudo setenforce 0
# setenforce only lasts until the next reboot; to make it permanent,
# set SELINUX=disabled in /etc/selinux/config
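
The config file change can also be scripted. This is a minimal sketch, assuming the stock `SELINUX=<mode>` line is present in the file and that GNU sed is available (as it is on CentOS); the function name `disable_selinux_in` is made up for illustration.

```shell
# Hypothetical helper: rewrite the SELINUX= line of the given config file
# so that SELinux stays disabled after a reboot.
disable_selinux_in() {
    conf="$1"
    # Matches only "SELINUX=...", so the SELINUXTYPE= line is left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Usage on a real node (run as root):
#   disable_selinux_in /etc/selinux/config
```

Taking the path as an argument makes it easy to try the function on a copy of the file before touching the real one.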
  1. Perform other OS-level checks (e.g. DNS resolution, NTP daemons, HD format options) on all nodes. For example:
  • Make sure bi-directional hostname resolution works, i.e. run this command:
python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"

# It should output something like
nightly46-1.ent.cloudera.com
10.20.194.216

If using /etc/hosts, list the IP address first, then the FQDN, then the short name, e.g. 10.10.10.1 hadoop1.example.com hadoop1
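
A quick way to catch a mis-ordered entry is to check the file mechanically. The sketch below assumes the short name is simply the FQDN up to its first dot; `check_hosts_entry` is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: succeed only if FILE contains an entry of the form
#   <ip> <fqdn> <shortname>
# for the given FQDN, with the FQDN listed before the short name.
check_hosts_entry() {
    file="$1"
    fqdn="$2"
    short="${fqdn%%.*}"   # hadoop1.example.com -> hadoop1
    awk -v fqdn="$fqdn" -v short="$short" '
        $1 ~ /^#/ { next }                      # skip comment lines
        $2 == fqdn && $3 == short { found = 1 } # correct column order
        END { exit !found }
    ' "$file"
}

# Usage:
#   check_hosts_entry /etc/hosts hadoop1.example.com && echo "entry looks right"
```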

  1. Install the 1.7 jdk (only on the master)

sudo yum localinstall <PATH-TO-JDK-1.7-RPM>

  1. Move all existing repos
sudo cp -R /etc/yum.repos.d/ /etc/yum.repos.backup
sudo rm -f /etc/yum.repos.d/*
  1. Setup CM Repo
$ mkdir -p /tmp/cloudera-repo/cm
$ cd /tmp/cloudera-repo/cm
$ wget -r http://archive-primary.cloudera.com/cm5/redhat/6/x86_64/cm/5.0.0/
$ python -m SimpleHTTPServer 8080 &
# If you can't wget, copy to a secondary box and SCP over

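# Use tee so the write itself runs under sudo; with `sudo cat ... > file`
# the redirect is opened by the unprivileged shell and fails.
$ sudo tee /etc/yum.repos.d/cm.repo > /dev/null <<EOF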
[cm]
name=cloudera-manager
baseurl=http://127.0.0.1:8080/archive-primary.cloudera.com/cm5/redhat/6/x86_64/cm/5.0.0
enabled=1
gpgcheck=0
EOF
# Make sure the links above resolve to the appropriate files (use a web browser or curl)
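
The check can be done from the command line as well. yum fetches `repodata/repomd.xml` under each `baseurl`, so that is the URL worth testing. The following is a sketch with a hypothetical helper name (`repo_metadata_urls`); it assumes each `baseurl=` line holds a single URL.

```shell
# Hypothetical helper: print the metadata URL yum will request for each
# baseurl found in a .repo file.
repo_metadata_urls() {
    sed -n 's/^baseurl=//p' "$1" | while read -r url; do
        # Strip one trailing slash, if any, before appending the metadata path.
        printf '%s/repodata/repomd.xml\n' "${url%/}"
    done
}

# Usage: curl -f exits non-zero on HTTP errors such as 404.
#   repo_metadata_urls /etc/yum.repos.d/cm.repo | while read -r u; do
#       curl -fsS -o /dev/null "$u" && echo "OK   $u" || echo "FAIL $u"
#   done
```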

Create a Parcels repo

$ mkdir -p /tmp/cloudera-repo/cdh
$ cd /tmp/cloudera-repo/cdh
$ wget http://archive-primary.cloudera.com/cdh5/parcels/latest/CDH-5.0.0-1.cdh5.0.0.p0.47-el6.parcel
$ wget http://archive-primary.cloudera.com/cdh5/parcels/latest/manifest.json
$ python -m SimpleHTTPServer 8081 &

Get the postgres updates

$ mkdir -p /tmp/postgres-update
$ cd /tmp/postgres-update
$ wget http://mirrors-pa.sioru.com/centos/6.5/updates/x86_64/Packages/postgresql-8.4.20-1.el6_5.x86_64.rpm
$ wget http://mirror.hmc.edu/centos/6.5/updates/x86_64/Packages/postgresql-libs-8.4.20-1.el6_5.x86_64.rpm
$ wget http://mirror.wiredtree.com/centos/6.5/updates/x86_64/Packages/postgresql-server-8.4.20-1.el6_5.x86_64.rpm
$ sudo yum localinstall postgresql-*

Create a local mirror for OS repo

Follow the instructions at http://wiki.centos.org/HowTos/CreateLocalMirror. Only the os folder needs to be served as a repository; the others are not required.

(This can be done using nothing but the vanilla ISO/disk; there is no need to run the rsync across the nodes.) Here are the commands I used. They will need to be updated depending on the OS and the locations of the files.

mkdir -p /share/CentOS/6.5/os/x86_64
cd /share/CentOS
ln -s 6.5 6
mkdir /tmp/mnt
mount -ro loop <PATH-TO-ISO-1-OF-2.iso> /tmp/mnt
# The trailing slash on the source copies the ISO contents, not the mnt directory itself
rsync -avHPS /tmp/mnt/ /share/CentOS/6.5/os/x86_64/
umount /tmp/mnt
mount -ro loop <PATH-TO-ISO-2-OF-2.iso> /tmp/mnt
rsync -avHPS /tmp/mnt/ /share/CentOS/6.5/os/x86_64/
umount /tmp/mnt
cd /share/CentOS
python -m SimpleHTTPServer 8090 &

# Create repo
$ sudo tee /etc/yum.repos.d/os.repo > /dev/null <<EOF
[centos-os]
name=CentOS 6.5
baseurl=http://127.0.0.1:8090/6.5/os/x86_64
enabled=1
gpgcheck=0
EOF

Install the manager daemons

sudo yum install cloudera-manager-daemons cloudera-manager-server

Install the db daemons

sudo yum install cloudera-manager-server-db-2

Start the db

sudo service cloudera-scm-server-db start

Start Cloudera Manager's server

sudo service cloudera-scm-server start

Wait a bit for the server to come up, then navigate to http://localhost:7180 and follow the instructions there.
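
Rather than guessing how long to wait, the server can be polled until it answers. This is a rough sketch: `wait_for` is a hypothetical helper, the 300-second limit in the usage line is an arbitrary assumption, and the count ignores however long each attempt itself takes.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or TIMEOUT attempts have failed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout_s="$1"; shift
    elapsed=0
    until "$@"; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout_s" ] && return 1
        sleep 1
    done
    return 0
}

# Usage: wait up to about 5 minutes for the Cloudera Manager UI to respond.
#   wait_for 300 curl -fs -o /dev/null http://localhost:7180/ && echo "CM is up"
```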
