@epishkin
Last active December 27, 2015 16:49
Hadoop Howto

How to set up the Cloudera Sandbox

Download hadoop & oozie

  1. Download the Hadoop and Oozie tarballs from http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/cdh-components.html
  2. Extract them into ~/opt/, so that you end up with ~/opt/hadoop-2.0.0-cdh4.4.0 and ~/opt/oozie-3.3.2-cdh4.4.0

Update env variables

Add the following to ~/.bash_login or ~/.bash_profile:

export HDP_HOME=$HOME/opt/hadoop-2.0.0-cdh4.4.0
export HADOOP_HOME=$HDP_HOME/share/hadoop/mapreduce1
export HADOOP_CONF_DIR=${HDP_HOME}/etc/hadoop
 
export OOZIE_HOME=$HOME/opt/oozie-3.3.2-cdh4.4.0 
export OOZIE_URL=http://localhost:11000/oozie/
export OOZIE_TIMEZONE=America/New_York
 
export PATH="$HDP_HOME/bin-mapreduce1:$PATH"
export PATH="$OOZIE_HOME/bin:$PATH"
 
# Run hadoop against the production cluster config instead of the sandbox one
function hadoop-node02() {
  hadoop --config "${HDP_HOME}/etc/hadoop-node02" "$@"
}
 
# Run oozie against the production Oozie server instead of $OOZIE_URL
function oozie-node02() {
  oozie "$@" -oozie http://production.cluster.host:11000/oozie/
}
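
Before going further it can help to confirm that the variables above point at directories that actually exist. The following is a minimal sketch, not part of Hadoop or Oozie: `check_hadoop_env` is a hypothetical helper name, and it only checks the four directory variables defined above.

```shell
# Hypothetical helper: report which of the expected directories are missing.
# Returns 0 when every variable is set and points at an existing directory.
check_hadoop_env() {
  local status=0 name dir
  for name in HDP_HOME HADOOP_HOME HADOOP_CONF_DIR OOZIE_HOME; do
    # Look up the value of the variable whose name is in $name
    eval "dir=\${$name:-}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "missing: $name=${dir:-<unset>}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_hadoop_env` in a fresh login shell should print nothing if the tarballs were extracted to the paths used above.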

Create two separate config sets: one for the Cloudera VM and one for the production cluster

cp -R ${HDP_HOME}/etc/hadoop ${HDP_HOME}/etc/hadoop-node02
cp ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml.template ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml
 
cp ${HDP_HOME}/etc/hadoop/mapred-site.xml.template ${HDP_HOME}/etc/hadoop/mapred-site.xml

Configure namenode & job tracker for production cluster

Modify ${HDP_HOME}/etc/hadoop-node02/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://production.cluster.host:8020</value>
  </property>
</configuration>

Modify ${HDP_HOME}/etc/hadoop-node02/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>production.cluster.host:8021</value>
  </property>
</configuration> 

Configure namenode & job tracker for Cloudera VM

Modify ${HDP_HOME}/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

Modify ${HDP_HOME}/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
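
The four files above differ only in the host name, so they could also be generated rather than edited by hand. This is a sketch under assumptions: `write_site_configs` is a hypothetical helper, it assumes the namenode and job tracker run on the same host on ports 8020 and 8021 as in the snippets above, and it overwrites any existing core-site.xml and mapred-site.xml in the target directory.

```shell
# Hypothetical helper: write core-site.xml and mapred-site.xml for one cluster.
#   $1 = config directory
#   $2 = host running both the namenode (8020) and the job tracker (8021)
write_site_configs() {
  local conf_dir="$1" host="$2"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${host}:8020</value>
  </property>
</configuration>
EOF
  cat > "$conf_dir/mapred-site.xml" <<EOF
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${host}:8021</value>
  </property>
</configuration>
EOF
}

# Example calls (left commented out so that sourcing this file changes nothing):
# write_site_configs "${HDP_HOME}/etc/hadoop" localhost
# write_site_configs "${HDP_HOME}/etc/hadoop-node02" production.cluster.host
```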

How to use

#List files in hdfs of your sandbox hadoop
hadoop fs -ls /

#List files from namenode02
hadoop-node02 fs -ls /tapad-data

#List 5 scheduled jobs in oozie at namenode02
oozie-node02 jobs -len 5 -jobtype coord
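
If more clusters are added later, one wrapper per cluster gets repetitive. A possible generalization, sketched here under the assumption that every config set lives in `${HDP_HOME}/etc/hadoop-<suffix>` as above (`hadoop_on` is a hypothetical name):

```shell
# Hypothetical wrapper: hadoop_on <suffix> <hadoop args...>
# An empty suffix selects the sandbox config in etc/hadoop;
# "node02" selects etc/hadoop-node02, and so on.
hadoop_on() {
  local suffix="$1"
  shift
  hadoop --config "${HDP_HOME}/etc/hadoop${suffix:+-$suffix}" "$@"
}
```

With this in place, `hadoop_on node02 fs -ls /tapad-data` would be equivalent to the `hadoop-node02` call above, and `hadoop_on "" fs -ls /` to the plain sandbox call.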

Cloudera QuickStart VM

  1. Download the VM from http://www.cloudera.com/content/support/en/downloads.html
  2. If anything asks for a login and password, try cloudera/cloudera

VMware sandbox setup

Port mapping

#put these settings into nat.conf on ***your laptop***
#make sure '192.168.170.128' is the IP of your box in vmware (run ifconfig)

sudo vi /Library/Preferences/VMware\ Fusion/vmnet8/nat.conf
 
[incomingtcp]
50010 = 192.168.170.128:50010
50020 = 192.168.170.128:50020
50030 = 192.168.170.128:50030
50060 = 192.168.170.128:50060
50070 = 192.168.170.128:50070
50075 = 192.168.170.128:50075
8020  = 192.168.170.128:8020
8021  = 192.168.170.128:8021
8888  = 192.168.170.128:8888
11000 = 192.168.170.128:11000
 
#now restart VMWare's NAT
sudo "/Applications/VMware Fusion.app/Contents/Library/vmnet-cli" --stop
sudo "/Applications/VMware Fusion.app/Contents/Library/vmnet-cli" --start
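
Since the guest IP differs from machine to machine, the entries above can be generated instead of edited by hand. A minimal sketch, assuming the same port list as above; `nat_entries` is a hypothetical helper that only prints text and does not touch nat.conf itself.

```shell
# Hypothetical helper: print [incomingtcp] entries for one guest IP.
#   $1 = guest IP; remaining args = ports (defaults to the list used above)
nat_entries() {
  if [ "$#" -lt 1 ]; then
    echo "usage: nat_entries IP [PORT...]" >&2
    return 2
  fi
  local ip="$1" port
  shift
  # No ports given: fall back to the ports forwarded in this guide
  [ "$#" -gt 0 ] || set -- 50010 50020 50030 50060 50070 50075 8020 8021 8888 11000
  echo "[incomingtcp]"
  for port in "$@"; do
    echo "${port} = ${ip}:${port}"
  done
}
```

For example, `nat_entries 192.168.170.128` prints the block shown above (without the column alignment). If nat.conf already has an `[incomingtcp]` section, add the entries under it rather than creating a second one.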

VirtualBox sandbox setup

You can follow the instructions here: https://github.com/cloudera/cdk-examples/

Setup account

Create an account for yourself on the sandbox

The user name on your laptop and on the sandbox should match, since by default the Hadoop client presents your local user name to the cluster. Optionally, add the account to sudoers.

sudo su -
useradd USERNAME
passwd USERNAME

Create user in HUE

Open HUE at http://localhost:8888/useradmin/ (log in as cloudera/cloudera) and create an HDFS account for yourself under User Admin -> Add User. Make sure 'Create user dir' is selected.

Create oozie workflows dir

#on the sandbox VM execute
sudo su - hdfs
#-p also creates the parent /oozie directory if it does not exist yet
hadoop fs -mkdir -p /oozie/deployments
hadoop fs -chmod -R 777 /oozie/deployments
exit
 
#on your laptop as yourself execute
hadoop fs -mkdir /oozie/deployments/lib-scalding

triplel commented Nov 7, 2013

awesome stuff, marked!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment