Chef-bach can be used to create a Hadoop test cluster from virtual machines on a hypervisor host with enough resources. The resulting cluster is a four-node cluster: one node acts as the bootstrap node and hosts a Chef server, and the other three are Hadoop nodes. Two of the three are master nodes and one is the worker node. The following are the steps to create the test cluster; they have been tested on hypervisor hosts running Mac OS and Ubuntu.

The following need to be installed first:
- `curl` on the hypervisor host
- `virtualbox` on the hypervisor host
- `vagrant` on the hypervisor host
- Delete the default DHCP server built into VirtualBox
- Kill any stale VirtualBox processes with `sudo pkill -f VBox` on the hypervisor host
- Clone the chef-bach repository onto the hypervisor host: `git clone https://github.com/bloomberg/chef-bach.git` (the checkout is referred to as the `chef-bcpc` directory below)
- cd to the `chef-bcpc` directory on the hypervisor host
- Run the automated installation script (`automated_install.sh`) under the test directory
- This will download all the required software, create the four-node cluster, and install all the HDP Hadoop components. As you can imagine, this takes some time; depending on the size of the hypervisor host, network bandwidth, etc., it can take 2 to 3 hours to complete.
- Once `automated_install.sh` is complete, log on to the bootstrap node. You need to be in the `chef-bcpc` directory on the hypervisor
- Once logged onto the bootstrap node, cd to the `chef-bcpc` directory
- Then run the following set of commands, twice, in sequence:

```
./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm1
./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm2
./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm3
```
- This completes the creation of the three hadoop nodes
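The two passes over the three VMs can also be scripted as a loop. A minimal sketch, assuming you are in the `chef-bcpc` directory on the bootstrap node; here the real invocation is commented out and only echoed, to show the order of operations:

```shell
# Run the role-assignment script twice over the three VMs, in order.
for round in 1 2; do
  for vm in bcpc-vm1 bcpc-vm2 bcpc-vm3; do
    echo "pass $round: cluster-assign-roles.sh Test-Laptop hadoop $vm"
    # Real run (uncomment on the bootstrap node):
    # ./cluster-assign-roles.sh Test-Laptop hadoop "$vm"
  done
done
```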
- `bcpc-vm1` is a master node which hosts the HDFS NameNode, HBase Master, and MySQL server; its IP is 10.0.100.11
- `bcpc-vm2` is a master node which hosts the YARN ResourceManager, Hive/HCatalog, and MySQL server; its IP is 10.0.100.12
- `bcpc-vm3` is the worker node which hosts the HDFS DataNode, HBase RegionServer, and YARN NodeManager; its IP is 10.0.100.13
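For convenience you can refer to the VMs by name from the hypervisor. This hosts-file fragment matches the addresses above (an optional convenience, not part of the install):

```
10.0.100.11 bcpc-vm1
10.0.100.12 bcpc-vm2
10.0.100.13 bcpc-vm3
```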
- System stats from the nodes and JMX stats from the various Hadoop components are available through Graphite. The URL to access it is
- Monitoring of Hadoop components is done through Zabbix, and it can be accessed through the URL
- Passwords for various components, including the password to log in to the Hadoop nodes, can be retrieved by logging on to the bootstrap node and issuing the following command
From the `chef-bcpc` directory on the hypervisor:

```
vagrant ssh
cd chef-bcpc
sudo knife data bag show configs Test-Laptop
```
This will list the user IDs and passwords of all the components. Note that `cobbler-root-password:` is the password for logging on to the 3 Hadoop nodes as the user "ubuntu", which is part of the sudoers list.
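If you only need one value from the data bag output, it can be filtered on the command line. A sketch with made-up sample data (the file name and password values below are hypothetical; the real `knife` command prints many more key/value pairs):

```shell
# Hypothetical excerpt of `sudo knife data bag show configs Test-Laptop`
# output, saved to a temp file for illustration.
cat > /tmp/configs-sample.txt <<'EOF'
cobbler-root-password: s3cr3t-example
mysql-root-password: another-example
EOF
# Extract the password used to log on to the hadoop nodes as "ubuntu":
awk -F': *' '/^cobbler-root-password:/ {print $2}' /tmp/configs-sample.txt
```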
Verifying the Hadoop test cluster
- Log on to `bcpc-vm3`. You can do `ssh ubuntu@10.0.100.13` from the hypervisor, or from the bootstrap node run `./nodessh.sh Test-Laptop 10.0.100.13`
- Switch to the `hdfs` user and run the following:

```
hdfs dfs -copyFromLocal /etc/passwd /passwd
hdfs dfs -cat /passwd
hdfs dfs -rm /passwd
```

- If all these are successful, the HDFS component is verified
- Under the HBase shell, create a table and insert a row into it
- Run `list`, which should display the newly created table
- Run `scan 't1'`, which should display the row created in the previous step
- Disable and drop the table, then run `list` again; it should display an empty list
- If all these steps are complete, the HBase component is verified along with ZooKeeper
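The HBase smoke test above can look like the following at the `hbase shell` prompt. The table name `t1` comes from the steps above; the column family `cf1`, row key `r1`, column `c1`, and value are illustrative choices:

```
create 't1', 'cf1'
put 't1', 'r1', 'cf1:c1', 'some-value'
list
scan 't1'
disable 't1'
drop 't1'
list
```

Note that HBase requires a table to be disabled before it can be dropped.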
- As the HDFS user, run `hdfs dfs -chmod 777 /user`. Note that we only do this because this is a test cluster; do not perform this step in a secured environment
- Create a new user on all three Hadoop nodes using
- Log in to the bcpc-vm2 (10.0.100.12) node, switch to the new user created in the previous step, and run `hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 1 100` (the version suffix in the jar name varies with the installed HDP release)
- If the previous step completes successfully, it verifies the YARN and MapReduce components
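For context, the `pi` example estimates π by Monte Carlo sampling: it scatters random points in the unit square and multiplies the fraction that lands inside the quarter circle by 4. A tiny local illustration of the same idea in plain awk (not a YARN job, just the underlying arithmetic):

```shell
# Monte Carlo estimate of pi, mirroring what the hadoop pi example
# computes, but locally with no MapReduce involved.
awk 'BEGIN {
  srand(1); n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1.0) inside++   # point falls in the quarter circle
  }
  printf "pi is approximately %.2f\n", 4 * inside / n
}'
```

With 200,000 samples the printed value should land close to 3.14; the exact digits depend on your awk's random-number generator.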
- If you plan to use Hive, while on the bcpc-vm2 Hadoop node bring up the Hive shell by running `hive`
- Create a table: `create table t1 (id int);`
- Describe the newly created table: `describe t1;`
- Drop the newly created table: `drop table t1;`
- If these steps are successful, it verifies the Hive component
If the test cluster is created on a hypervisor host located behind a firewall, appropriate proxy and DNS servers need to be set in the environment, for example:

```
PROXY=proxy.example.com:80
DNS_SERVERS='"8.8.8.8", "8.8.4.4"'
```

The proxy host and DNS server addresses above are examples; substitute your site's values.