Chef-bach can be used to create a Hadoop test cluster from virtual machines on a hypervisor host with enough resources. The resulting cluster has four nodes: one bootstrap node, which hosts a Chef server, and three Hadoop nodes. Two of the three Hadoop nodes are master nodes and the third is a worker node. The steps to create the test cluster are listed below. They have been tested on hypervisor hosts running Mac OS and Ubuntu.

  • Install curl on the hypervisor host
  • Install VirtualBox on the hypervisor host
  • Install Vagrant on the hypervisor host
  • Delete the default DHCP server built into VirtualBox
  • Run sudo pkill -f VBox on the hypervisor host to stop any running VirtualBox processes
  • Clone the chef-bach repository onto the hypervisor host: git clone
  • Rename the chef-bach directory to chef-bcpc on the hypervisor host
  • cd to the chef-bcpc directory on the hypervisor host
  • Run the automated installation script in the tests directory: ./tests/
  • This downloads all the required software, creates the four-node cluster and installs all the HDP Hadoop components. As you can imagine, this takes some time: depending on the resources of the hypervisor host, the network bandwidth and so on, it can take 2 to 3 hours to complete.
  • Once the installation is complete, log on to the bootstrap node by running vagrant ssh from the chef-bcpc directory on the hypervisor host
  • Once logged on to the bootstrap node, cd to the chef-bcpc directory
  • Then run the following set of commands in sequence, and then run the same set a second time

./ Test-Laptop hadoop bcpc-vm1 
./ Test-Laptop hadoop bcpc-vm2 
./ Test-Laptop hadoop bcpc-vm3 

  • This completes the creation of the three Hadoop nodes:
      bcpc-vm1 is a master node that hosts the HDFS NameNode, the HBase Master and a MySQL server; its IP address is
      bcpc-vm2 is a master node that hosts the YARN ResourceManager, Hive/HCatalog and a MySQL server; its IP address is
      bcpc-vm3 is the worker node that hosts the HDFS DataNode, the HBase RegionServer and the YARN NodeManager; its IP address is
  • System statistics from the nodes and JMX statistics from the various Hadoop components are available through Graphite. The URL to access it is
  • Hadoop components are monitored through Zabbix, which can be accessed at the URL
  • Passwords for the various components, including the password for logging in to the Hadoop nodes, can be retrieved by logging on to the bootstrap node and issuing the following commands from the chef-bcpc directory on the hypervisor host
vagrant ssh
cd chef-bcpc
sudo knife data bag show configs Test-Laptop 

This lists the user ID and password for each component. Note that the value of cobbler-root-password is the password for logging on to the three Hadoop nodes as the user "ubuntu", which is in the sudoers list.
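The hypervisor preparation steps at the start of the list can be scripted. The following is a minimal bash sketch, not part of chef-bach: `check_prereqs` and `prepare_virtualbox` are hypothetical helper names, and it assumes the default DHCP server to delete is the one VirtualBox attaches to the host-only network `HostInterfaceNetworking-vboxnet0`. Run `VBoxManage list dhcpservers` first if you are unsure which one it is.

```bash
#!/usr/bin/env bash
# Sketch of the hypervisor preparation steps (curl, VirtualBox, Vagrant, DHCP cleanup).
# ASSUMPTION: the default VirtualBox DHCP server is the one on the host-only
# network HostInterfaceNetworking-vboxnet0; confirm with `VBoxManage list dhcpservers`.

# Print each required tool that is missing from PATH; return 1 if any is missing.
check_prereqs() {
  local missing=0 tool
  # git is included because the repository is cloned with git clone.
  for tool in curl VBoxManage vagrant git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Remove the default DHCP server and stop lingering VirtualBox processes.
prepare_virtualbox() {
  local netname="${1:-HostInterfaceNetworking-vboxnet0}"  # assumed default network name
  VBoxManage list dhcpservers
  VBoxManage dhcpserver remove --netname "$netname" \
    || echo "could not remove DHCP server on $netname (already removed?)" >&2
  # pkill exits non-zero when nothing matched, which is fine here.
  sudo pkill -f VBox || true
  return 0
}
```

For example, `check_prereqs && prepare_virtualbox` stops before touching VirtualBox if any tool is missing. The clone, rename and installer steps are left out because the repository URL and the installer name are not given in the list.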
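The role-assignment commands have to run for all three nodes, and the whole set has to run twice, so a small loop avoids mistyping them. This is a sketch only: `assign_roles_twice` is a hypothetical helper, and because the script name is missing from the commands above you must pass the script path yourself.

```bash
#!/usr/bin/env bash
# Run the role-assignment command for bcpc-vm1..3 in order, then repeat the whole pass.
# The script path is a required argument because its name is missing from the
# commands above; it is invoked as <script> Test-Laptop hadoop <node>.
assign_roles_twice() {
  local script="$1" env_name="${2:-Test-Laptop}"
  local pass node
  if [[ ! -x "$script" ]]; then
    echo "not an executable script: $script" >&2
    return 1
  fi
  for pass in 1 2; do
    for node in bcpc-vm1 bcpc-vm2 bcpc-vm3; do
      echo "pass $pass: $node" >&2
      # Stop at the first failure so a broken node is not masked by later runs.
      "$script" "$env_name" hadoop "$node" || return 1
    done
  done
}
```

Run it on the bootstrap node from the chef-bcpc directory, for example `assign_roles_twice ./your-role-script.sh`, where the file name is a hypothetical stand-in for the script used in the commands above.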
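To pull a single password out of the data bag listing, the output of `knife data bag show` can be filtered. This hypothetical `get_config_value` helper is a sketch that assumes the listing shows plain `key: value` lines, as the `cobbler-root-password:` entry suggests.

```bash
#!/usr/bin/env bash
# Print the value of KEY from `knife data bag show` output read on stdin.
# ASSUMPTION: the output contains plain "key: value" lines; nested or
# encrypted values are not handled.
get_config_value() {
  local key="$1"
  awk -v k="$key" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)               # drop leading indentation
      if (index(line, k ":") == 1) {         # line starts with "key:"
        val = substr(line, length(k) + 2)    # text after the colon
        sub(/^[ \t]+/, "", val)
        print val
        exit
      }
    }'
}
```

For example, on the bootstrap node in the chef-bcpc directory: `sudo knife data bag show configs Test-Laptop | get_config_value cobbler-root-password`. A key that is not present produces no output.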

Verifying the Hadoop test cluster

  • Log on to bcpc-vm3: either run ssh ubuntu@ from the hypervisor host, or issue ./ Test-Laptop - from the chef-bcpc directory on the bootstrap node
  • Switch to the hdfs user
  • Run hdfs dfs -copyFromLocal /etc/passwd /passwd
  • Run hdfs dfs -cat /passwd
  • Run hdfs dfs -rm /passwd
  • If all of these commands succeed, the HDFS component is verified
  • Run hbase shell
  • Under the hbase shell, run create 't1','cf1'
  • Run list, which should display the newly created table
  • Run put 't1','r1','cf1:c1','v1'
  • Run scan 't1', which should display the row created in the previous step
  • Run disable 't1'
  • Run drop 't1'
  • Run list and it should display an empty list
  • Run exit
  • If all these steps succeed, the HBase component is verified along with ZooKeeper
  • As the hdfs user, run hdfs dfs -chmod 777 /user. This is acceptable only because this is a test cluster; do not perform this step in any secured environment
  • Create a new user on all three Hadoop nodes using the adduser command
  • Log in to the bcpc-vm2 node and switch to the new user created in the previous step
  • Run hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples- pi 1 100
  • If the previous step completes successfully, the YARN and MapReduce components are verified
  • If you plan to use Hive, bring up the Hive shell on the bcpc-vm2 Hadoop node by running hive
  • Create a table: create table t1 (id int);
  • Describe the newly created table: describe t1;
  • Drop the newly created table: drop table t1;
  • If these steps succeed, the Hive component is verified
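The HDFS steps can be combined into one check that also compares the file read back from HDFS with the local original, instead of comparing them by eye. A minimal sketch using only the `hdfs dfs` subcommands listed above; `hdfs_smoke_test` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# HDFS smoke test: copy a local file in, read it back, compare, then delete it.
# Defaults mirror the manual steps (/etc/passwd -> /passwd); making them
# parameters is an addition of this sketch.
hdfs_smoke_test() {
  local local_file="${1:-/etc/passwd}" hdfs_path="${2:-/passwd}"
  local rc=0
  hdfs dfs -copyFromLocal "$local_file" "$hdfs_path" || return 1
  # cmp reads the HDFS copy from stdin ("-") and compares it byte for byte.
  hdfs dfs -cat "$hdfs_path" | cmp -s - "$local_file" || rc=1
  hdfs dfs -rm "$hdfs_path" >/dev/null || rc=1
  if [[ $rc -eq 0 ]]; then
    echo "HDFS OK"
  else
    echo "HDFS check failed" >&2
  fi
  return "$rc"
}
```

Run it on bcpc-vm3 after switching to the hdfs user (for example with `sudo su - hdfs`). If /passwd already exists in HDFS, pass a different target path as the second argument.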
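The HBase shell also reads commands from standard input, so the HBase steps can be run non-interactively. In this sketch `hbase_smoke_script` and `hbase_smoke_test` are hypothetical helper names, and the automatic check assumes the shell prints scanned cells in its usual `column=cf1:c1, ..., value=v1` form.

```bash
#!/usr/bin/env bash
# Emit the HBase shell commands from the verification steps, one per line.
hbase_smoke_script() {
  cat <<'EOF'
create 't1','cf1'
list
put 't1','r1','cf1:c1','v1'
scan 't1'
disable 't1'
drop 't1'
list
exit
EOF
}

# Pipe the commands into `hbase shell` and check that the scan returned the cell.
hbase_smoke_test() {
  local out
  out="$(hbase_smoke_script | hbase shell 2>&1)"
  printf '%s\n' "$out"
  # Only the scan result is checked automatically; read the printed output
  # to confirm the create, disable and drop steps also succeeded.
  if grep -q 'value=v1' <<<"$out"; then
    echo "HBase OK"
  else
    echo "HBase check failed" >&2
    return 1
  fi
}
```

Run `hbase_smoke_test` on bcpc-vm3. The shell keeps going after a failed command, so an existing table `t1` would make `create` fail while later steps still run; check the printed output in that case.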
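The pi example can be wrapped so that the examples jar is located automatically. The jar's version suffix is missing from the command above, so this sketch assumes the file follows the `hadoop-mapreduce-examples-<version>.jar` pattern and refuses to guess when zero or several files match; `run_pi_example` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Run the MapReduce pi example: 1 map task, 100 samples per map.
# ASSUMPTION: the jar is named hadoop-mapreduce-examples-<version>.jar;
# a glob is used and exactly one match is required.
run_pi_example() {
  local dir="${1:-/usr/lib/hadoop-mapreduce}"
  local jars=("$dir"/hadoop-mapreduce-examples-*.jar)
  # Without nullglob an unmatched pattern stays literal, so also test that the file exists.
  if [[ ${#jars[@]} -ne 1 || ! -f "${jars[0]}" ]]; then
    echo "expected exactly one examples jar in $dir" >&2
    return 1
  fi
  hadoop jar "${jars[0]}" pi 1 100
}
```

Run it on bcpc-vm2 as the newly created user, after /user has been opened up as described in the list.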
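The Hive statements can also be run without an interactive session by passing them to `hive -e`. A minimal sketch, where `hive_smoke_sql` and `hive_smoke_test` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Build the create/describe/drop statements from the verification steps.
# The table name defaults to t1 and must not already exist.
hive_smoke_sql() {
  local table="${1:-t1}"
  printf 'create table %s (id int); describe %s; drop table %s;' "$table" "$table" "$table"
}

# Run all statements in one non-interactive hive call.
hive_smoke_test() {
  hive -e "$(hive_smoke_sql "$@")"
}
```

Run `hive_smoke_test` on bcpc-vm2 and check both its exit status and the printed `describe` output, which should list the `id` column with type `int`.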

If the test cluster is created on a hypervisor host behind a firewall, the appropriate proxy and DNS servers must be set in the script
DNS_SERVERS='"", ""'
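The DNS_SERVERS value is a comma-separated list of double-quoted addresses wrapped in single quotes. If the hypervisor's own resolvers are also usable from the virtual machines (an assumption that does not hold on every network), the value can be generated from resolv.conf; `dns_servers_from_resolv` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Build a DNS_SERVERS value in the '"a", "b"' form from the nameserver
# lines of a resolv.conf file (default: the hypervisor's own).
# ASSUMPTION: these resolvers are reachable from the VMs.
dns_servers_from_resolv() {
  local file="${1:-/etc/resolv.conf}"
  # Join every "nameserver <ip>" entry as "ip1", "ip2", ...
  awk '$1 == "nameserver" { printf "%s\"%s\"", (n++ ? ", " : ""), $2 } END { print "" }' "$file"
}
```

For example, `echo "DNS_SERVERS='$(dns_servers_from_resolv)'"` prints a line in the same form as the one above. On Ubuntu hosts that use systemd-resolved, /etc/resolv.conf usually lists only the local stub 127.0.0.53, which the VMs cannot use; pass /run/systemd/resolve/resolv.conf instead to get the upstream servers. Proxy settings still have to be filled in by hand.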