Last active December 11, 2023 01:38
Steps to install Hadoop 3.2.0 on Ubuntu Server

Hadoop 3.2.0 Multi Node Configuration

Please make sure you already have a working Hadoop installation on your first VM. If not, please follow the installation guide below.

We will build a Hadoop cluster with 1 master and 1 slave node. We can create the second VM by cloning the existing one.

Clone VM to create another node

  • Open VirtualBox and make sure the VM is not running

  • Right-click on the existing VM

  • Choose the Clone option


  • You can give it a custom name.

  • When asked about Full Clone or Linked Clone, choose Full Clone if you have enough storage.

  • Now we have 2 VMs


Check IP for each VM

  • Run both VMs

  • Check the IP on both VMs by running ifconfig -a


    Neither VM has an IP assigned on the enp0s8 interface

  • Force the enp0s8 interface to get an IP by running sudo dhclient enp0s8 on each VM in order (master, then slave)


    In this case the master node and the slave node each get their own IP

    To make the IP addresses persist, edit the netplan config under /etc/netplan/ and set the IP there. You must use spaces instead of tabs, since the file is YAML.
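    A sketch of such a config (the exact file name under /etc/netplan/ varies, and the address below is only an example on the 192.168.85.* host-only subnet; use the IP that dhclient assigned to this VM):

    ```yaml
    network:
      version: 2
      ethernets:
        enp0s8:
          dhcp4: no
          # example address; substitute the IP dhclient assigned to this VM
          addresses: [192.168.85.10/24]
    ```

    Apply it with sudo netplan apply.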


Update /etc/hostname on each VM

  • Run sudo nano /etc/hostname on each VM
  • Set master.hadoop.dts on the master node and slave1.hadoop.dts on the slave node


Update /etc/hosts on each VM

  • sudo nano /etc/hosts

  • Add lines mapping each node's IP address to its hostname, one for master.hadoop.dts and one for slave1.hadoop.dts
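    For example, assuming the master and slave received 192.168.85.10 and 192.168.85.11 on the host-only network (substitute the addresses ifconfig actually reported):

    ```
    192.168.85.10   master.hadoop.dts
    192.168.85.11   slave1.hadoop.dts
    ```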


  • Restart networking

      sudo /etc/init.d/network-manager restart


Add Authorized Keys

All nodes need to communicate with each other over ssh without a password. The commands below copy each user's ssh identity to the other node, so the ssh connections between them no longer prompt for a password.

On master node

  su - hadoop
  ssh-copy-id -i ~/.ssh/ hadoop@slave1.hadoop.dts

On slave node

  su - hadoop
  ssh-copy-id -i ~/.ssh/ hadoop@master.hadoop.dts


Now you can test with ssh hadoop@slave1.hadoop.dts; there should be no password prompt


Setting Configuration Files

~/hadoop/etc/hadoop/workers (only in master node)

If you use Hadoop version 2.*, edit ~/hadoop/etc/hadoop/slaves instead
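The body of the workers file did not survive in this capture; it normally lists one worker hostname per line. A sketch using this guide's hostnames (add master.hadoop.dts as well only if the master should also run a DataNode):

```
slave1.hadoop.dts
```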


~/hadoop/etc/hadoop/core-site.xml (all VM)


~/hadoop/etc/hadoop/hdfs-site.xml (all VM)


~/hadoop/etc/hadoop/yarn-site.xml (all VM)
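The XML snippets for these files are not shown in this capture. As a minimal sketch (not necessarily the gist's exact values), a multi-node core-site.xml typically points fs.defaultFS at the master:

```xml
<configuration>
  <!-- all nodes address HDFS through the master's NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.dts:9000</value>
  </property>
</configuration>
```

hdfs-site.xml then usually sets dfs.replication to the number of DataNodes, and yarn-site.xml points yarn.resourcemanager.hostname at master.hadoop.dts.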


Formatting Hadoop File System (only in master)

  hdfs namenode -format

Starting Hadoop (only in master)



Check the available nodes at master.hadoop.dts:8088 or at the master's IP address on port 8088


Install Hadoop on Ubuntu Server 18.04


  • Microsoft Windows 10
  • Virtualbox 6.0


  • Ubuntu Server 18.04 already installed.

VirtualBox Network Configuration

  • Create a new host-only network adapter from VirtualBox vbox_host_manager_1

  • Fill in the IP values following the image below vbox_host_manager_2

  • Update the VM network configuration by clicking the Gear icon and selecting the Network tab.

    Make sure the first adapter is set to NAT


    And make sure the second adapter uses the Host-only network


  • Check network configuration

    • Start the VM, then log in

    • Run ifconfig -a and make sure it lists 3 network adapters. The interface names may differ. One of them should have an IP in 192.168.85.*


      This configuration allows the Windows host to access the VM, and the VM to access the internet through the Windows host.

    • To log in to the VM from the Windows host, open a command prompt and type ssh hadoop@ followed by the VM's IP address vbox_host_manager_6



  1. Prerequisite

    Before beginning the installation, run a login shell as the sudo user and update the currently installed packages

      sudo apt update
      sudo apt upgrade
  2. Install Java 11 on Ubuntu 18.04

    You need to add the following PPA to your Ubuntu system. This PPA contains the package oracle-java11-installer, which provides the Java installation script.

        sudo add-apt-repository ppa:linuxuprising/java

    Then install Java Runtime Environment (JRE)

      sudo apt install default-jre

    Install Java Development Kit (JDK)

      sudo apt install default-jdk

    Both commands automatically add the java executable to the environment. To be safe, reload the environment variables using source ~/.bashrc. To check that the JRE is working, run java or java --version in the terminal.


    To check that the JDK is working, run the javac command in the terminal.



  1. Create user for Hadoop

    We recommend creating a normal (non-root) account for Hadoop to work under. Create the account using the following command.

      adduser hadoop

    After creating the account, you also need to set up key-based ssh to its own account. To do this, execute the following commands.

      su - hadoop
      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
      cat ~/.ssh/ >> ~/.ssh/authorized_keys
      chmod 0600 ~/.ssh/authorized_keys

    Now SSH to localhost with the hadoop user. This should not ask for a password, but the first time it will prompt to add the host's RSA key to the list of known hosts.

    ssh localhost
  2. Download Hadoop Source Archive

    In this step, download the Hadoop 3.2.0 source archive and extract it using the commands below.

        cd ~
        tar xzf hadoop-3.2.0.tar.gz
        mv hadoop-3.2.0 hadoop
  3. Setup Hadoop Pseudo-Distributed Mode

    Set up the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following values at the end of the file.

        export JAVA_HOME=/usr/lib/jvm/default-java
        export HADOOP_HOME=/home/hadoop/hadoop
        export YARN_HOME=$HADOOP_HOME
        export HADOOP_CONFIG_DIR=$HADOOP_HOME/share/hadoop/common/lib
        export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    Then, apply the changes in the current running environment

        source ~/.bashrc

    Now edit the $HADOOP_HOME/etc/hadoop/ environment file and set the JAVA_HOME variable there. Change the Java path to match the installation on your system; it may vary with your operating system version and installation source, so make sure you are using the correct path.

        nano $HADOOP_HOME/etc/hadoop/

    Then add this line

        export JAVA_HOME=/usr/lib/jvm/default-java
        export HADOOP_CLASSPATH+=" $HADOOP_CONF_DIR/lib/*.jar"
  4. Setup Hadoop Configuration Files

    Hadoop has many configuration files, which need to be configured to match the requirements of your Hadoop infrastructure. Let's start with a basic single-node cluster setup. First, navigate to the location below

        cd $HADOOP_HOME/etc/hadoop

    Edit core-site.xml -> nano core-site.xml


    Edit hdfs-site.xml -> nano hdfs-site.xml


    Edit mapred-site.xml -> nano mapred-site.xml


    Edit yarn-site.xml -> nano yarn-site.xml
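    The XML contents of these four files did not survive in this capture. As one minimal sketch (not necessarily the gist's exact values), a single-node mapred-site.xml commonly just selects YARN as the MapReduce framework:

    ```xml
    <configuration>
      <!-- run MapReduce jobs on YARN rather than as local processes -->
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    ```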

  5. Format Namenode

    Now format the namenode using the following command, and check the output for a line confirming that the storage directory has been successfully formatted

        hdfs namenode -format

    Sample output:

         WARNING: /home/hadoop/hadoop/logs does not exist. Creating.
         2018-05-02 17:52:09,678 INFO namenode.NameNode: STARTUP_MSG:
         STARTUP_MSG: Starting NameNode
         STARTUP_MSG:   host = tecadmin/
         STARTUP_MSG:   args = [-format]
         STARTUP_MSG:   version = 3.1.2
         2018-05-02 17:52:13,717 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
         2018-05-02 17:52:13,806 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
         2018-05-02 17:52:14,161 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
         2018-05-02 17:52:14,224 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
         2018-05-02 17:52:14,282 INFO namenode.NameNode: SHUTDOWN_MSG:
         SHUTDOWN_MSG: Shutting down NameNode at tecadmin/
  6. Start Hadoop Cluster

    Let's start the Hadoop cluster using the scripts provided by Hadoop. Just navigate to your $HADOOP_HOME/sbin directory and execute the scripts one by one.

        cd $HADOOP_HOME/sbin/

    Now execute the first script.



    Then execute the second script.



    To make it easier to start Hadoop from any directory, you can create aliases by modifying ~/.bashrc and adding these lines

        alias hstart=$HADOOP_HOME/sbin/
        alias hstop=$HADOOP_HOME/sbin/

    Then run source ~/.bashrc. Now we can start or stop the Hadoop services from any directory with the hstart or hstop command.

VirtualBox

These steps are only valid if you do not want to set up multiple VMs on Windows, because the port forwarding rules will conflict if you duplicate the VM.

Implement port forwarding so that Windows can communicate with the VirtualBox guest.

  • Open the Machine settings, not the VirtualBox settings

  • Click the Network tab

  • On the Adapter 1 tab, make sure it's NAT, then click Advanced

  • Click the Port Forwarding button

  • Add a new rule, leaving all the default values, and leave Host IP and Guest IP empty. Only edit Host Port and Guest Port: the host port is on Windows and the guest port is on the Ubuntu server.

    Host Port   Guest Port
    2202        22
    9870        9870
    9864        9864
    8088        8088

    More information about the available ports for forwarding on Hadoop 3.2.0 can be found at this link

  • Click OK button

Access Hadoop from Windows

  1. You can log in to the Ubuntu server from a Windows command prompt using this command

       ssh hadoop@
  2. To access the Hadoop web GUI, you can open the link in a browser on your Windows machine

Frequently Asked Question (FAQ)

  • Java command not found

    Run echo $JAVA_HOME to see whether Java has been added to the environment. If it returns an empty string, you have to add it to ~/.bashrc using nano or another editor.

    Add this line export JAVA_HOME="/usr/lib/jvm/default-java" -> you need to modify the path based on your Java installation inside /usr/lib/jvm.

    After that, reload the updated settings using source ~/.bashrc. Now the java command should work.

    If the problem still persists, please follow my Java installation steps above
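As a sketch, the export line can be added idempotently from the shell (the /usr/lib/jvm/default-java path assumes the Ubuntu default-jdk package; adjust it to your JVM):

```shell
# Append JAVA_HOME to ~/.bashrc only if no export is there yet (idempotent).
# /usr/lib/jvm/default-java assumes the Ubuntu default-jdk package.
grep -qs 'export JAVA_HOME=' ~/.bashrc || \
  echo 'export JAVA_HOME="/usr/lib/jvm/default-java"' >> ~/.bashrc
# Reload so the current shell picks up the change (same as: source ~/.bashrc)
. ~/.bashrc
echo "JAVA_HOME is set to: $JAVA_HOME"
```

Running it twice is safe: the grep guard prevents duplicate export lines.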

  • Hadoop or hdfs command not found

    Please make sure you have these values inside your .bashrc file


    Please adjust the $HADOOP_HOME value to your Hadoop installation path.

    Then reload the env file using source ~/.bashrc; now the hdfs and hadoop commands should work

  • Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS; Server Error when trying to browse an HDFS directory in the browser

    Run this command on your terminal

        cd $HADOOP_HOME/share/hadoop/common/lib

    Then make sure on your ~/.bashrc contain this line export HADOOP_CONFIG_DIR=$HADOOP_HOME/share/hadoop/common/lib

    Make sure $HADOOP_HOME/etc/hadoop/ contain this line export HADOOP_CLASSPATH+=" $HADOOP_CONF_DIR/lib/*.jar"

    Then restart the hadoop services

  • class not found.

    1. Find your Java JDK location, usually /usr/lib/jvm/default-java; if it is not installed you can run sudo apt install default-jdk

    2. Add this line to your ~/.bashrc

          export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
    3. Reload configuration source ~/.bashrc

  • After cloning an existing VM, the IP did not change.

    We need to renew the interface lease to get a new IP so that each VM has a unique IP address.

    1. Check interface name using ifconfig -a


      Our interface name is enp0s8, with the IP shown above

      Now we can run this command to release the current lease on our interface

          sudo dhclient -v -r enp0s8
          ifconfig -a


      To get a new IP, please run this command

          sudo dhclient enp0s8
          ifconfig -a


      Now each VM has a different IP address and the VMs can communicate with each other


  • How to set a domain name for each VM on Windows instead of remembering the IP addresses

    1. Open notepad as administrator

    2. Open the file C:\Windows\System32\drivers\etc\hosts -> make sure to show all files


    3. Add lines mapping each VM's IP address to its hostname



    4. Now we can access the VMs by domain name in a browser on Windows



    5. You can also ssh using that domain name



isak06 commented May 4, 2020

I get these error messages

/opt/hadoop/hadoop/libexec/ line 1801: /tmp/ Permission denied
ERROR: Cannot write namenode pid /tmp/
Error: Could not find or load main class


I cannot see any parameter with that name in hdfs-site.xml at the following location.
Are you sure the names are correct?
