
@iambudi
Forked from viecode09/Hadoop_install_osx.md
Last active April 7, 2024 12:23
This is how to install Hadoop on macOS.

STEP 1: Install Homebrew, available from https://brew.sh

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

STEP 2: Install Hadoop

$ brew search hadoop
$ brew install hadoop

Hadoop will be installed under /usr/local/Cellar/hadoop.
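
On Apple Silicon Macs, Homebrew installs under /opt/homebrew rather than /usr/local. Rather than guessing, you can ask Homebrew where the package actually landed:

$ brew --prefix hadoop    # e.g. /usr/local/opt/hadoop, or /opt/homebrew/opt/hadoop on Apple Silicon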

STEP 3: Configure Hadoop

Edit hadoop-env.sh. The file is located at /usr/local/Cellar/hadoop/3.1.1/libexec/etc/hadoop/hadoop-env.sh, where 3.1.1 is the Hadoop version. Change the line that sets HADOOP_OPTS, as shown below.
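
The change in question (the same flags show up in the comments below) extends HADOOP_OPTS with empty Kerberos settings, which silences a krb5-related warning on macOS:

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

While editing the file, it is also worth setting JAVA_HOME explicitly; /usr/libexec/java_home is the standard macOS helper for locating the installed JDK:

export JAVA_HOME="$(/usr/libexec/java_home)"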

Edit core-site.xml. The file is located at /usr/local/Cellar/hadoop/3.1.1/libexec/etc/hadoop/core-site.xml. Add the following between <configuration> and </configuration>:

<property>
 <name>hadoop.tmp.dir</name>
 <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
 <description>A base for other temporary directories.</description>
</property>
<property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:9000</value>
</property>
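
Note that fs.default.name is the legacy name for this setting; Hadoop 2 and later still accept it but log a deprecation warning. The modern equivalent is:

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://localhost:9000</value>
</property>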

Edit mapred-site.xml. The file is located at /usr/local/Cellar/hadoop/3.1.1/libexec/etc/hadoop/mapred-site.xml and is blank by default. Add the configuration below:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:9010</value>
 </property>
</configuration>
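
mapred.job.tracker configures the JobTracker from MapReduce 1, which no longer exists in Hadoop 3 and is simply ignored there. Since the hstart alias below starts YARN anyway, a configuration that actually takes effect on Hadoop 3 is to run MapReduce on YARN:

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>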

Edit hdfs-site.xml. The file is located at /usr/local/Cellar/hadoop/3.1.1/libexec/etc/hadoop/hdfs-site.xml. Add the following; a replication factor of 1 is appropriate for a single-node setup:

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>

To make starting and stopping Hadoop easier, edit ~/.profile, ~/.zshrc, or whichever shell profile you use, and add the following aliases. The profile file may not exist by default; create it if needed.

alias hstart="/usr/local/Cellar/hadoop/3.1.1/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/3.1.1/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/3.1.1/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/3.1.1/sbin/stop-dfs.sh"
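
These aliases hardcode the 3.1.1 version, so they break silently after an upgrade. A version-agnostic sketch uses brew --prefix instead; note that depending on the formula, the start/stop scripts may live under sbin or under libexec/sbin (next to the libexec/etc/hadoop config directory used above), so adjust the path to match your install:

alias hstart="$(brew --prefix hadoop)/libexec/sbin/start-dfs.sh;$(brew --prefix hadoop)/libexec/sbin/start-yarn.sh"
alias hstop="$(brew --prefix hadoop)/libexec/sbin/stop-yarn.sh;$(brew --prefix hadoop)/libexec/sbin/stop-dfs.sh"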

Then source it:

$ source ~/.profile    # or: source ~/.zshrc

Before running Hadoop for the first time, format HDFS. Formatting erases all HDFS data, so do this only once:

$ hdfs namenode -format

STEP 4: Verify that SSH to localhost works. Check whether the files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub exist. If they don't, generate the keys:

$ ssh-keygen -t rsa    # press Enter at the passphrase prompt to leave it empty

Enable Remote Login: open macOS "System Preferences" -> "Sharing" and check "Remote Login".

Authorize the SSH keys: to let the system accept key-based logins, append your public key to the list of authorized keys:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
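
If ssh localhost still prompts for a password after this (the problem reported in the comments below), the usual culprit is file permissions: sshd silently ignores keys whose files are too open. A safe reset:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys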

Test the login; it should not prompt for a password. Otherwise, Hadoop will fail at startup with Permission denied (publickey,password,keyboard-interactive).

$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit

STEP 5: Run Hadoop

$ hstart

and stop using

$ hstop
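
To check that everything came up, list the running Java daemons with jps (bundled with the JDK). On a healthy single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The NameNode web UI should also respond; it listens on port 9870 in Hadoop 3 (it was 50070 in Hadoop 2):

$ jps
$ open http://localhost:9870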

pozilei commented Jan 21, 2021

ssh localhost asks for a password, so I am getting a Permission denied error. Any help is much appreciated!

@dungtinhhd123

> ssh localhost asks for a password, so I am getting a Permission denied error. Any help is much appreciated!

I ran into the same problem. Then I added:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
to hadoop-env.sh, and the problem was solved.

EDIT
The answer above was wrong. You must run ssh-add. If it reports "Could not open a connection to your authentication agent", start the ssh-agent first and then add your key:

$ eval "$(ssh-agent -s)"
$ ssh-add ~/.ssh/id_rsa
