@eellpp
Last active August 17, 2018 12:02
Running Hadoop in pseudo-distributed mode on Mac
> brew install hadoop
Homebrew installs Hadoop at /usr/local/Cellar/hadoop/2.7.3 (the version directory depends on the release Homebrew ships).
Set JAVA_HOME
> cd /usr/local/Cellar/hadoop/2.7.3
> vim libexec/etc/hadoop/hadoop-env.sh
JAVA_HOME should be set in that file as follows:
export JAVA_HOME="$(/usr/libexec/java_home)"
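Before starting any daemons, it can help to confirm that the value resolves to a real JDK. The following is a minimal sketch; `check_java_home` is a hypothetical helper name, not something Hadoop or macOS provides.

```shell
# check_java_home is a hypothetical helper (not part of Hadoop or macOS).
# It succeeds only if the given directory contains an executable bin/java.
check_java_home() {
  local home="$1"
  if [ -z "$home" ]; then
    echo "JAVA_HOME is empty" >&2
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "no executable bin/java under: $home" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $home"
}

# Example call on macOS (commented out so the block has no side effects):
# check_java_home "$(/usr/libexec/java_home)"
```

If the check fails, hard-coding the JDK path in hadoop-env.sh is the usual fallback.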
SSH
Mac: enable Remote Login in System Preferences -> Sharing.
Check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Note: OpenSSH 7.0 and later disable DSA keys by default. If the login still prompts for a password, generate an RSA key instead (-t rsa, ~/.ssh/id_rsa) and append ~/.ssh/id_rsa.pub.
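Running the `cat ... >>` step more than once appends the same key repeatedly. A minimal sketch of an idempotent variant follows; `authorize_key` is a hypothetical helper name, and the two paths are passed in so that it works for either key type.

```shell
# authorize_key is a hypothetical helper: append a public key to an
# authorized_keys file only if that exact line is not already present.
authorize_key() {
  local pub="$1" auth="$2"
  [ -r "$pub" ] || { echo "cannot read $pub" >&2; return 1; }
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed string, -f read pattern from file
  if ! grep -qxFf "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  # Restrict permissions; sshd may reject files writable by other users.
  chmod 600 "$auth"
}

# Example (commented out so the block has no side effects):
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```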
Configuration
Edit the following config files in your Hadoop directory (for the Homebrew install, the paths below are relative to libexec/):
1) etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2) etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3) etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4) etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
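The four files above each hold a single property, so they can also be generated from a script. This is only a sketch: `write_site_file` and `write_pseudo_dist_conf` are hypothetical helper names, and the function overwrites any file of the same name in the target directory, so it is safer to point it at a scratch directory first and copy the results over by hand.

```shell
# write_site_file and write_pseudo_dist_conf are hypothetical helpers.
# write_site_file DIR FILE NAME VALUE writes a one-property Hadoop config file.
write_site_file() {
  local dir="$1" file="$2" name="$3" value="$4"
  cat > "$dir/$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Write the four pseudo-distributed settings listed above into DIR.
write_pseudo_dist_conf() {
  local dir="$1"
  mkdir -p "$dir"
  write_site_file "$dir" core-site.xml   fs.defaultFS hdfs://localhost:9000
  write_site_file "$dir" hdfs-site.xml   dfs.replication 1
  write_site_file "$dir" mapred-site.xml mapreduce.framework.name yarn
  write_site_file "$dir" yarn-site.xml   yarn.nodemanager.aux-services mapreduce_shuffle
}

# Example (the scratch path is an arbitrary choice):
# write_pseudo_dist_conf /tmp/hadoop-conf-preview
```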
Execution
Format and start HDFS and YARN
> hdfs namenode -format
> start-dfs.sh
Now you can browse the web interface for the NameNode at - http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
> hdfs dfs -mkdir /user
> hdfs dfs -mkdir /user/{username} # replace {username}, braces included, with your macOS login name
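To avoid typing the username by hand, the path can be derived from the current login. A small sketch, assuming the HDFS user should match the local account; `hdfs_home_path` is a hypothetical helper name.

```shell
# hdfs_home_path is a hypothetical helper: print the HDFS home directory
# path for the current local user, e.g. /user/alice.
hdfs_home_path() {
  printf '/user/%s\n' "$(id -un)"
}

# Example (requires a running HDFS, so it is commented out here).
# -p creates /user as well if it does not exist yet.
# hdfs dfs -mkdir -p "$(hdfs_home_path)"
```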
Start ResourceManager daemon and NodeManager daemon:
> start-yarn.sh
Browse the web interface for the ResourceManager at - http://localhost:8088/
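Besides the web interfaces, the JDK's `jps` tool lists running JVMs, which gives a quick way to see whether all five daemons came up. The sketch below reads `jps` output on stdin and prints the names that are missing; `missing_daemons` is a hypothetical helper name.

```shell
# missing_daemons is a hypothetical helper. It reads jps-style output
# ("<pid> <ClassName>" per line) on stdin and prints each expected
# daemon name that does not appear in it.
missing_daemons() {
  local seen d
  seen="$(awk '{print $2}')"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x forces a whole-line match so NameNode does not match SecondaryNameNode
    printf '%s\n' "$seen" | grep -qxF "$d" || echo "$d"
  done
}

# Example (commented out; needs a JDK on PATH):
# jps | missing_daemons
```

Empty output means every expected daemon is running.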
Test the example code that ships with this Hadoop version (run from /usr/local/Cellar/hadoop/2.7.3; the glob is quoted so that the local shell does not try to expand it):
> hdfs dfs -mkdir /input
> hdfs dfs -put libexec/etc/hadoop /input
> hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep '/input/*' /output 'dfs[a-z.]+'
Examine the output files: copy them from the distributed filesystem to the local filesystem and open them.
> hdfs dfs -get /output ./output
> vim ./output/part-r-00000
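If a job runs with more than one reducer, the results are spread over several part-r-* files. The sketch below merges them and sorts by count, assuming each line has the form count, tab, matched string (the layout the bundled grep example writes); `merged_counts` is a hypothetical helper name.

```shell
# merged_counts is a hypothetical helper: concatenate every reducer output
# file in a directory and sort by the leading count, highest first.
merged_counts() {
  local dir="$1"
  cat "$dir"/part-r-* | sort -k1,1nr
}

# Example, after copying the job output locally:
# merged_counts ./output | head
```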
Submit a YARN job (the pi example takes the number of map tasks, then the number of samples per map):
> yarn jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 6 1000
When you're done, stop the daemons with:
$ stop-yarn.sh
$ stop-dfs.sh
Reference: http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
@rachitchauhan43

Two changes are required for Hadoop 2.7.1 on Mac:

  1. Generate RSA keys rather than DSA keys.
  2. In hadoop-env.sh, remove the existing export JAVA_HOME statement and add:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
