Running Hadoop in pseudo-distributed mode on Mac
(GitHub Gist eellpp/fcdcb03ca02fbd495b67ce7e488422f5, last active August 17, 2018 12:02)
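> brew install hadoop
This installs Hadoop at /usr/local/Cellar/hadoop/2.7.3. The version directory (2.7.3 here) depends on the formula Homebrew installs, so adjust the paths and jar names below if yours differs.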
Set JAVA_HOME
> cd /usr/local/Cellar/hadoop/2.7.3
> vim libexec/etc/hadoop/hadoop-env.sh
Set JAVA_HOME in that file as follows:
export JAVA_HOME="$(/usr/libexec/java_home)"
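If you would rather script this edit than make it in vim, a minimal sketch follows. It assumes you pass the path to hadoop-env.sh (for a Homebrew install, typically libexec/etc/hadoop/hadoop-env.sh under /usr/local/Cellar/hadoop/2.7.3). set_java_home is a hypothetical helper name, not part of Hadoop. It removes any existing `export JAVA_HOME=` lines and appends the one shown above, so running it twice still leaves a single entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make hadoop-env.sh take JAVA_HOME from
# /usr/libexec/java_home. Safe to re-run: afterwards the file contains
# exactly one "export JAVA_HOME=" line.
set_java_home() {
  local env_file=$1 tmp
  [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Keep every line except existing JAVA_HOME exports. grep exits 1 when it
  # selects no lines, which is not an error here.
  grep -v '^[[:space:]]*export JAVA_HOME=' "$env_file" > "$tmp" || true
  # Single quotes keep $(...) literal, so java_home runs each time Hadoop
  # sources the file rather than once now.
  printf '%s\n' 'export JAVA_HOME="$(/usr/libexec/java_home)"' >> "$tmp"
  # Write back through the existing file so its permissions are preserved.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}
```

After sourcing the snippet: > set_java_home /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh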
SSH
Mac: Enable Remote Login in System Preferences -> Sharing.
Check that you can ssh to localhost without a passphrase:
> ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands (use an RSA key: DSA keys are disabled by default since OpenSSH 7.0, which recent macOS versions ship):
> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
> chmod 0600 ~/.ssh/authorized_keys
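Running the `cat ... >> ~/.ssh/authorized_keys` step more than once appends the same key again. A minimal idempotent sketch follows; authorize_key is a hypothetical helper name. It also restricts permissions, since sshd may ignore an authorized_keys file (or ~/.ssh directory) that other users can write to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file only if
# that exact key line is not already present, then tighten permissions.
authorize_key() {
  local pub_file=$1 auth_file=$2 auth_dir
  [ -f "$pub_file" ] || { echo "public key not found: $pub_file" >&2; return 1; }
  auth_dir=$(dirname "$auth_file")
  mkdir -p "$auth_dir"
  chmod 700 "$auth_dir"
  touch "$auth_file"
  # -x: whole-line match, -F: fixed strings (keys contain '+' and '/'),
  # -f: read the key line(s) to look for from the .pub file.
  if ! grep -qxF -f "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}
```

Pass whichever public key you generated, for example: > authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys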
Configuration
Edit the following config files. The etc/hadoop paths below are relative to libexec/ in your Hadoop directory (/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop for a Homebrew install).
1) etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2) etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
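3) etc/hadoop/mapred-site.xml (Hadoop 2.x ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, copy the template in the same directory to that name first):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>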
4) etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
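The four files can also be written in one go. This is a sketch, assuming you pass the Hadoop configuration directory as the only argument (for Homebrew, typically /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop). write_pseudo_distributed_conf is a hypothetical helper name. It writes the same values listed above and overwrites the four files, so back up any you have customised.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the four pseudo-distributed config files into a
# Hadoop configuration directory. Existing files are overwritten.
write_pseudo_distributed_conf() {
  local conf_dir=$1
  [ -d "$conf_dir" ] || { echo "no such directory: $conf_dir" >&2; return 1; }

  # Default filesystem: the local NameNode on port 9000.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # Only one DataNode runs, so the default replication factor of 3 cannot be met.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

  # Run MapReduce jobs on YARN.
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # Auxiliary shuffle service the NodeManager runs for MapReduce.
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

After sourcing the snippet: > write_pseudo_distributed_conf /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop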
Execution
Format and start HDFS and YARN
> hdfs namenode -format
> start-dfs.sh
Now you can browse the web interface for the NameNode at http://localhost:50070/
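start-dfs.sh can return before the NameNode web UI is reachable. A small polling sketch, assuming curl is available (macOS includes it); wait_for_url is a hypothetical helper name, and the 60-second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or a timeout expires.
# Returns 0 as soon as curl gets any HTTP response, 1 on timeout.
wait_for_url() {
  local url=$1 timeout=${2:-60} deadline
  deadline=$(( $(date +%s) + timeout ))
  # -s: silent, -o /dev/null: discard the body, --max-time: cap each attempt.
  until curl -s -o /dev/null --max-time 2 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example: > wait_for_url http://localhost:50070/ && echo 'NameNode UI is up'. The same call works for the ResourceManager UI on port 8088.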
Make the HDFS directories required to execute MapReduce jobs:
> hdfs dfs -mkdir /user
> hdfs dfs -mkdir /user/{username}    # replace {username} with your macOS user name
Start the ResourceManager and NodeManager daemons:
> start-yarn.sh
Browse the web interface for the ResourceManager at http://localhost:8088/
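Once both start scripts have run, `jps` (shipped with the JDK) should list NameNode, DataNode and SecondaryNameNode for HDFS, plus ResourceManager and NodeManager for YARN. The sketch below reports whichever of those names is missing. missing_daemons is a hypothetical helper name; it takes the jps output as an argument so the check itself can be tried without a running cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected Hadoop daemon that does not appear
# in the given jps output (lines look like "12345 NameNode").
# Returns 1 if anything is missing, 0 otherwise.
missing_daemons() {
  local jps_output=$1 daemon missing=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so "NameNode" does not match
    # "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qxF "$daemon"; then
      echo "$daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: > missing_daemons "$(jps)" || echo 'some daemons are not running'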
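Test with the example code that ships with Hadoop (run these from /usr/local/Cellar/hadoop/2.7.3, since the libexec/ paths are relative; the quotes around /input/* leave the glob for Hadoop to expand in HDFS instead of your shell):
> bin/hdfs dfs -mkdir /input
> bin/hdfs dfs -put libexec/etc/hadoop /input
> hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep '/input/*' /output 'dfs[a-z.]+'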
Copy the output files from the distributed filesystem to the local filesystem and examine them:
> hdfs dfs -get /output ./output
> vim ./output/part-r-00000
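The grep example counts every match of the regular expression `dfs[a-z.]+` in the input files and writes one `count<TAB>match` line per distinct match, highest count first. To sanity-check part-r-00000, a rough local equivalent with standard Unix tools is sketched below. local_grep_count is a hypothetical helper name; for this pattern, POSIX extended regex and Java regex should find the same matches, but matches that share a count may be listed in a different order than Hadoop's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count matches of a regular expression across local
# files and print "count<TAB>match", highest count first, in the same layout
# as the grep example's part-r-00000.
local_grep_count() {
  local pattern=$1
  shift
  # -o: print each match on its own line, -h: omit file names, -E: ERE syntax.
  # The final awk assumes matches contain no whitespace (true for dfs[a-z.]+).
  grep -ohE "$pattern" "$@" \
    | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ printf "%s\t%s\n", $1, $2 }'
}
```

From the Hadoop directory: > local_grep_count 'dfs[a-z.]+' libexec/etc/hadoop/* and compare the result with ./output/part-r-00000.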
Submit a YARN job (the pi estimator example):
> yarn jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 6 1000
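In `pi 6 1000` the first argument is the number of map tasks and the second the number of samples per map, so the job evaluates 6 * 1000 = 6000 points. Conceptually, each point lies in a unit square and the job counts how many fall inside the inscribed circle, which covers pi/4 of the square, so pi is estimated as 4 * inside / total. The awk sketch below illustrates that arithmetic locally. estimate_pi is a hypothetical helper name; Hadoop's example is a quasi-Monte Carlo estimator and this sketch also uses Halton-sequence points (bases 2 and 3 here), but it is not a port of Hadoop's code, so expect its digits to differ from the job's output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate pi from nmaps * nsamples points of a 2-D
# Halton sequence, counting how many fall inside the circle inscribed in
# the unit square. A sketch of the idea, not a port of Hadoop's code.
estimate_pi() {
  local nmaps=$1 nsamples=$2
  if ! [ "$nmaps" -gt 0 ] 2>/dev/null || ! [ "$nsamples" -gt 0 ] 2>/dev/null; then
    echo "usage: estimate_pi <positive nmaps> <positive nsamples>" >&2
    return 1
  fi
  awk -v total="$(( nmaps * nsamples ))" '
    # Radical inverse of i in base b: the i-th Halton coordinate for that base.
    function halton(i, b,    f, r) {
      f = 1; r = 0
      while (i > 0) { f /= b; r += f * (i % b); i = int(i / b) }
      return r
    }
    BEGIN {
      inside = 0
      for (k = 1; k <= total; k++) {
        # Centre the unit square on the origin.
        x = halton(k, 2) - 0.5
        y = halton(k, 3) - 0.5
        # The circle of radius 0.5 covers pi/4 of the square area.
        if (x * x + y * y <= 0.25) inside++
      }
      printf "%.6f\n", 4 * inside / total
    }'
}
```

For example: > estimate_pi 6 1000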
When you're done, stop the daemons with:
> stop-yarn.sh
> stop-dfs.sh
Reference: http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
2 changes are required for hadoop 2.7.1 on mac: