@akiatoji
Created June 25, 2015 15:32
Hadoop pseudo cluster on OS X

Hadoop Config on OS X

Configuration to apply once Hadoop (2.7.0 at the time of writing) is installed on OS X.

Some notes on the configuration:

  • The NameNode host is set to localhost, which means Hadoop will be accessed from the same host. If you access it from another host, change localhost to the actual host name. If the NameNode address in core-site.xml doesn't match what the client uses to connect, you'll get the dreaded "connection refused" error.

  • This system has plenty of RAM (16 GB). It's better to overcommit the maximum allocation, so you might as well set it even higher, to something like 24 GB. Otherwise your YARN jobs can get stuck in wait states. YARN's memory accounting doesn't reflect actual memory usage.

  • To run many small, long-running YARN jobs (e.g. Samza tasks), specify a minimum allocation and make it low. Samza tasks are typically pretty small, but every Samza job also ends up starting an ApplicationMaster (AM). If you don't lower the minimum allocation when running many Samza tasks, YARN will over-allocate memory and stop running more jobs while plenty of memory is still free.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
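
The core-site.xml settings here assume that clients run on the same machine as the NameNode. For the remote-access case described in the notes, a minimal sketch might look like the following. The host name hadoop-dev.example.com is a made-up placeholder, not part of the original setup; substitute whatever name your clients actually use to reach the NameNode.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <!-- Hypothetical host name: it should be the same name clients connect with -->
    <value>hdfs://hadoop-dev.example.com:9000</value>
  </property>
</configuration>
```

Clients would then connect with the same hdfs://hadoop-dev.example.com:9000 URI. According to the notes, a mismatch between the two is what leads to the connection refused error.
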
hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
yarn-site.xml

<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
</configuration>
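
The YARN settings here cap a single container request at 16384 MB. If you want to try the overcommit suggested in the notes, a sketch of that variant could look like this. The 24576 value is simply 24 × 1024 MB, matching the 24G figure mentioned in the notes; treat it as an illustration under that assumption rather than a tested recommendation.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <!-- Kept low so small containers (for example Samza tasks and their AMs)
         do not each reserve a large block of memory -->
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <!-- Assumed overcommit: 24 * 1024 MB on a 16 GB machine -->
    <value>24576</value>
  </property>
</configuration>
```

As far as I understand it, the scheduler reserves memory for a container in units based on the minimum allocation, so a smaller minimum lets more small containers fit before YARN considers the node full. Restart YARN after editing the file so the new values take effect.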