Setting up HDFS on AWS

On each node:

Set up packages and install Hadoop:

#!/bin/bash

# Install Java 8 and basic tools
sudo yum install java-1.8.0-openjdk-devel wget git bzip2 -y

# Point JAVA_HOME at the system JDK so Hadoop can find it
echo 'export JAVA_HOME=/usr/lib/jvm/java' >> ~/.bashrc
source ~/.bashrc

# Download and unpack Hadoop 2.7.2 into the home directory
cd ~
wget http://www.gtlib.gatech.edu/pub/apache/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar zxf hadoop-2.7.2.tar.gz
rm hadoop-2.7.2.tar.gz
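
A quick sanity check on each node (assuming the install locations above):

java -version
~/hadoop-2.7.2/bin/hadoop version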

On master node:

Set up hadoop-2.7.2/etc/hadoop/ conf files:

core-site.xml (fs.defaultFS should point at the master node's private IP):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.31.14.106:9000/</value>
  </property>
</configuration>
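
Workers must be able to reach the NameNode on port 9000, so the security group needs to allow intra-cluster traffic on that port. A quick connectivity check from a worker (assuming nc is available; sudo yum install nmap-ncat -y if not):

nc -zv 172.31.14.106 9000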

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HDFSConfig_H2.html -->
    <value>1</value>  <!-- per AWS recommendations (sort of) -->
  </property>
</configuration>
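
By default HDFS keeps NameNode metadata and DataNode blocks under /tmp, which does not survive a reboot. Optionally, hdfs-site.xml can pin those directories somewhere durable; the paths below are illustrative, not part of the original setup:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/ec2-user/hdfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/ec2-user/hdfs/data</value>
</property>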

slaves (one worker private IP per line):

172.31.46.51
172.31.46.52
172.31.46.53
172.31.46.54

Set up SSH access from the master node to the workers by pointing ~/.ssh/config at the cluster key:

Host *
    IdentityFile ~/.ssh/key.pem
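
Optionally (an assumption, not from the original steps), adding StrictHostKeyChecking no to the same Host * block avoids interactive host-key prompts the first time start-dfs.sh connects to each worker:

Host *
    IdentityFile ~/.ssh/key.pem
    StrictHostKeyChecking no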

Copy the config files to the worker machines (run from ~/hadoop-2.7.2/etc/hadoop/ so that * matches the conf files and the slaves file is in the current directory):

while read slave; do scp * ${slave}:~/hadoop-2.7.2/etc/hadoop/; done < slaves

Format the NameNode and start HDFS (run from ~/hadoop-2.7.2/):

bin/hdfs namenode -format
sbin/start-dfs.sh
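
To verify the cluster came up (a quick check with standard JDK/Hadoop tools, not part of the original steps), jps on the master should show a NameNode, and the dfsadmin report should list each worker as a live DataNode:

jps
bin/hdfs dfsadmin -report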