namenode × 1
datanode × 3
- Host-only network
  - 192.168.10.x
- Internal network
  - 192.168.20.x
- IP (last octet): hostname (interface config sketch below)
  - 10: master
  - 11: slave1
  - 12: slave2
  - 13: slave3
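Each VM gets one NIC on each of the two networks above. On Debian that means static entries in /etc/network/interfaces, e.g. on master (a sketch; the eth0/eth1 adapter names are assumptions):

auto eth0
iface eth0 inet static
    address 192.168.10.10
    netmask 255.255.255.0

auto eth1
iface eth1 inet static
    address 192.168.20.10
    netmask 255.255.255.0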
Generate a passphrase-less key pair with ssh-keygen -t rsa and copy the public key to every slave so that all nodes can ssh to one another without a password.
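For example, run on the master (a sketch; ssh-copy-id ships with openssh-client on Debian):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ for h in slave1 slave2 slave3; do ssh-copy-id $h; done
$ ssh slave1 hostname   # should print slave1 without a password prompt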
- References
  - http://www.kde.cs.tut.ac.jp/~atsushi/?p=64
  - http://blog.ik.am/entry/view/id/40/title/Hadoop%E5%AE%8C%E5%85%A8%E5%88%86%E6%95%A3%E7%92%B0%E5%A2%83%E6%A7%8B%E7%AF%89%E3%83%A1%E3%83%A2/
  - http://metasearch.sourceforge.jp/wiki/index.php?Hadoop%A5%BB%A5%C3%A5%C8%A5%A2%A5%C3%A5%D7#y1b3a4e5
  - http://d.hatena.ne.jp/kj-ki/20101211/p1
$ vim /etc/apt/sources.list
deb http://ftp.riken.jp/Linux/debian/debian/ lenny main non-free
deb-src http://ftp.riken.jp/Linux/debian/debian/ lenny main non-free
$ aptitude update
$ aptitude install sun-java6-jdk
$ update-alternatives --set java /usr/lib/jvm/java-6-sun/jre/bin/java
$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
$ vim /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian lenny-cdh3 contrib
deb-src http://archive.cloudera.com/debian lenny-cdh3 contrib
$ aptitude install curl rsync sudo
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
$ aptitude update
On the master:
$ aptitude install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-jobtracker

On each slave:
$ aptitude install hadoop-0.20 hadoop-0.20-datanode hadoop-0.20-tasktracker
$ cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
$ update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ update-alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is auto.
 link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.cluster - priority 50
Current `best' version is /etc/hadoop-0.20/conf.cluster.
/etc/hadoop-0.20/conf.cluster/core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>
</configuration>
/etc/hadoop-0.20/conf.cluster/hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
/etc/hadoop-0.20/conf.cluster/mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>mapred.hosts</name>
    <value>${hadoop.tmp.dir}/hosts.include</value>
  </property>
  <property>
    <name>mapred.hosts.exclude</name>
    <value>${hadoop.tmp.dir}/hosts.exclude</value>
  </property>
</configuration>
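mapred.hosts / mapred.hosts.exclude point at plain text files with one hostname per line. For example, to later retire slave3, hosts.exclude would contain just the following line (hypothetical):

slave3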
${hadoop.tmp.dir}/hosts.include:

master
slave1
slave2
slave3
/etc/hosts (identical on every node):

192.168.20.10 master master.localdomain
192.168.20.11 slave1 slave1.localdomain
192.168.20.12 slave2 slave2.localdomain
192.168.20.13 slave3 slave3.localdomain
On the master:
$ sudo -u hdfs hadoop namenode -format
$ /etc/init.d/hadoop-0.20-namenode start
$ /etc/init.d/hadoop-0.20-secondarynamenode start
$ /etc/init.d/hadoop-0.20-jobtracker start

On each slave:
$ /etc/init.d/hadoop-0.20-datanode start
$ /etc/init.d/hadoop-0.20-tasktracker start
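To verify, the dfsadmin report should list all three datanodes (the namenode web UI on master:50070 and the jobtracker UI on master:50030 show the same):

$ sudo -u hdfs hadoop dfsadmin -report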
The configuration under /etc/hadoop/conf.cluster is identical on the master and the slaves, so it can simply be copied from the master with rsync; a small script that rsyncs it to each server makes this painless.
~/node.sh (the node list shared by the scripts below):

#!/bin/sh
NAMENODE="192.168.20.10"
DATANODE="192.168.20.11 192.168.20.12 192.168.20.13"
NODE="$NAMENODE $DATANODE"
A helper that runs a given command on every node:

#!/bin/sh
. ~/node.sh
for n in $NODE; do
    echo "== $n =="
    ssh $n "$@"
done
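Saved as, say, ~/cmd.sh (the name is arbitrary), it is handy for checking that every node sees the same config:

$ sh ~/cmd.sh md5sum /etc/hadoop/conf.cluster/core-site.xml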
The sync script itself, run on the master:

#!/bin/sh
. ~/node.sh
for n in $NODE; do
    # NODE holds IP addresses, so compare against this host's own IP
    if [ "`hostname -i`" != "$n" ]; then
        echo "== $n =="
        sudo rsync --progress -av /etc/hadoop/conf.cluster/ $n:/etc/hadoop/conf.cluster
    fi
done
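After a sync, restart the daemons so they pick up the new config; with the loop script above, for example (assuming the same ssh account used for rsync can manage services):

$ sh ~/cmd.sh /etc/init.d/hadoop-0.20-tasktracker restart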