- Assume we have three machines:
- 192.168.33.100 hadoop-master
- 192.168.33.200 hadoop-slave01
- 192.168.33.201 hadoop-slave02
The following steps must be performed on all three machines.
- Configure /etc/hosts
vim /etc/hosts
//append the following 3 lines
192.168.33.100 hadoop-master
192.168.33.200 hadoop-slave01
192.168.33.201 hadoop-slave02
//save and exit
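The three entries can also be generated from a single ip:name list, which keeps all machines consistent; a minimal sketch (note 192.168.33.201 for hadoop-slave02, since 300 is not a valid IPv4 octet):

```shell
# Build the /etc/hosts entries from one ip:name list so every machine
# gets identical lines; append the output to /etc/hosts on each node.
NODES="192.168.33.100:hadoop-master 192.168.33.200:hadoop-slave01 192.168.33.201:hadoop-slave02"
ENTRIES=""
for node in $NODES; do
  ip=${node%%:*}
  name=${node##*:}
  ENTRIES="$ENTRIES$ip $name
"
done
printf '%s' "$ENTRIES"
```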
- Create the hadoop user
useradd hadoop
//set the hadoop user's password (here: hadoop)
passwd hadoop
- Set up passwordless SSH login from hadoop-master to hadoop-slave01 and hadoop-slave02
//as root; each ssh-copy-id below prompts for the remote machine's password
ssh-keygen -t rsa
ssh-copy-id root@hadoop-slave01
ssh-copy-id root@hadoop-slave02
//switch to the hadoop account and repeat; again each ssh-copy-id prompts for the remote password
su hadoop
ssh-keygen -t rsa
ssh-copy-id hadoop@hadoop-slave01
ssh-copy-id hadoop@hadoop-slave02
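The ssh-copy-id invocations above follow one pattern per (account, slave) pair; a sketch that prints them for review (pipe the output to sh once it looks right; ssh-keygen still has to be run once per account first):

```shell
# Generate the ssh-copy-id commands for both accounts and both slaves.
# This is a dry run: the commands are printed, not executed.
CMDS=$(for user in root hadoop; do
  for host in hadoop-slave01 hadoop-slave02; do
    echo "ssh-copy-id $user@$host"
  done
done)
printf '%s\n' "$CMDS"
```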
- Download and install Hadoop 2.6.0
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar zxvf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 /usr/hadoop
//download on hadoop-master, then copy to the other two machines
scp -r /usr/hadoop root@192.168.33.200:/usr
scp -r /usr/hadoop root@192.168.33.201:/usr
- Install the Java environment
//the -devel package provides the /usr/lib/jvm/java-openjdk symlink used below
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
vim /etc/profile
//append the following 6 lines
export JAVA_HOME=/usr/lib/jvm/java-openjdk
export JRE_HOME=/usr/lib/jvm/java-openjdk/jre
export CLASSPATH=.:${CLASSPATH}:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${PATH}:${JAVA_HOME}/bin:${JRE_HOME}/bin
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
//save and exit, then reload the profile so the settings take effect
source /etc/profile
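A quick sanity check that the profile took effect; this sketch re-exports the same values so it is self-contained:

```shell
# Re-create the profile exports, then verify PATH really picked up the
# Hadoop bin directory.
export JAVA_HOME=/usr/lib/jvm/java-openjdk
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH contains $HADOOP_HOME/bin" ;;
  *) echo "PATH is missing $HADOOP_HOME/bin" ;;
esac
```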
Perform the following steps on hadoop-master
- Create the tmp directory
mkdir /usr/hadoop/tmp
- Configure hadoop-env.sh
vim /usr/hadoop/etc/hadoop/hadoop-env.sh
//find the line "export JAVA_HOME=${JAVA_HOME}" and change it to
export JAVA_HOME=/usr/lib/jvm/java-openjdk
- Configure core-site.xml
Configure the address and port of the master (NameNode) as shown below. Before applying this configuration, make sure the /usr/hadoop/tmp directory exists. fs.defaultFS specifies the NameNode's URI, i.e. the default file system that clients and DataNodes connect to.
vim /usr/hadoop/etc/hadoop/core-site.xml
//paste in the following content and save
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000</value>
</property>
</configuration>
If hadoop.tmp.dir is not set, Hadoop falls back to a directory under /tmp (/tmp/hadoop-${user.name}). That directory is deleted on every reboot, after which the NameNode must be formatted again or HDFS fails to start.
- Configure hdfs-site.xml
The dfs.replication parameter below sets how many replicas of each block HDFS keeps. A single copy is enough for casual use, but without redundancy the loss of one DataNode loses data, so keep at least two replicas for anything important. It is set to 2 here because we have exactly 2 DataNodes, one replica on each; if the value exceeds the number of available DataNodes, HDFS reports under-replicated blocks. The default is 3.
vim /usr/hadoop/etc/hadoop/hdfs-site.xml
//paste in the following content and save
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
- Configure the masters and slaves files
vim /usr/hadoop/etc/hadoop/masters
//paste in the following 1 line and save
hadoop-master
vim /usr/hadoop/etc/hadoop/slaves
//paste in the following 2 lines and save
hadoop-slave01
hadoop-slave02
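The two files can also be written without an editor; a sketch using here-documents (CONF_DIR defaults to a scratch directory so the sketch is safe to try; point it at /usr/hadoop/etc/hadoop on the real cluster):

```shell
# Write the masters and slaves files with here-docs instead of vim.
# CONF_DIR is a stand-in: it defaults to a temporary directory here.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
cat > "$CONF_DIR/masters" <<'EOF'
hadoop-master
EOF
cat > "$CONF_DIR/slaves" <<'EOF'
hadoop-slave01
hadoop-slave02
EOF
echo "wrote $CONF_DIR/masters and $CONF_DIR/slaves"
```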
- Copy the configuration above to the other two machines
scp -r /usr/hadoop/tmp root@hadoop-slave01:/usr/hadoop
scp -r /usr/hadoop/etc/hadoop/hadoop-env.sh root@hadoop-slave01:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/core-site.xml root@hadoop-slave01:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/hdfs-site.xml root@hadoop-slave01:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/masters root@hadoop-slave01:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/slaves root@hadoop-slave01:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/tmp root@hadoop-slave02:/usr/hadoop
scp -r /usr/hadoop/etc/hadoop/hadoop-env.sh root@hadoop-slave02:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/core-site.xml root@hadoop-slave02:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/hdfs-site.xml root@hadoop-slave02:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/masters root@hadoop-slave02:/usr/hadoop/etc/hadoop
scp -r /usr/hadoop/etc/hadoop/slaves root@hadoop-slave02:/usr/hadoop/etc/hadoop
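The twelve scp commands above follow one pattern; a sketch that generates them from a single file list (review the output, then pipe it to sh to execute):

```shell
# Generate the scp commands that push the edited files to both slaves.
# Paths are relative to /usr/hadoop; dirname picks the matching
# destination directory on the slave. Printed, not executed.
FILES="tmp etc/hadoop/hadoop-env.sh etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml etc/hadoop/masters etc/hadoop/slaves"
CMDS=$(for host in hadoop-slave01 hadoop-slave02; do
  for f in $FILES; do
    echo "scp -r /usr/hadoop/$f root@$host:/usr/hadoop/$(dirname "$f")"
  done
done)
printf '%s\n' "$CMDS"
```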
- Fix ownership and permissions (on all three machines)
chown -R hadoop:hadoop /usr/hadoop
chmod -R 755 /usr/hadoop
Run everything below as the hadoop user, on hadoop-master, from the /usr/hadoop directory
- Format the HDFS filesystem
//the older form `hadoop namenode -format` still works but is deprecated in 2.x
./bin/hdfs namenode -format
- Start Hadoop
./sbin/start-dfs.sh
./sbin/start-yarn.sh
- Verify the daemons started
The JDK's jps tool should show ResourceManager, NameNode, and SecondaryNameNode running on the master, and DataNode and NodeManager on each slave
# On master
[hadoop@hadoop-master hadoop]$ jps
29011 ResourceManager
21836 NameNode
2159 Jps
32261 SecondaryNameNode
# On slave
[hadoop@hadoop-slave01 hadoop]$ jps
4799 DataNode
20182 Jps
11010 NodeManager
- Check cluster status with ./bin/hdfs dfsadmin -report
# On master
[hadoop@hadoop-master hadoop]$ ./bin/hdfs dfsadmin -report
Configured Capacity: 107321753600 (99.95 GB)
Present Capacity: 95001157632 (88.48 GB)
DFS Remaining: 95001149440 (88.48 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
- Inspect the ports Hadoop listens on
netstat -tnulp | grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 21836/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 32261/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 21836/java
tcp6 0 0 :::8030 :::* LISTEN 29011/java
tcp6 0 0 :::8031 :::* LISTEN 29011/java
tcp6 0 0 :::8032 :::* LISTEN 29011/java
tcp6 0 0 :::8033 :::* LISTEN 29011/java
tcp6 0 0 :::8088 :::* LISTEN 29011/java
- Open the web UIs in a browser
http://hadoop-master:50070 (NameNode web UI)
http://hadoop-master:8088 (ResourceManager web UI)