Development environment: Ubuntu 16.04 server
Hadoop version: Apache Hadoop 3.0.0-alpha4
OpenJDK version: 1.8.0_131
-
Install ssh:
sudo apt install ssh
-
Download the Hadoop release and extract it:
wget http://mirror.metrocast.net/apache/hadoop/common/hadoop-3.0.0-alpha4/hadoop-3.0.0-alpha4.tar.gz
tar xvf hadoop-3.0.0-alpha4.tar.gz
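As a quick sanity check, you can print the version banner from inside the unpacked directory; the relative paths used in the rest of this walkthrough assume you are in that directory:

```bash
cd hadoop-3.0.0-alpha4
bin/hadoop version    # should report Hadoop 3.0.0-alpha4
```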
-
Set the Java path in Hadoop's configuration (etc/hadoop/hadoop-env.sh):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
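If the JVM path on your machine differs, one way to find it (assuming the java binary is on PATH) is to resolve the symlink chain:

```bash
# Prints the real location of the java binary,
# e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java;
# JAVA_HOME is that path minus the trailing /bin/java.
readlink -f "$(which java)"
```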
-
Prepare the input files:
mkdir input
cp etc/hadoop/*.xml input
-
Run the word-count example program that ships with Hadoop as a test:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar wordcount input output
-
View the output:
cat output/*
Result (excerpt):
"*" 18
"AS 9
"License"); 9
"alice,bob 18
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
-1 1
0.0 1
1-MAX_INT. 1
1. 1
1.0. 1
2.0 9
40 2
40+20=60 1
<!-- 18
</configuration> 9
</description> 29
</property> 50
<?xml 8
<?xml-stylesheet 4
<configuration> 9
<description> 28
<description>ACL 21
...
Building on the single-node setup, perform the following steps to run in pseudo-distributed mode:
-
Configure
etc/hadoop/core-site.xml
:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
-
Configure
etc/hadoop/hdfs-site.xml
:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
-
Set up passwordless ssh login:
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
-
Test the passwordless ssh login:
ssh localhost
If no password prompt appears, the setup succeeded.
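If ssh localhost still prompts for a password, overly permissive key-file permissions are a common cause, since sshd ignores an authorized_keys file it considers unsafe; a possible fix:

```bash
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```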
-
Format the namenode:
bin/hdfs namenode -format
-
Start the namenode and datanode:
sbin/start-dfs.sh
Once they are up, the namenode web interface is available at http://localhost:9870/.
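To confirm the daemons really started, jps (shipped with the JDK) lists the running Java processes; in this pseudo-distributed setup you would expect output along these lines:

```bash
jps
# Expected entries: NameNode, DataNode, SecondaryNameNode (plus Jps itself)
```

-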
Create the directory structure in HDFS:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/f
-
Copy the input files into the distributed file system:
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
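Relative HDFS paths such as input resolve against the current user's home directory (/user/f here), which is why that directory was created first. The upload can be verified with:

```bash
bin/hdfs dfs -ls input
# Each uploaded .xml file should appear, with its replication factor (1 here)
```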
-
Run the word-count example program that ships with Hadoop as a test:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar wordcount input output
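Note that MapReduce refuses to run if the output directory already exists, so before rerunning the job the old results have to be removed:

```bash
bin/hdfs dfs -rm -r output
```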
-
View the output:
bin/hdfs dfs -cat output/*
Result (excerpt):
"*" 18
"AS 9
"License"); 9
"alice,bob 18
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
-1 1
0.0 1
1-MAX_INT. 1
1. 1
1.0. 1
2.0 9
40 2
40+20=60 1
<!-- 18
</configuration> 9
</description> 29
</property> 50
<?xml 8
<?xml-stylesheet 4
<configuration> 9
<description> 28
<description>ACL 21
...
-
Shut down the distributed file system:
sbin/stop-dfs.sh
The fully distributed experiment runs across several separate hosts; for this we created three virtual machines (master, slave, slave2).
-
Install the operating system on the two slave nodes.
-
Carry out steps 3 - 8 of the configuration on the master node.
-
Find the IP address of each host and edit
/etc/hosts
so the machines can resolve one another by name:
127.0.0.1       localhost
127.0.1.1       ubuntu
192.168.124.129 slave
192.168.124.130 slave2
192.168.124.128 master

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
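Before continuing, it is worth checking from each machine that the names resolve, for example:

```bash
ping -c 1 master
ping -c 1 slave
ping -c 1 slave2
```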
-
Edit
core-site.xml
(fs.defaultFS is the current name of this property; the original fs.default.name is a deprecated alias):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
-
Edit
hdfs-site.xml
:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
-
Edit
yarn-site.xml
:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8035</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8050</value>
</property>
-
Edit
mapred-site.xml
(the property that selects YARN is mapreduce.framework.name, not the misspelled mapred.framework.name):
<property>
  <name>mapreduce.job.tracker</name>
  <value>master:5431</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
-
Edit
workers
(listing master here means it also runs a datanode and nodemanager, which matches the three live datanodes reported later):
master
slave
slave2
-
As in the previous two modes, configure ssh on master, slave, and slave2 so that every pair of machines can log in to one another without a password (one approach is sketched below).
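One way to distribute the public keys, assuming the same user f exists on all three machines, is ssh-copy-id:

```bash
# Run on every node, once per target node:
ssh-copy-id f@master
ssh-copy-id f@slave
ssh-copy-id f@slave2
```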
-
Use scp to copy the Hadoop directory from the master node to the same location on slave and slave2:
scp -r /usr/local/hadoop f@slave:/usr/local
scp -r /usr/local/hadoop f@slave2:/usr/local
-
Stop dfs, then reformat the namenode:
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/bin/hdfs namenode -format
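If datanodes later fail to register because of a clusterID mismatch (a common symptom of reformatting the namenode while old datanode storage remains), one remedy is to wipe the old storage on every node before reformatting. The path below assumes the default hadoop.tmp.dir of /tmp/hadoop-<username>; adjust it if you configured a different location:

```bash
# WARNING: this destroys all existing HDFS data on the node.
rm -rf /tmp/hadoop-f/dfs
```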
-
Start hdfs:
/usr/local/hadoop/sbin/start-dfs.sh
-
Start yarn:
/usr/local/hadoop/sbin/start-yarn.sh
-
Run
jps
on the master node; the output:
f@ubuntu:~$ jps
2357 ResourceManager
7511 Jps
2667 NodeManager
1772 NameNode
2110 SecondaryNameNode
1903 DataNode
-
Run
jps
on the slave node; the output:
f@ubuntu:~$ jps
2244 Jps
1288 NodeManager
1164 DataNode
-
Run
jps
on the slave2 node; the output:
f@ubuntu:~$ jps
1291 NodeManager
1167 DataNode
4015 Jps
-
Run
/usr/local/hadoop/bin/hdfs dfsadmin -report
on the master node; the output:
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfsadmin -report
Configured Capacity: 76822855680 (71.55 GB)
Present Capacity: 45398016000 (42.28 GB)
DFS Remaining: 45397708800 (42.28 GB)
DFS Used: 307200 (300 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.124.128:9866 (master)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 39043194880 (36.36 GB)
DFS Used: 106794 (104.29 KB)
Non DFS Used: 20659420886 (19.24 GB)
DFS Remaining: 16376745984 (15.25 GB)
DFS Used%: 0.00%
DFS Remaining%: 41.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Sep 24 06:23:22 PDT 2017
Last Block Report: Sun Sep 24 04:41:17 PDT 2017

Name: 192.168.124.129:9866 (slave)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 18889830400 (17.59 GB)
DFS Used: 73728 (72 KB)
Non DFS Used: 3396104192 (3.16 GB)
DFS Remaining: 14510510080 (13.51 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Sep 24 06:23:22 PDT 2017
Last Block Report: Sun Sep 24 06:21:10 PDT 2017

Name: 192.168.124.130:9866 (slave2)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 18889830400 (17.59 GB)
DFS Used: 126678 (123.71 KB)
Non DFS Used: 3396108586 (3.16 GB)
DFS Remaining: 14510452736 (13.51 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Sep 24 06:23:22 PDT 2017
Last Block Report: Sun Sep 24 06:21:07 PDT 2017
-
Create the directory structure in HDFS:
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfs -mkdir /user
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfs -mkdir /user/f
-
Copy the input files into the distributed file system:
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfs -mkdir input
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
-
Run the word-count test program on the master node:
f@ubuntu:~$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar wordcount input output
-
View the output:
f@ubuntu:~$ /usr/local/hadoop/bin/hdfs dfs -cat output/*
-
The output (excerpt):
"*" 18
"AS 9
"License"); 9
"alice,bob 18
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
-1 1
0.0 1
1-MAX_INT. 1
1. 1
1.0. 1
2.0 9
40 2
40+20=60 1
<!-- 18
</configuration> 9
</description> 29
</property> 50
<?xml 8
<?xml-stylesheet 4
<configuration> 9
<description> 28
<description>ACL 21
...
-
On the master node, open the resource manager at
http://127.0.0.1:8088/cluster/nodes
to see information about all the nodes: ![Screen Shot 2017-09-24 at 9.37.02 PM](/Users/feng/Desktop/Screen Shot 2017-09-24 at 9.37.02 PM.png)
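The same node information is available from the command line, which is convenient on a headless server:

```bash
/usr/local/hadoop/bin/yarn node -list
# Lists each NodeManager with its state and address
```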
-
On the master node, open the namenode overview at
http://127.0.0.1:9870/dfshealth.html#tab-overview
to see: ![Screen Shot 2017-09-24 at 9.44.40 PM](/Users/feng/Desktop/Screen Shot 2017-09-24 at 9.44.40 PM.png)