create-hadoop-cluster-from-vms
Starting the cluster
====================
Start Hadoop-related services
-----------------------------
### Start Hadoop
```bash
start-dfs.sh
start-yarn.sh
```
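To confirm the daemons actually came up, a minimal check using standard Hadoop tooling:
```bash
jps                    # expect NameNode/ResourceManager on the master, DataNode/NodeManager on the slaves
hdfs dfsadmin -report  # summary of live datanodes and HDFS capacity
```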
### Start HBase
```bash
start-hbase.sh
```
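A quick sanity check that the master and region servers are up (standard HBase commands):
```bash
jps | grep -E 'HMaster|HRegionServer'  # HBase daemons on this node
echo "status" | hbase shell            # cluster status: servers, regions
```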
### Start Hive HCatalog
```bash
rm /hadoop/hive/hcatalog/var/log/hcat.pid
/hadoop/hive/hcatalog/sbin/hcat_server.sh start
```
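The `rm` clears a stale pid file left over from a previous run; a slightly safer sketch removes it only when no process with that pid is still alive:
```bash
# same pid file path as above
PID_FILE=/hadoop/hive/hcatalog/var/log/hcat.pid
if [ -f "$PID_FILE" ] && ! kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    rm "$PID_FILE"                              # stale pid file, safe to remove
fi
/hadoop/hive/hcatalog/sbin/hcat_server.sh start
```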
### Start Spark
```bash
/hadoop/spark-1.4.0/sbin/start-all.sh
```
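To verify the standalone master and workers started (8080 is the default master web UI port and may differ on this cluster):
```bash
jps | grep -E 'Master|Worker'   # Spark standalone daemons
curl -sI http://localhost:8080  # master web UI, default port (assumption)
```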
Start the workflow platform
---------------------------
### Start Azkaban
```bash
cd /hadoop/azkaban/azkaban-exec-server
bin/azkaban-executor-start.sh
cd /hadoop/azkaban/azkaban-web-server
bin/azkaban-web-start.sh
```
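The executor should be running before the web server is started. A hedged check that the web UI is reachable (8081 is Azkaban's default web port and may have been changed in azkaban.properties):
```bash
curl -sI http://localhost:8081   # Azkaban web UI; the port here is only the default, an assumption
```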
### Start Elasticsearch
```bash
cd /hadoop/elasticsearch
bin/elasticsearch
```
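`bin/elasticsearch` runs in the foreground; `-d` starts it as a daemon, and the HTTP API listens on 9200 by default:
```bash
bin/elasticsearch -d            # daemonize instead of blocking the shell
curl -s http://localhost:9200   # should return node/cluster info as JSON
```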
### Start the session server
```bash
cd /home/ubuntu/packages/session-server
java -jar target/SessionServer-0.1.0.jar
```
### Start yellowbook
```bash
cd /home/ubuntu/packages/yellowbook
java -jar target/YellowBook-1.0-SNAPSHOT.jar
```
### Start the workflow server
```bash
cd /home/ubuntu/packages/workflow
java -jar target/WorkflowServer-1.0.0.jar
```
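The three `java -jar` services above run in the foreground and stop when the shell exits; a minimal sketch that backgrounds them with nohup (log file names are just examples):
```bash
cd /home/ubuntu/packages/session-server
nohup java -jar target/SessionServer-0.1.0.jar > session-server.log 2>&1 &

cd /home/ubuntu/packages/yellowbook
nohup java -jar target/YellowBook-1.0-SNAPSHOT.jar > yellowbook.log 2>&1 &

cd /home/ubuntu/packages/workflow
nohup java -jar target/WorkflowServer-1.0.0.jar > workflow.log 2>&1 &
```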
Start the data cube platform
----------------------------
```bash
# Log in to the datanode02 VM; the data analysis platform is installed under /deploy
ssh datanode02
```
### Start CubeAnaly
```bash
cd /deploy/CubeAnaly
java -jar CubeAnaly-0.1.0.jar
```
### Start AnalyUI
```bash
cd /deploy/AnalyUI
nodejs server.js 9000
```
### Start AutoPublishServer
```bash
cd /deploy/AutoPublishServer
java -jar AutoPublishServer-0.1.0.jar
```
### Start ReportServer
```bash
cd /deploy/ReportServer
java -jar ReportServer-0.1.0.jar
```
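The same idea works for the /deploy services on datanode02; a hedged sketch that backgrounds all four (log file names are examples):
```bash
cd /deploy/CubeAnaly         && nohup java -jar CubeAnaly-0.1.0.jar         > cube.log    2>&1 &
cd /deploy/AnalyUI           && nohup nodejs server.js 9000                 > analyui.log 2>&1 &
cd /deploy/AutoPublishServer && nohup java -jar AutoPublishServer-0.1.0.jar > publish.log 2>&1 &
cd /deploy/ReportServer      && nohup java -jar ReportServer-0.1.0.jar      > report.log  2>&1 &
```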
Deploying the cluster by copying VM images: configuration files that need to be changed
========================================================================================
### hadoop
* /etc/hosts: configure according to the cluster layout, and keep the datanodes in sync as well
* In the Hadoop configuration, set the node hosts in the slaves file
* Delete the /hadoop/hadoop/namespace and /hadoop/hadoop/tmp directories on the namenode
* Delete the /hadoop/hadoop/data and /hadoop/hadoop/tmp directories on all datanodes (a sketch for doing this on every node follows after the commands below)
```bash
$ stop-dfs.sh
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
```
Since the data directories were deleted, all previously stored data is gone.
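A sketch of the cleanup step across nodes, assuming passwordless ssh and that the datanode host names match the slaves file (the host list below is only an example):
```bash
# on the namenode
rm -rf /hadoop/hadoop/namespace /hadoop/hadoop/tmp
# on every datanode (example host list; use the hosts from your slaves file)
for host in datanode01 datanode02; do
    ssh "$host" 'rm -rf /hadoop/hadoop/data /hadoop/hadoop/tmp'
done
```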
### hive
```bash
$ $HIVE_HOME/hcatalog/sbin/hcat_server.sh start
```
Run `jps`; the RunJar process is the HCatalog server.
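To spot it quickly:
```bash
jps | grep RunJar   # the HCatalog server shows up as RunJar
```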
### hbase
* Set the hosts in regionservers and backup-masters under /hadoop/hbase/conf according to the cluster layout
* Set hbase.zookeeper.quorum in /hadoop/hbase/conf/hbase-site.xml according to the cluster layout
* Sync the configuration to all nodes

Delete the zookeeper directory under /hadoop/hbase on every node (a sketch for doing this over ssh follows after the commands below), and delete the /hbase directory on HDFS:
```bash
$ hadoop fs -rm -r /hbase
```
```bash
$ stop-hbase.sh
$ rm -rf zookeeper
$ start-hbase.sh
```
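The `rm -rf zookeeper` above only cleans the node it runs on; a hedged sketch that removes the directory on every node over ssh (the host list is an example):
```bash
for host in namenode datanode01 datanode02; do
    ssh "$host" 'rm -rf /hadoop/hbase/zookeeper'
done
```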
### spark
* Modify the hosts in /hadoop/spark-1.4.0/conf/slaves.sh according to the cluster layout
```bash
$ sbin/start-all.sh
```
### azkaban
```bash
$ bin/azkaban-executor-start.sh
$ bin/azkaban-web-start.sh
```
### elasticsearch
```bash
$ bin/elasticsearch
```
### session-server
* Change hbase.zookeeper.quorum in config/application.properties to match the cluster (see the sketch below)
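A one-line sketch for that change, assuming the file uses plain key=value properties syntax (the replacement quorum below is only an example):
```bash
sed -i 's/^hbase\.zookeeper\.quorum=.*/hbase.zookeeper.quorum=namenode,datanode01,datanode02/' \
    config/application.properties
```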
### yellowbook, workflow
For yellowbook, update:
* session.properties
* application.properties
### ui
* In Settings.js, change the address to the new IP, and clear the browser cache
Start script
============
```bash
# hadoop
start-dfs.sh
start-yarn.sh
# hbase
start-hbase.sh
# hive hcatalog
/hadoop/hive/hcatalog/sbin/hcat_server.sh stop
/hadoop/hive/hcatalog/sbin/hcat_server.sh start
# spark
/hadoop/spark-1.4.0/sbin/start-all.sh
```