@greenqy
Last active December 12, 2015 04:39
create-hadoop-cluster-from-vms

Starting the cluster

Starting the Hadoop-related services

Start Hadoop

start-dfs.sh
start-yarn.sh

Start HBase

start-hbase.sh

Start Hive HCatalog

# remove any stale pid file left over from an unclean shutdown
rm -f /hadoop/hive/hcatalog/var/log/hcat.pid
/hadoop/hive/hcatalog/sbin/hcat_server.sh start

Start Spark

/hadoop/spark-1.4.0/sbin/start-all.sh

Starting the workflow platform

Start Azkaban

cd /hadoop/azkaban/azkaban-exec-server
bin/azkaban-executor-start.sh

cd /hadoop/azkaban/azkaban-web-server
bin/azkaban-web-start.sh

Start Elasticsearch

cd /hadoop/elasticsearch
bin/elasticsearch

Start the session server

cd /home/ubuntu/packages/session-server
java -jar target/SessionServer-0.1.0.jar

Start yellowbook

cd /home/ubuntu/packages/yellowbook
java -jar target/YellowBook-1.0-SNAPSHOT.jar

Start the workflow server

cd /home/ubuntu/packages/workflow
java -jar target/WorkflowServer-1.0.0.jar
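The three services above are started in the foreground with `java -jar`, so they stop when the SSH session ends. A minimal background-start helper might look like the sketch below; the `start_service` name and the `service.log`/`service.pid` file locations are assumptions, not part of the original setup:

```shell
# Hypothetical helper (not in the original gist): start a jar in the
# background so it survives the SSH session, logging stdout/stderr to
# service.log and recording the process id in service.pid.
start_service() {
    dir="$1"
    jar="$2"
    (
        cd "$dir" || exit 1
        nohup java -jar "$jar" > service.log 2>&1 &
        echo $! > service.pid
    )
}

# Example usage (paths taken from the sections above):
# start_service /home/ubuntu/packages/session-server target/SessionServer-0.1.0.jar
# start_service /home/ubuntu/packages/yellowbook target/YellowBook-1.0-SNAPSHOT.jar
# start_service /home/ubuntu/packages/workflow target/WorkflowServer-1.0.0.jar
```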

Starting the data cube platform

# log in to the VM datanode02; the analytics platform is installed under /deploy
ssh datanode02

Start CubeAnaly

cd /deploy/CubeAnaly
java -jar CubeAnaly-0.1.0.jar

Start AnalyUI

cd /deploy/AnalyUI
nodejs server.js 9000

Start AutoPublishServer

cd /deploy/AutoPublishServer
java -jar AutoPublishServer-0.1.0.jar

Start ReportServer

cd /deploy/ReportServer
java -jar ReportServer-0.1.0.jar

When deploying a cluster by copying VM images, the configuration files that need to be changed include:

hadoop

  • /etc/hosts: configure according to the cluster size, and keep the datanodes in sync as well

  • the slaves configuration file: list the worker node hosts

  • on the namenode, delete the /hadoop/hadoop/namespace and /hadoop/hadoop/tmp directories

  • on every datanode, delete the /hadoop/hadoop/data and /hadoop/hadoop/tmp directories

    $ stop-dfs.sh
    $ hdfs namenode -format
    $ start-dfs.sh
    $ start-yarn.sh

Because the data directories were deleted, all previous data is gone.
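As a sketch, the /etc/hosts entries and the derived slaves file for a small cluster might look like the following. The IPs are placeholders, and only datanode02 is named elsewhere in this gist; the other hostnames are assumptions:

```shell
# Sketch of the files to edit when cloning VM images.
# IPs and most hostnames below are placeholders, not real addresses.
cat > hosts.example <<'EOF'
127.0.0.1      localhost
10.10.245.1    namenode
10.10.245.2    datanode01
10.10.245.3    datanode02
EOF

# The Hadoop slaves file lists one worker hostname per line; here it is
# derived from the hosts file via the "datanodeNN" naming convention.
grep -oE 'datanode[0-9]+' hosts.example > slaves.example
cat slaves.example
```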

hive

$ $HIVE_HOME/hcatalog/sbin/hcat_server.sh start

Run $ jps; the RunJar process is the HCatalog server.

Since the existing data on HDFS was deleted, the data in the existing tables is gone as well.

Load the demo data:

$ cd /home/ubuntu/data/hive-tables
$ hive -e "load data local inpath 'words.txt' into table words"
$ hive -e "load data local inpath 'sample-data.txt' into table sample_data"

hbase

  • regionservers and backup-masters under /hadoop/hbase/conf: set the hosts according to the cluster size
  • hbase.zookeeper.quorum in /hadoop/hbase/conf/hbase-site.xml: configure according to the cluster size
  • sync the configuration to all nodes
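For example, the quorum entry in hbase-site.xml might look like this (hostnames are placeholders for the actual cluster nodes):

```xml
<!-- Sketch: hbase.zookeeper.quorum lists the ZooKeeper hosts,
     comma-separated; adjust to the nodes actually in the cluster. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>namenode,datanode01,datanode02</value>
</property>
```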

Stop HBase, delete the zookeeper directory under /hadoop/hbase on every node, and delete the /hbase directory on HDFS:

    $ stop-hbase.sh
    $ rm -rf /hadoop/hbase/zookeeper
    $ hadoop fs -rm -r /hbase
    $ start-hbase.sh

Re-run the HBase initialization steps for workflow and the related session server operations.

spark

  • /hadoop/spark-1.4.0/conf/slaves: set the worker hosts according to the cluster size

    $ sbin/start-all.sh

azkaban

$ bin/azkaban-executor-start.sh
$ bin/azkaban-web-start.sh

elasticsearch

$ bin/elasticsearch

session-server

  • hbase.zookeeper.quorum in config/application.properties: update to match the cluster
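For example, the entry might look like this (hostnames are placeholders for the actual cluster nodes):

```properties
# Sketch: quorum entry in config/application.properties; use the same
# hosts as hbase.zookeeper.quorum in hbase-site.xml.
hbase.zookeeper.quorum=namenode,datanode01,datanode02
```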

yellowbook, workflow

yellowbook

  • session.properties
  • application.properties

ui

In Settings.js, change the hosts to the new IP, and clear the browser cache:

define(
    function () {
        return {
            API_HOST: "http://10.10.245.219:8472",
            LOG_HOST: "http://10.10.245.219:8473"
        };
    });

start scripts

hadoop

start-dfs.sh
start-yarn.sh

hbase

start-hbase.sh

hive hcatalog

/hadoop/hive/hcatalog/sbin/hcat_server.sh stop
/hadoop/hive/hcatalog/sbin/hcat_server.sh start

spark

/hadoop/spark-1.4.0/sbin/start-all.sh
