Add HDB (HAWQ) to HDP 2.4.2 with Zeppelin

Goals:

  • Install a 4-node cluster running HDP 2.4.2 and Ambari 2.2.2.0 (including Zeppelin and HDB), either via Ambari bootstrap with blueprints or via the Ambari install wizard
  • Configure HAWQ for Zeppelin
  • Configure Zeppelin for HAWQ
  • Run HAWQ queries via Zeppelin

Notes:

  • HDB managed via Ambari is only supported from Ambari 2.2.2.0 onwards. Do not attempt this with older versions of Ambari

Install Ambari 2.2.2.0 and HDB service definitions

  • Bring up 4 VMs imaged with RHEL/CentOS 6.x (e.g. node1-4 in this case)

  • On the non-Ambari nodes (node2-4 in this case), install ambari-agent and point it at the Ambari node (node1 in this case)

export ambari_server=node1
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
  • On Ambari node (e.g. node1), install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
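  • Optionally, sanity-check that ambari-server came up before proceeding:
ambari-server status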
  • Install Zeppelin service definition
yum install -y git
git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ZEPPELIN
sed -i.bak '/dependencies for all/a \    "ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"],' /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json
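  • Optionally, verify the edited role_command_order.json is still valid JSON (a quick check using the stock python on the node):
python -m json.tool /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json > /dev/null && echo "role_command_order.json OK"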
  • Install Pivotal service definition and repo per HDB doc

    • Create staging dir:
    mkdir /staging
    chmod a+rx /staging
    
    • Copy hdb-2.0.0.0-22126.tar.gz and hdb-ambari-plugin-2.0.0-448.tar.gz to /staging

    • Setup HDB repo and Ambari service definition:

    tar -xvzf /staging/hdb-2.0.0.0-*.tar.gz -C /staging/
    tar -xvzf /staging/hdb-ambari-plugin-2.0.0-*.tar.gz -C /staging/  
    yum install -y httpd
    service httpd start
    cd /staging/hdb*
    ./setup_repo.sh
    cd /staging/hdb-ambari-plugin*
    ./setup_repo.sh  
    yum install -y hdb-ambari-plugin
    
    • At this point you should see a local repo up at http://node1/HDB/
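    • To verify, hit the repo from the Ambari node (a quick check that the repo index is being served):

    curl -sL http://node1/HDB/ | head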

    • Restart Ambari so it recognizes the Zeppelin, HAWQ, and PXF services

    service ambari-server restart
    service ambari-agent restart
    
  • Confirm that all 4 agents registered and that the agent is up

curl -u admin:admin -H "X-Requested-By: ambari" http://localhost:8080/api/v1/hosts
service ambari-agent status

Deploy vanilla HDP + Zeppelin + HDB

  • Deploy a cluster running the latest HDP, including Zeppelin, HAWQ, and PXF. You can either:
    • Option 1: login to Ambari UI and use Install Wizard. In this case:
      • You will need to set the 'HAWQ System User Password' to any value you like
      • Make sure to manually adjust the HDFS settings mentioned in HDB doc
      • Make sure that the port specified in 'HAWQ master port' (by default, 5432) is not in use on the host where you will install the HAWQ master
        • If you are installing on a single node, or in any other scenario where the HAWQ master needs to be installed on a node where a PostgreSQL setup already exists (e.g. installing the HAWQ master on the same host as Ambari), you will need to change the master port from the default value (5432)
        • On a single-node setup, the 'HAWQ standby master' will not be installed
      • Refer to HDB doc for full details
    • OR
    • Option 2: generate/deploy a customized blueprint using ambari-bootstrap that takes care of the HDFS configurations as below:
yum install -y python-argparse
cd
git clone https://github.com/seanorama/ambari-bootstrap.git

#decide which services to deploy and set the number of nodes in the cluster
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE ZEPPELIN SPARK HAWQ PXF"
export host_count=4
 
cd ./ambari-bootstrap/deploy/

#add HDFS config customizations for HAWQ and any others you may want
cat << EOF > configuration-custom.json
{
  "configurations" : {
    "hdfs-site": {
        "dfs.allow.truncate": "true",
        "dfs.block.access.token.enable": "false",
        "dfs.block.local-path-access.user": "gpadmin",
        "dfs.client.read.shortcircuit": "true",
        "dfs.client.socket-timeout": "300000000",
        "dfs.client.use.legacy.blockreader.local": "false",
        "dfs.datanode.handler.count": "60",
        "dfs.datanode.socket.write.timeout": "7200000",                                
        "dfs.namenode.handler.count": "600",
        "dfs.support.append": "true"               
    },
    "hawq-env":{
        "hawq_password":"gpadmin"
      },
    "core-site": {
        "ipc.client.connection.maxidletime": "3600000",
        "ipc.client.connect.timeout": "300000",
        "ipc.server.listen.queue.size": "3300"
    }
  }
}
EOF
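
#optional - sanity-check the customization file is valid JSON before deploying
python -m json.tool configuration-custom.json > /dev/null && echo "configuration-custom.json OK"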

#optional - if you want to review the BP before deploying it
#export deploy=false
#./deploy-recommended-cluster.bash
#more temp*/blueprint.json

#generate BP including customizations and start cluster deployment
export deploy=true
./deploy-recommended-cluster.bash
  • This kicks off the HDP cluster install, including Zeppelin, HAWQ, and PXF. You can monitor progress via Ambari at http://node1:8080
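  • Alternatively, you can poll deployment progress via the Ambari REST API (the cluster name is generated by ambari-bootstrap, so look it up first):
#find the generated cluster name
curl -u admin:admin -H "X-Requested-By: ambari" http://localhost:8080/api/v1/clusters
#then poll the install request progress (replace CLUSTERNAME with the name returned above)
curl -u admin:admin -H "X-Requested-By: ambari" "http://localhost:8080/api/v1/clusters/CLUSTERNAME/requests?fields=Requests/progress_percent,Requests/request_status"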

Configure HAWQ for Zeppelin

  • On the HAWQ master node:
    • SSH in
    • connect to HAWQ
    • create a new DB
    • add a user for Zeppelin
    • give the zeppelin user access to the DB
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
export PGPORT=5432
psql -d postgres

CREATE DATABASE contoso;
CREATE USER zeppelin WITH PASSWORD 'zeppelin';
GRANT ALL PRIVILEGES ON DATABASE contoso TO zeppelin;
\q
  • Note: you only need to export PGPORT if the HAWQ master was not installed on the default port (5432); if you specified a different port, set PGPORT accordingly.
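  • To confirm the database and user were created, run the below as gpadmin:
psql -d postgres -c '\l' | grep contoso
psql -d postgres -c '\du' | grep zeppelin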

  • On the HAWQ master node, run the below to add the IP of the Zeppelin node to HAWQ's pg_hba.conf. This allows Zeppelin to access HAWQ from a different node

    • Make sure to replace 172.17.0.2 below with the IP of the host running Zeppelin
echo "host all all 172.17.0.2/32 trust" >> /data/hawq/master/pg_hba.conf
  • Restart HAWQ via Ambari
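  • To confirm the pg_hba change took effect, you can test the connection from the Zeppelin host (assuming a psql client is available there, and that node3 is the HAWQ master as in the JDBC URL below):
psql -h node3 -p 5432 -U zeppelin -d contoso -c 'select 1;'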

Configure Zeppelin for HAWQ

  • Open the Zeppelin interpreter settings, scroll down to the psql section, and make the below changes so the zeppelin user is used to connect to the contoso DB:
    • postgresql.url = jdbc:postgresql://node3:5432/contoso
    • postgresql.user = zeppelin
    • postgresql.password = zeppelin

Run HAWQ queries via Zeppelin

  • Create a new note in Zeppelin with the below cells to create/populate a test table and calculate the average of a subset:
%psql.sql
create table tt (i int);
insert into tt select generate_series(1,1000000);
%psql.sql
select avg(i) from tt where i>5000;
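  • Optionally, add a third cell to inspect the query plan (EXPLAIN shows how HAWQ parallelizes the aggregate across segments):
%psql.sql
explain select avg(i) from tt where i>5000;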