@abajwa-hw
Last active May 10, 2016 14:55
SQL on Hadoop workshop

LAB

This lab is part of an 'SQL on Hadoop' webinar. The recording and slides can be found here

Purpose

Learn how and when to use Hive vs. Phoenix vs. SparkSQL

Steps

Start sandbox

  • Download HDP 2.3 sandbox from here
  • After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g.
192.168.191.241 sandbox.hortonworks.com sandbox    
  • Connect to the VM via SSH (user root, password hadoop) and correct the /etc/hosts entry on the VM if needed
ssh root@sandbox.hortonworks.com
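The hosts-file edit above can be scripted as an idempotent append. This is a sketch: it rehearses on a scratch file (`./hosts.test`) first; rerun it with `HOSTS_FILE=/etc/hosts` (as root) once it looks right, and substitute your VM's actual IP for the one shown.

```shell
# Sketch: idempotently append the sandbox entry to a hosts file.
# HOSTS_FILE defaults to a scratch file; the IP below is an example.
HOSTS_FILE="${HOSTS_FILE:-./hosts.test}"
ENTRY="192.168.191.241 sandbox.hortonworks.com sandbox"
touch "$HOSTS_FILE"
# Only append if the entry is not already present
grep -q "sandbox\.hortonworks\.com" "$HOSTS_FILE" || echo "$ENTRY" >> "$HOSTS_FILE"
grep -q "sandbox\.hortonworks\.com" "$HOSTS_FILE" || echo "$ENTRY" >> "$HOSTS_FILE"   # second run is a no-op
```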

Install components

  • Install the bits for the Zeppelin and Nifi Ambari services
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git   /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
sudo git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git   /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/ZEPPELIN   
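To see what the `VERSION` line above is doing, here is the same sed run against a sample of `hdp-select` output (the sample string is an assumption; your cluster's build number will differ). The sed keeps only the major.minor part of the version, which matches the stack directory name under `/var/lib/ambari-server/resources/stacks/HDP/`.

```shell
# hdp-select prints a line like "hadoop-client - 2.3.0.0-2557";
# the sed below extracts just the major.minor stack version.
SAMPLE="hadoop-client - 2.3.0.0-2557"   # assumed example output
VERSION=$(echo "$SAMPLE" | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/')
echo "$VERSION"   # -> 2.3
```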
  • Restart Ambari
#sandbox
service ambari restart

#non sandbox
sudo service ambari-server restart
  • In Ambari, open the 'Add Service' wizard and follow it to install both services with default settings
    • For more details on each see here and here

Configuration changes

  • Make the changes below in the Hive config, then restart Hive
    • under General -> change hive.tez.java.opts to -server -Xmx1000m -Djava.net.preferIPv4Stack=true
    • under General -> ensure hive.exec.dynamic.partition.mode = nonstrict

(Optional) Install and start Solr

  • On the sandbox, Solr is installed as part of HDPsearch. Run the command below to fix a permissions bug in the sandbox's Solr setup
chown -R solr:solr /opt/lucidworks-hdpsearch/solr  #current sandbox version has files owned by root here which causes problems
  • If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the following to install HDPsearch first
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
  • Set up the Banana and Solr configs
su solr

#set up the Banana dashboard
cd /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/
mv default.json default.json.orig
wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json
  • Edit solrconfig.xml by adding <str>EEE MMM d HH:mm:ss Z yyyy</str> under ParseDateFieldUpdateProcessorFactory so it looks like the snippet below. This allows Solr to recognize the timestamp format of tweets.
vi /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>EEE MMM d HH:mm:ss Z yyyy</str>
      ...
    </arr>
  </processor>
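If you prefer to script the edit instead of using vi, a sed one-liner can insert the format string right after the `<arr name="format">` that belongs to ParseDateFieldUpdateProcessorFactory. This is a sketch assuming GNU sed, demonstrated against a small stand-in file; the real target is the solrconfig.xml path above.

```shell
# Stand-in for the real solrconfig.xml (a minimal excerpt of the processor)
CONF=./solrconfig-sample.xml
cat > "$CONF" <<'EOF'
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>yyyy-MM-dd</str>
  </arr>
</processor>
EOF

# Within the ParseDateFieldUpdateProcessorFactory block only, append the
# tweet timestamp format immediately after the opening <arr name="format">.
sed -i '/ParseDateFieldUpdateProcessorFactory/,/<\/processor>/ s|<arr name="format">|<arr name="format">\n    <str>EEE MMM d HH:mm:ss Z yyyy</str>|' "$CONF"

grep 'EEE MMM' "$CONF"
```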
  • Start Solr in cloud mode and create tweets collection
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181

/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets \
   -d data_driven_schema_configs \
   -s 1 \
   -rf 1 

Solr setup is complete. Return to the root user

exit

Use Nifi to collect tweets

  • Fix time on sandbox VM
yum install -y ntp
service ntpd stop
ntpdate pool.ntp.org
service ntpd start
  • Download the Nifi template, Twitter_Dashboard.xml, onto your laptop's local filesystem
wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/Twitter_Dashboard.xml

Use Zeppelin to execute Hive, Phoenix, SparkSQL

  • Install the Zeppelin notebook for the workshop by unzipping the archive below under /opt/incubator-zeppelin/notebook/ and restarting Zeppelin
su zeppelin
cd /opt/incubator-zeppelin/notebook/
wget https://www.dropbox.com/s/dxjc0ugj4lhurcf/2AY3B5WDV.zip
unzip 2AY3B5WDV.zip
  • Follow the instructions in the notebook to use Zeppelin to execute reports using Hive, Phoenix, or SparkSQL, depending on the scenario/workload.
    • Make sure Hive and HBase are started. If running on a non-sandbox cluster, you will need to enable Phoenix first as well.