@seanorama
Last active August 28, 2019 12:01
spark-hive-and-hbase-warehouse-connectors

Spark Hive Warehouse Connector (HWC) and Spark HBase Connector (SHC)

For Zeppelin %livy2 and Spark

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/using-zeppelin/content/using_spark_hwc_and_shc_client_jar_files_with_livy.html


Install dependencies:

On all hosts:

sudo yum install hive_warehouse_connector shc
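After installing, it can help to confirm the packages put their files where the settings later in these notes expect them. A minimal sketch: `first_match` is a hypothetical helper (not part of HDP) that prints the first file in a directory matching a glob. The paths are the ones used in the Livy and Zeppelin settings below; adjust them if your layout differs.

```shell
# Hypothetical helper: print the first file in directory $1 whose name
# matches glob $2. Returns non-zero when nothing matches.
first_match() {
  for _f in "$1"/$2; do
    if [ -e "$_f" ]; then
      printf '%s\n' "$_f"
      return 0
    fi
  done
  return 1
}

# Paths assumed from the interpreter settings in this document.
HWC_DIR=/usr/hdp/current/hive_warehouse_connector
first_match "$HWC_DIR" 'hive-warehouse-connector-assembly-*.jar' \
  || echo "missing: HWC assembly jar in $HWC_DIR"
first_match "$HWC_DIR" 'pyspark_hwc-*.zip' \
  || echo "missing: pyspark_hwc zip in $HWC_DIR"
if [ ! -d /usr/hdp/current/shc ]; then
  echo "missing: /usr/hdp/current/shc"
fi
```

The printed paths are also the concrete file names to use if a wildcard in a jar setting is not expanded in your environment.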

Spark

This section is not required for Zeppelin. It is needed only when running jobs directly from Spark.

From Ambari, add to spark2-defaults:

spark.datasource.hive.warehouse.load.staging.dir=/tmp
spark.datasource.hive.warehouse.metastoreUri={{config['configurations']['hive-interactive-site']['hive.metastore.uris']}}
spark.hadoop.hive.llap.daemon.service.hosts={{config['configurations']['hive-interactive-site']['hive.llap.daemon.service.hosts']}}
spark.hadoop.hive.zookeeper.quorum={{config['configurations']['hive-site']['hive.zookeeper.quorum']}}
spark.history.ui.admin.acls=knox
spark.security.credentials.hive.enabled=true
spark.security.credentials.hiveserver2.enabled=true
spark.sql.hive.hiveserver2.jdbc.url=...copy Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL...
spark.sql.hive.hiveserver2.jdbc.url.principal={{config['configurations']['hive-site']['hive.server2.authentication.kerberos.principal']}}
spark.sql.hive.llap=true
spark.yarn.security.credentials.hiveserver2.enabled=true

Add to spark2-hive-site-override:

hive.llap.daemon.service.hosts={{config['configurations']['hive-interactive-site']['hive.llap.daemon.service.hosts']}}
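With those properties in place, a direct Spark session still needs the HWC jar on its classpath. A minimal dry-run sketch, assuming the jar lives under /usr/hdp/current/hive_warehouse_connector (the path used elsewhere in these notes): `hwc_spark_shell_cmd` is a hypothetical helper that only prints the `spark-shell --jars` command so you can review it before running it.

```shell
# Hypothetical helper: print (do not run) a spark-shell command that
# loads the HWC assembly jar given as $1.
hwc_spark_shell_cmd() {
  printf 'spark-shell --jars %s\n' "$1"
}

# Resolve the versioned jar name; prints nothing if the jar is absent.
for jar in /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar; do
  if [ -e "$jar" ]; then
    hwc_spark_shell_cmd "$jar"
  fi
done
```

On a Kerberized cluster, obtain a ticket with kinit before running the printed command.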

Livy configuration

Required for Zeppelin with %livy2.

From Ambari, add to livy2-conf:

livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector,/usr/hdp/current/hbase-client/lib,/usr/hdp/current/shc
livy.spark.security.credentials.hiveserver2.enabled=true
livy.spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hivehost01.realm.tld:10000/
livy.spark.sql.hive.hiveserver2.jdbc.url.principal={{config['configurations']['hive-site']['hive.server2.authentication.kerberos.principal']}}
livy.spark.yarn.security.credentials.hiveserver2.enabled=true
livy.superusers=zeppelin-clustername

Zeppelin Livy interpreter

From Zeppelin, update the %livy2 interpreter settings with:

livy.spark.jars=file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar
livy.spark.submit.pyFiles=file:///usr/hdp/current/hive_warehouse_connector/pyspark_hwc-*.zip
livy.spark.hadoop.hive.llap.daemon.service.hosts=@llap0
livy.spark.yarn.security.credentials.hiveserver2.enabled=true
livy.spark.security.credentials.hiveserver2.enabled=true
livy.spark.sql.hive.hiveserver2.jdbc.url=...copy Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL...
livy.spark.sql.hive.hiveserver2.jdbc.url.principal=...copy hive-site:hive.server2.authentication.kerberos.principal...
livy.spark.hadoop.hive.zookeeper.quorum=...copy Services > Hive > Configs > Advanced hive-site > hive.zookeeper.quorum....
zeppelin.livy.ssl.trustStore=/etc/security/certificates/truststore.jks
zeppelin.livy.ssl.trustStorePassword=changeit

Test Zeppelin %livy2

%livy2

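// Import the HWC session API
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._

// Build an HWC session on top of the existing Spark session
val hive = HiveWarehouseSession.session(spark).build()

// List up to 100 Hive databases to confirm the connection works
hive.showDatabases().show(100)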