Spark HWC integration - HDP 3 Secure cluster

Prerequisites:

  • Kerberized cluster

  • Hive Interactive Server (LLAP) enabled in Hive

  • The following details from Hive for configuring Spark (or try this HWC Quick Test Script)

spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.sql.hive.hiveserver2.jdbc.url  jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri thrift://c420-node3.squadron-labs.com:9083
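As an aside, the JDBC URL above is not a direct HiveServer2 address: it embeds the ZooKeeper quorum plus session parameters separated by `;`, and the client discovers the interactive HiveServer2 through the named ZooKeeper namespace. A minimal sketch (illustrative only, not part of HWC) of how that URL decomposes:

```python
# Split a ZooKeeper-discovery HiveServer2 JDBC URL into quorum hosts and params.
# Illustrative helper only; HWC/JDBC do this internally.
url = ("jdbc:hive2://c420-node2.squadron-labs.com:2181,"
       "c420-node3.squadron-labs.com:2181,"
       "c420-node4.squadron-labs.com:2181/;"
       "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")

def parse_hs2_url(url):
    """Return (zookeeper hosts, session parameters) from a discovery-mode URL."""
    body = url[len("jdbc:hive2://"):]
    hosts_part, _, params_part = body.partition("/;")
    hosts = hosts_part.split(",")
    params = dict(p.split("=", 1) for p in params_part.split(";") if p)
    return hosts, params

hosts, params = parse_hs2_url(url)
print(hosts)
print(params)
```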

Basic testing:

  1. Create a table employee in Hive and load some data.

e.g. create the table:

CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Save the following rows as data.txt, upload the file to HDFS (/tmp/data.txt), then load it into the table:

1201,Vinod,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin
LOAD DATA INPATH '/tmp/data.txt' OVERWRITE INTO TABLE employee;
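The LOAD DATA INPATH statement expects the file to already be in HDFS. A hedged sketch of staging it there (assumes a local data.txt with the rows above and a valid Kerberos ticket):

```shell
# Upload the sample file so LOAD DATA INPATH can find it.
# Assumes data.txt is in the current directory and you have kinit'ed.
hdfs dfs -put -f data.txt /tmp/data.txt
# Sanity check the upload before running LOAD DATA in Hive.
hdfs dfs -cat /tmp/data.txt
```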
  2. kinit as the spark user and run:
spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM" --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083" --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" --conf "spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Note: spark.security.credentials.hiveserver2.enabled should be false for YARN client deploy mode and true (the default) for YARN cluster deploy mode. This setting is required on a Kerberized cluster.

  3. Run the following code in the Scala shell to view the table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
  4. To apply the common properties by default, add the following settings to the custom spark2 configuration in Ambari:
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM
spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
spark.datasource.hive.warehouse.load.staging.dir=/tmp/
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181
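One caveat worth noting as an aside (illustrative only, not part of the setup): several of these values themselves contain `=` and `;` (e.g. serviceDiscoveryMode=zooKeeper), so any script that templates or parses these key=value lines must split key from value on the first `=` only:

```python
# Illustrative sketch: parse key=value property lines like those above.
# Splitting on the first '=' keeps '=' characters inside the value intact.
lines = [
    "spark.datasource.hive.warehouse.load.staging.dir=/tmp/",
    "spark.sql.hive.hiveserver2.jdbc.url="
    "jdbc:hive2://host:2181/;serviceDiscoveryMode=zooKeeper",
]
props = dict(line.split("=", 1) for line in lines)
print(props["spark.sql.hive.hiveserver2.jdbc.url"])
```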
  5. Run spark-shell:
spark-shell --master yarn  --conf "spark.security.credentials.hiveserver2.enabled=false" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Note: the common properties are now read from the Spark default properties.

PySpark example:

pyspark --master yarn --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar  --py-files  /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.0.1.0-187.zip --conf spark.security.credentials.hiveserver2.enabled=false

Paste this code into the shell:

from pyspark_llap.sql.session import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show()
hive.executeQuery("select * from employee").show()
  6. Run the following code in the Scala shell to view the Hive table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
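HWC also supports writing back to managed Hive tables through its DataSource format. A sketch using the session built above (the target table employee_copy is hypothetical):

```scala
// Write a query result back to a managed Hive table through HWC.
// Assumes the `spark` and `hive` sessions from the steps above;
// "employee_copy" is a hypothetical target table.
import com.hortonworks.hwc.HiveWarehouseSession

val df = hive.executeQuery("select * from employee")
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "employee_copy")
  .save()
```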
  7. To integrate HWC with Livy2:

a. Add the following property to Custom livy2-conf:

livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector/

b. Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.

c. Ensure hadoop.proxyuser.hive.hosts=* exists in core-site.xml; see the Custom core-site section in the HDFS configs.

d. Log in to Zeppelin and add the following to the livy2 interpreter settings:

livy.spark.hadoop.hive.llap.daemon.service.hosts	@llap0
livy.spark.security.credentials.hiveserver2.enabled	true
livy.spark.sql.hive.hiveserver2.jdbc.url	jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
livy.spark.sql.hive.hiveserver2.jdbc.url.principal	hive/_HOST@HWX.COM
livy.spark.yarn.security.credentials.hiveserver2.enabled	true
livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Note: change the hive-warehouse-connector-assembly version to match your installed HWC version.

e. Restart the livy2 interpreter.

f. In the first paragraph, add:

%livy2
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()

g. In the second paragraph, add:

%livy2
hive.executeQuery("select * from employee").show

Note: there is an Ambari defect, AMBARI-22801, which resets the proxy configs on keytab regeneration or service addition. Repeat step 7.c in such scenarios.
