@vinodkc
Last active June 17, 2021 08:36

CDP Livy ThriftServer Example

You can connect to the Apache Livy Thrift Server using the Beeline client that is included with Apache Hive.

The Livy Thrift Server is disabled by default.

a) To enable the Livy Thrift Server (livy.server.thrift.enabled), select the Enable Livy Thrift Server checkbox in Cloudera Manager (CM).

b) To use the Hive catalog, enable the HMS Service in the Livy configuration in CM.

c) Restart the Livy server.

Livy2 Thrift Server without HWC support

beeline -u "jdbc:hive2://c420-node4.coelab.cloudera.com:10090" -n hive
0: jdbc:hive2://c420-node4.coelab.cloudera.co> show tables;
Livy session has not yet started. Please wait for it to be ready...
+-----------+--------------------+--------------+
| database  |     tableName      | isTemporary  |
+-----------+--------------------+--------------+
| default   | employee           | false        |
| default   | employee_external  | false        |
+-----------+--------------------+--------------+

Note: To access Hive managed tables, you need to add the Hive Warehouse Connector (HWC) configuration.

Livy2 Thrift Server with HWC support

Connect to the Livy Thrift Server using Beeline:

beeline -u 'jdbc:hive2://c420-node4.coelab.cloudera.com:10090/default;?livy.session.conf.spark.jars=local:/opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-1.0.0.7.1.5.0-257.jar;livy.session.conf.spark.security.credentials.hiveserver2.enabled=false;livy.session.conf.spark.datasource.hive.warehouse.read.via.llap=false;livy.session.conf.spark.datasource.hive.warehouse.read.jdbc.mode=cluster;livy.session.conf.spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node3:10000/default;livy.session.conf.spark.datasource.hive.warehouse.user.name=hive;livy.session.conf.spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions' -n hive -p hive
0: jdbc:hive2://c420-node4.coelab.cloudera.co> show tables;
Livy session has not yet started. Please wait for it to be ready...
+-----------+--------------------+--------------+
| database  |     tableName      | isTemporary  |
+-----------+--------------------+--------------+
| default   | employee           | false        |
| default   | employee_external  | false        |
+-----------+--------------------+--------------+

 
0: jdbc:hive2://c420-node4.coelab.cloudera.co> select * from employee;
+-------+--------------+---------+--------------------+
|  eid  |     name     | salary  |    designation     |
+-------+--------------+---------+--------------------+
| 1201  | Gopal        | 45000   | Technical manager  |
| 1202  | Manisha      | 45000   | Proof reader       |
| 1203  | Masthanvali  | 40000   | Technical writer   |
| 1204  | Kiran        | 40000   | Hr Admin           |
| 1205  | Kranthi      | 30000   | Op Admin           |
+-------+--------------+---------+--------------------+

Note:

  • Setting livy.session.conf.spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions automatically creates the HWC session in Spark.
  • To pass a Spark setting, add the prefix livy.session.conf. to the Spark configuration key; for example, spark.sql.extensions becomes livy.session.conf.spark.sql.extensions.
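
The prefixing rule in the note above can be sketched as a small shell helper. This is only an illustration: the function name livy_session_confs is hypothetical, and the helper just joins the prefixed settings with semicolons in the same way the long Beeline command above does by hand.

```shell
# livy_session_confs key=value [key=value ...]
# Hypothetical helper: adds the livy.session.conf. prefix to each Spark
# setting and joins the results with semicolons, ready to be appended
# to the JDBC URL.
livy_session_confs() {
    conf_list=""
    for setting in "$@"; do
        # Insert a ';' separator only when conf_list is already non-empty.
        conf_list="${conf_list:+${conf_list};}livy.session.conf.${setting}"
    done
    printf '%s\n' "$conf_list"
}
```

It could then be used inside the URL, for example: beeline -u "jdbc:hive2://<host>:10090/default;?$(livy_session_confs spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions)" -n hive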

The Livy Thrift Server (Livy TS) provides functionality similar to HiveServer2 (HS2) for running SQL directly in Spark:

  • Provides a Thrift endpoint for JDBC/ODBC clients.
  • Supports the same protocol as Hive.
  • Supports Kerberos authentication and TLS.
  • Supports user impersonation: queries run in Spark as the authenticated user.
  • Applies the same data authorization as is granted to Spark users.
  • Scales to many concurrent connections and also supports throttling.

A user (JDBC client) connects to the Livy TS, which uses an interactive Livy session to execute SQL statements. The user can specify which session to use; if none is specified, a new interactive session is created. Each interactive session corresponds to a Spark application running as the user. Note: even though multiple Thrift sessions can share the same interactive session (and thus the same Spark application), each uses its own SparkSession. The user can also interact with the interactive session through the REST API.
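
As a rough sketch of the REST interaction mentioned above: the Livy REST API exposes GET /sessions (list sessions) and GET /sessions/{id} (one session). The helper name livy_session_url is hypothetical, and the base URL you pass in (host, and port 8998, which is Livy's default REST port) is an assumption about your deployment.

```shell
# livy_session_url BASE_URL [SESSION_ID]
# Hypothetical helper: prints the Livy REST endpoint for listing all
# interactive sessions, or for one session when SESSION_ID is given.
livy_session_url() {
    base=${1%/}    # drop a trailing slash from the base URL, if any
    if [ -n "${2:-}" ]; then
        printf '%s/sessions/%s\n' "$base" "$2"
    else
        printf '%s/sessions\n' "$base"
    fi
}
```

The result can be passed to curl, for example: curl -s "$(livy_session_url http://<livy-host>:8998)". On a Kerberized cluster, curl would typically also need --negotiate -u : to authenticate.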

Troubleshooting

Because the TS is not a separate process but runs inside the Livy server, TS logging goes to the Livy server log. If you run into issues, enable DEBUG logging for the org.apache.livy.thriftserver package. At present, there are no additional metrics or monitoring for Livy.
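
One possible way to turn on that DEBUG logger is sketched below. It assumes that your Livy build reads a log4j 1.x style log4j.properties file; the function name enable_thrift_debug and the file path you pass in are hypothetical. On a CM-managed cluster the same logger line would normally be added through the Livy logging configuration in CM instead, since directly edited files may be overwritten.

```shell
# enable_thrift_debug LOG4J_FILE
# Hypothetical helper: appends a DEBUG logger for the Livy Thrift Server
# package to LOG4J_FILE unless the exact line is already present.
# Assumes log4j 1.x properties syntax.
enable_thrift_debug() {
    log4j_file=$1
    logger_line='log4j.logger.org.apache.livy.thriftserver=DEBUG'
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$logger_line" "$log4j_file" 2>/dev/null; then
        printf '%s\n' "$logger_line" >> "$log4j_file"
    fi
}
```

Restart the Livy server after changing the logging configuration so that the new level takes effect.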

Connecting to the TS

JDBC URL formats:

Binary mode: jdbc:hive2://<host>:<port>/<db>

HTTP mode: jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=cliservice

Binary mode with Kerberos: jdbc:hive2://<host>:<port>/<db>;principal=livy/<host>@<REALM>

HTTP mode with Kerberos: jdbc:hive2://<host>:<port>/<db>;principal=HTTP/<host>@<REALM>;transportMode=http;httpPath=cliservice
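
The four URL forms above follow one pattern, which the following sketch tries to capture. The function name livy_ts_url and its argument order are hypothetical; the URL pieces themselves are taken from the formats listed above.

```shell
# livy_ts_url HOST PORT DB MODE [REALM]
# Hypothetical helper: MODE is "binary" or "http"; passing REALM adds the
# Kerberos principal (livy/<host> for binary mode, HTTP/<host> for http mode).
livy_ts_url() {
    host=$1 port=$2 db=$3 mode=$4 realm=${5:-}
    url="jdbc:hive2://${host}:${port}/${db}"
    case $mode in
        binary)
            if [ -n "$realm" ]; then
                url="${url};principal=livy/${host}@${realm}"
            fi
            ;;
        http)
            if [ -n "$realm" ]; then
                url="${url};principal=HTTP/${host}@${realm}"
            fi
            url="${url};transportMode=http;httpPath=cliservice"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$url"
}
```

For example, beeline -u "$(livy_ts_url <host> 10090 default binary)" -n hive would open the same kind of binary-mode connection as the first example in this document.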

References:

  1. https://docs.cloudera.com/runtime/7.1.0/running-spark-applications/topics/spark-livy-configure-thrift-server.html
  2. https://docs.cloudera.com/runtime/7.1.0/running-spark-applications/topics/spark-connect-livy-thrift-server.html