export KAFKA_CLIENT_KERBEROS_PARAMS="-Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf"
Use this jaas conf file
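The referenced kafka_client_jaas.conf is not reproduced here; as a sketch only, a keytab-based Kafka client JAAS entry typically looks like the following (the principal and keytab path are placeholders, not values from this article):

```
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/etc/security/keytabs/kafka_client.keytab"
   principal="kafkauser@EXAMPLE.COM"
   serviceName="kafka";
};
```

For interactive console clients you can instead set useTicketCache=true and omit the keytab, relying on an existing kinit ticket.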
import java.io.{File, FileFilter}
import scala.collection.mutable.HashMap

val hadoopConfFiles = new HashMap[String, File]()
sys.env.get("SPARK_CONF_DIR").foreach { localConfDir =>
  println("localConfDir : " + localConfDir)
  val dir = new File(localConfDir)
  if (dir.isDirectory) {
    val files = dir.listFiles(new FileFilter {
      override def accept(pathname: File): Boolean = {
        // Accept only regular files. (This completes the truncated filter;
        // the original predicate may have been more specific.)
        pathname.isFile
      }
    })
    files.foreach { file => hadoopConfFiles(file.getName) = file }
  }
}
There are many cases where the Spark event log grows very large, especially for streaming jobs, and transferring such a big file to a smaller cluster for offline analysis is difficult. The following shell script reduces the event log size by excluding old jobs from the event log file, so that you can still analyze issues with recent jobs.
After running this shell script in a Linux/macOS terminal, the trimmed output is saved in the input folder with the suffix _trimmed; use that file for further analysis.
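The script itself is not reproduced here. As a hypothetical sketch of the idea only (assuming the event log is a JSON-lines file with one event per line), trimming can keep the header events plus the most recent events; the file name, KEEP value, and sample events below are invented for illustration and are not from the linked script:

```shell
# Sketch: trim a Spark event log to its header events plus the last KEEP events.
EVENTLOG=sample_eventlog
KEEP=2   # keep only the 2 most recent events for this demo

# Create a tiny sample event log for illustration.
cat > "$EVENTLOG" <<'EOF'
{"Event":"SparkListenerLogStart","Spark Version":"2.4.0"}
{"Event":"SparkListenerApplicationStart","App Name":"demo"}
{"Event":"SparkListenerJobStart","Job ID":0}
{"Event":"SparkListenerJobEnd","Job ID":0}
{"Event":"SparkListenerJobStart","Job ID":1}
{"Event":"SparkListenerJobEnd","Job ID":1}
EOF

OUT="${EVENTLOG}_trimmed"
{
  grep -m1 '"SparkListenerLogStart"' "$EVENTLOG"          # version header
  grep -m1 '"SparkListenerApplicationStart"' "$EVENTLOG"  # app start header
  tail -n "$KEEP" "$EVENTLOG"                             # most recent events
} > "$OUT"
cat "$OUT"
```

The trimmed file stays loadable by the Spark History Server because the header events it expects are preserved at the top.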
Usage instructions:
trimsparkeventlog.sh

import requests
import html
import json
# Define the Texgen API endpoint
HOST = 'cmlllm-textgenuiurl'
URI = f'https://{HOST}/api/v1/chat'

You can connect to the Apache Livy Thrift Server using the Beeline client that is included with Apache Hive.
The Livy Thrift Server is disabled by default.
a) To enable the Livy Thrift Server (livy.server.thrift.enabled), check the box labeled Enable Livy Thrift Server in Cloudera Manager.
b) To use the Hive catalog, enable the HMS service in the Livy configuration in Cloudera Manager.
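Once the Thrift server is enabled, a Beeline connection looks like the following sketch (the host is a placeholder; 10090 is the usual default for livy.server.thrift.port, so verify the actual port in your Livy configuration):

```
beeline -u "jdbc:hive2://<livy-thrift-host>:10090/default" -n <user>
```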
cd /usr/hdp/current/kafka-broker/bin/
[kafka@c220-node2 bin]$ ./kafka-topics.sh --create --zookeeper c220-node2.squadron-labs.com:2181 --replication-factor 2 --partitions 3 --topic source1
Created topic "source1".
[kafka@c220-node2 bin]$ ./kafka-topics.sh --create --zookeeper c220-node2.squadron-labs.com:2181 --replication-factor 2 --partitions 3 --topic dest1
Created topic "dest1".
[kafka@c220-node2 bin]$ ./kafka-console-producer.sh --broker-list c220-node2.squadron-labs.com:6667 --topic source1
[kafka@c220-node4 bin]$ ./kafka-console-consumer.sh --bootstrap-server c220-node4.squadron-labs.com:6667 --topic dest1
Please try the following steps to test HWC read and write from Oozie.

Step 1:
In Hive, log in as the hive user:
-----------------
create database db_hwc_test;
use db_hwc_test;
CREATE TABLE demo_input_table (
  id int,
  name varchar(10) );
Kerberized Cluster
Enable Hive Interactive Server (HiveServer2 Interactive) in Hive.
Get the following details from Hive for Spark, or try this HWC Quick Test Script.
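The details needed are the Hive Warehouse Connector connection properties. As a sketch for a typical kerberized HDP 3.x setup (all hosts, ports, and the assembly-jar version below are placeholders; copy the real values from your Hive configuration):

```
spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hs2-interactive-host>:10500/" \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
  --conf spark.hadoop.hive.zookeeper.quorum="<zk-host>:2181" \
  --conf spark.security.credentials.hiveserver2.enabled=true
```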
mkdir -p ~/mytools/yarn && cd ~/mytools/yarn
wget https://raw.githubusercontent.com/vinodkc/myscripts/main/yarn-extract-logs.py
python yarn-extract-logs.py <path to yarn application log> <name of new output directory>