GitHub gists from Vinod KC (vinodkc), Databricks
import java.io.{File, FileFilter}
import scala.collection.mutable.HashMap

// Collect the config files found under SPARK_CONF_DIR.
val hadoopConfFiles = new HashMap[String, File]()
sys.env.get("SPARK_CONF_DIR").foreach { localConfDir =>
  println("localConfDir : " + localConfDir)
  val dir = new File(localConfDir)
  if (dir.isDirectory) {
    val files = dir.listFiles(new FileFilter {
      // The gist is truncated here; accepting plain files is an assumption.
      override def accept(pathname: File): Boolean = pathname.isFile
    })
    files.foreach(file => hadoopConfFiles(file.getName) = file)
  }
}

import requests
import html
import json

# Define the Texgen API endpoint
HOST = 'cmlllm-textgenuiurl'
URI = f'https://{HOST}/api/v1/chat'
Please try the following steps to test HWC read and write from Oozie.

Step 1:
In Hive, log in as the hive user and create a test database and table:
-----------------
create database db_hwc_test;
use db_hwc_test;
CREATE TABLE demo_input_table (
  id int,
  name varchar(10)
);
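
Before wiring HWC into an Oozie action, it can help to sanity-check HWC read and write from spark-shell first. A minimal sketch, assuming HWC is already configured for the session (assembly jar on the classpath and spark.sql.hive.hiveserver2.jdbc.url set), using the table from Step 1:

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// Write a test row through HWC, then read it back.
hive.executeUpdate("insert into db_hwc_test.demo_input_table values (1, 'row1')")
hive.executeQuery("select * from db_hwc_test.demo_input_table").show()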

CDP Livy ThriftServer Example

You can connect to the Apache Livy Thrift Server using the Beeline client that is included with Apache Hive.

The Livy Thrift Server is disabled by default.

a) To enable the Livy Thrift Server (livy.server.thrift.enabled), check the box labeled Enable Livy Thrift Server in Cloudera Manager.

b) To use the Hive catalog, enable the HMS Service in the Livy service configuration in Cloudera Manager.
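
With both settings in place, you can connect with Beeline. A sketch, assuming a placeholder hostname and the default Thrift port (10090 is assumed here; verify livy.server.thrift.port in Cloudera Manager):

beeline -u "jdbc:hive2://<livy-thrift-host>:10090/default"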

HWC-Oozie integration

The hive-warehouse-connector jar released as part of HDP 3.1.5 embeds many third-party jars that conflict with Oozie's classpath. To solve this issue, you have to get the HWC dev jar or a hotfix jar that does not contain the conflicting classes. Internal JIRAs tracking this issue: BUG-122013, BUG-122269.

e.g.:

199679223 2021-01-17 08:48  hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar      // actual jar
 56340621 2021-01-17 08:36  hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152_dev.jar  // dev jar
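
A quick way to check which assembly you have is to compare the number of entries in the two jars; the dev/hotfix jar should contain far fewer classes because the conflicting third-party packages are stripped out:

jar tf hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar | wc -l
jar tf hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152_dev.jar | wc -l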

Step 1: Log in to the LLAP host node.

Step 2: Download and run the HWC info collection script:

cd /tmp
wget https://raw.githubusercontent.com/dbompart/hive_warehouse_connector/master/hwc_info_collect.sh
chmod +x  hwc_info_collect.sh
./hwc_info_collect.sh
mkdir -p ~/mytools/yarn && cd ~/mytools/yarn

wget https://raw.githubusercontent.com/vinodkc/myscripts/main/yarn-extract-logs.py

python yarn-extract-logs.py <fill path to yarn application log> <name of new output directory>

Spark Event Log Job Trimmer

There are many instances where the Spark event log grows very large, especially in the case of streaming jobs, and it is difficult to transfer such a big file to another, smaller cluster for offline analysis. The following shell script helps you reduce the Spark event log size by excluding old jobs from the event log file, so that you can still analyze issues with recent jobs.

After running this shell script in a Linux/Mac terminal, the trimmed output is saved in the input folder with the suffix _trimmed; use that file for further analysis.

Usage instructions:

  1. Copy & paste the code snippet below into a file named trimsparkeventlog.sh
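
A minimal sketch of such a trimmer (not the original script), assuming the event log is plain newline-delimited JSON and that keeping the leading application metadata plus everything from the Nth-most-recent SparkListenerJobStart onward is acceptable:

#!/bin/bash
# Sketch: trim a Spark event log to its metadata plus the last N jobs.
# Usage: ./trimsparkeventlog.sh <event-log-file> [jobs-to-keep]
INPUT="$1"
KEEP="${2:-10}"

# Line numbers of every job-start event in the log.
JOB_LINES=$(grep -n '"Event":"SparkListenerJobStart"' "$INPUT" | cut -d: -f1)
FIRST=$(echo "$JOB_LINES" | head -n 1)
START=$(echo "$JOB_LINES" | tail -n "$KEEP" | head -n 1)
[ -z "$FIRST" ] && { echo "No jobs found in $INPUT"; exit 1; }

# Keep everything before the first job (application metadata),
# then everything from the Nth-most-recent job onward.
{
  head -n $((FIRST - 1)) "$INPUT"
  tail -n +"$START" "$INPUT"
} > "${INPUT}_trimmed"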

How to use a Hive builtin UDF in Spark SQL

./spark-shell --jars /usr/hdp/current/hive-server2/lib/hive-exec.jar

// In the spark-shell session:
import org.apache.spark.sql.functions.col
val data = (1 to 10).toDF("col1").withColumn("col2", col("col1"))
data.createOrReplaceTempView("table1")
spark.sql("CREATE TEMPORARY FUNCTION genericUDFAbsFromHive AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")
spark.sql("select genericUDFAbsFromHive(col1 - 2000) as absCol1, col2 from table1").show(false)