
HWC-Oozie integration

The hive-warehouse-connector (HWC) jar released as part of HDP 3.1.5 embeds many third-party jars, and these conflict with Oozie. To work around this, use the HWC dev jar or a hotfix jar that does not contain the conflicting classes. Internal JIRAs tracking this issue: BUG-122013, BUG-122269.

For example, compare the sizes of the two jars:

199679223 2021-01-17 08:48  hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar // actual jar
56340621 2021-01-17 08:36   hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152_dev.jar // dev jar
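To check whether a given HWC jar still bundles third-party classes, one option is to list its entries and count the `.class` files that fall outside HWC's own package tree. The sketch below assumes HWC's own classes live under `com/hortonworks/`; the function name `count_foreign_classes` is hypothetical. It reads a jar listing on stdin, such as the output of `jar tf <jar>`.

```shell
#!/bin/bash
# Hypothetical helper: count .class entries in a jar listing (read from stdin)
# that do NOT live under the given package prefix.
# Assumption: HWC's own classes are under com/hortonworks/ (adjust if not).
count_foreign_classes() {
  local prefix=${1:-com/hortonworks/}
  # Keep only class files, drop those under the prefix, count what is left.
  grep '\.class$' | grep -v "^${prefix}" | wc -l | tr -d ' '
}

# Usage against the two jars listed above:
#   jar tf hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar | count_foreign_classes
#   jar tf hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152_dev.jar | count_foreign_classes
```

A much lower count for the dev jar would be consistent with the conflicting third-party classes having been stripped.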
rangareddy / script.sh
Created September 1, 2020 05:29 — forked from haisum/script.sh
Comment and uncomment lines in a bash script via sed
sed -i '/<pattern>/s/^/#/g' file #comment
sed -i '/<pattern>/s/^#//g' file #uncomment
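One caveat with the comment one-liner: running it twice on the same file turns `foo` into `##foo`, and the uncomment one-liner then strips only one `#`. A minimal sketch of an idempotent variant, assuming GNU sed (BSD/macOS `sed -i` needs an explicit backup-suffix argument) and a pattern that contains no `/`; the function names `comment_lines` and `uncomment_lines` are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers wrapping the sed one-liners above (GNU sed assumed).
# $1 = regex pattern (must not contain '/'), $2 = file to edit in place.

comment_lines() {
  # Prefix '#' only on matching lines whose first character is not already '#'.
  # '&' re-inserts the matched first character after the new '#'.
  sed -i "/$1/s/^[^#]/#&/" "$2"
}

uncomment_lines() {
  # Strip a single leading '#' from matching lines.
  sed -i "/$1/s/^#//" "$2"
}
```

Because the comment step only fires on lines that do not already start with `#`, calling it repeatedly leaves the file unchanged after the first run.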
rangareddy / reading-property-from-file.sh
Created September 1, 2020 05:20 — forked from marcelbirkner/reading-property-from-file.sh
Read a property from a properties file within a shell script
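#!/bin/sh
PROPERTY_FILE=apps.properties

# Print the value of the property named by $1.
# POSIX function syntax: the 'function' keyword is a bashism that /bin/sh may reject.
getProperty() {
  PROP_KEY=$1
  # Anchor the key at line start and keep everything after the first '=',
  # so similar keys do not match and values containing '=' are not truncated.
  PROP_VALUE=$(grep "^${PROP_KEY}=" "$PROPERTY_FILE" | cut -d'=' -f2-)
  echo "$PROP_VALUE"
}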
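A `grep`/`cut` pipeline is fine for simple files, but a key that appears more than once yields several lines of output, whereas `java.util.Properties` keeps only the last value. A minimal sketch of a stricter reader using `awk`, assuming simple `key=value` lines with optional spaces or tabs around `=` and `#`/`!` comment lines; it does not handle the `:` separators or backslash line continuations that Java `.properties` files also allow. The function name `get_property` is hypothetical:

```shell
#!/bin/sh
# Hypothetical stricter variant of getProperty.
# Usage: get_property <key> <file>
# Prints the value for <key> (last occurrence wins); prints nothing if absent.
get_property() {
  awk -v k="$1" '
    /^[ \t]*[#!]/ { next }                 # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                      # skip lines without "="
      name = substr($0, 1, i - 1)
      val  = substr($0, i + 1)              # keep any "=" inside the value
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { found = 1; result = val }
    }
    END { if (found) print result }
  ' "$2"
}
```

Comparing the trimmed key with `==` instead of a regex means dots in keys such as `db.host` are matched literally.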
rangareddy / Kafka-MirrorMaker-Set-Up.md
Created May 19, 2020 11:18 — forked from rajkrrsingh/Kafka-MirrorMaker-Set-Up.md
Kafka Mirror Maker - from source non-kerberized cluster to kerberized cluster

Kafka Mirror Maker - from source non-kerberized cluster to target (kerberized) cluster

Env:

source cluster:
HDP242
unsecured (non-kerberized)
hostname: rksnode1

destination cluster:
rangareddy / Hive Kafka Integration
Created May 19, 2020 11:17 — forked from rajkrrsingh/Hive Kafka Integration
A quick-start guide to querying a Kafka topic from a Hive table
#### ENV: HDP-3.1
#### Data setup:
```
cat sample-data.json
{"name": "Raj","address": {"a": "b","c": "d","e": "f"}}
{"name": "Raj1","address": {"a": "bb","c": "dd","e": "ff"}}
```
#### Create a topic in Kafka and ingest data into it
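A minimal sketch of this step using the stock Kafka CLI tools. The topic name, the broker and ZooKeeper addresses, and the `KAFKA_BIN` path are assumptions (6667 is the usual HDP broker port and `/usr/hdp/current/kafka-broker/bin` the usual HDP tools location); `--zookeeper` for topic creation and `--broker-list` for the console producer are assumed to match the Kafka 2.0 tools shipped with HDP 3.1. The function name `create_and_load_topic` is hypothetical. Each line of `sample-data.json` becomes one Kafka message.

```shell
#!/bin/bash
# Hypothetical helper: create a topic and load a newline-delimited JSON file into it.
# Assumed defaults (override via environment): HDP-style path and ports.
KAFKA_BIN=${KAFKA_BIN:-/usr/hdp/current/kafka-broker/bin}
ZK=${ZK:-localhost:2181}
BROKER=${BROKER:-localhost:6667}

create_and_load_topic() {
  local topic=$1 data_file=$2
  # One partition and one replica is enough for a quick test topic.
  "$KAFKA_BIN/kafka-topics.sh" --create --zookeeper "$ZK" \
    --replication-factor 1 --partitions 1 --topic "$topic" || return 1
  # The console producer sends each input line as a separate message.
  "$KAFKA_BIN/kafka-console-producer.sh" --broker-list "$BROKER" \
    --topic "$topic" < "$data_file"
}

# Usage: create_and_load_topic test-topic sample-data.json
```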
package com.rajkrrsingh.zk;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
rangareddy / Create_Bulk_Topic_In_Kafka.md
Created May 19, 2020 11:13 — forked from rajkrrsingh/Create_Bulk_Topic_In_Kafka.md
Create Kafka topics in bulk using the admin client

import kafka.admin.AdminUtils;
import kafka.admin.RackAwareMode;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;

import java.io.BufferedReader;
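The gist does this from Java through `AdminUtils` and a ZooKeeper client. As an alternative (not the gist's method), the same bulk creation can be sketched in shell by looping over `kafka-topics.sh --create`. The input format (one topic name per line, blank lines and `#` comments ignored), the default partition and replication values, the `KAFKA_BIN`/`ZK` defaults, and the function name `create_topics_from_file` are all assumptions:

```shell
#!/bin/bash
# Hypothetical bulk-creation loop using the Kafka CLI instead of AdminUtils.
KAFKA_BIN=${KAFKA_BIN:-/usr/hdp/current/kafka-broker/bin}
ZK=${ZK:-localhost:2181}

# Usage: create_topics_from_file <file> [partitions] [replication-factor]
# Returns the number of topics that failed to be created (0 on success).
create_topics_from_file() {
  local file=$1 partitions=${2:-3} replication=${3:-1} failed=0 topic
  # '|| [ -n "$topic" ]' also processes a last line without a trailing newline.
  while IFS= read -r topic || [ -n "$topic" ]; do
    # Skip blank lines and comments.
    case $topic in ''|'#'*) continue ;; esac
    # --if-not-exists makes re-runs harmless; </dev/null keeps the tool
    # from consuming the rest of the topic list on stdin.
    if ! "$KAFKA_BIN/kafka-topics.sh" --create --if-not-exists --zookeeper "$ZK" \
         --partitions "$partitions" --replication-factor "$replication" \
         --topic "$topic" < /dev/null; then
      echo "failed to create $topic" >&2
      failed=$((failed + 1))
    fi
  done < "$file"
  return "$failed"
}
```

Each CLI call starts a separate JVM, so for thousands of topics the Java admin-client approach in the gist will be considerably faster; the loop is mainly convenient for ad-hoc batches.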
[root@kafka-a-01 /]# /opt/kafka_current/bin/kafka-console-consumer.sh
The console consumer is a tool that reads data from Kafka and outputs it to standard output.
Option                                   Description
------                                   -----------
--blacklist <blacklist>                  Blacklist of topics to exclude from
                                           consumption.
--bootstrap-server <server to connect    REQUIRED (unless old consumer is
  to>                                      used): The server to connect to.
--consumer-property <consumer_prop>      A mechanism to pass user-defined
                                           properties in the form key=value to
#!/bin/bash
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
executor_memory=3g
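To see what `num_executors=6` and `executor_memory=3g` actually request from YARN, add the off-heap overhead Spark reserves per executor. A minimal sketch of that arithmetic, assuming the Spark 2.x default overhead of max(384 MiB, 10% of executor memory) (applied when `spark.executor.memoryOverhead`, or `spark.yarn.executor.memoryOverhead` on older 2.x releases, is not set) and assuming YARN rounds each container up to a multiple of `yarn.scheduler.minimum-allocation-mb` (1024 by default). The function name `executor_yarn_memory_mb` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: estimate the YARN memory requested for Spark executors.
# Assumes Spark 2.x defaults: overhead = max(384 MiB, 10% of executor memory),
# and that YARN rounds each container up to a multiple of min_alloc_mb.
# Usage: executor_yarn_memory_mb <num_executors> <executor_memory_mb> [min_alloc_mb]
executor_yarn_memory_mb() {
  local n=$1 mem_mb=$2 min_alloc=${3:-1024}
  local overhead=$(( mem_mb / 10 ))       # 10% of executor memory (integer MiB)
  if (( overhead < 384 )); then
    overhead=384                          # floor of 384 MiB
  fi
  local container=$(( mem_mb + overhead ))
  # Round up to the next multiple of the YARN minimum allocation.
  container=$(( (container + min_alloc - 1) / min_alloc * min_alloc ))
  echo $(( n * container ))
}

# executor_memory=3g -> 3072 MiB: overhead 384, container 3456 -> 4096 MiB,
# so 6 executors request 24576 MiB (driver / ApplicationMaster not included).
executor_yarn_memory_mb 6 3072   # prints 24576
```

The rounding step is why a 3g executor effectively costs a 4 GiB container under these defaults; sizing `executor_memory` just below a multiple of the minimum allocation (minus overhead) avoids wasting that headroom.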
rangareddy / kafka-cheat-sheet.md
Last active October 25, 2022 07:11 — forked from tombentley/kafka-cheat-sheet.md
Apache Kafka Cheat Sheet

Kafka Cheat Sheet

Display Topic Information

$ kafka-topics.sh --describe --zookeeper localhost:2181 --topic beacon
Topic:beacon	PartitionCount:6	ReplicationFactor:1	Configs:
	Topic: beacon	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: beacon	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
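The `Replicas` and `Isr` columns are the ones to watch: a partition whose in-sync replica set is smaller than its replica list is under-replicated. `kafka-topics.sh --describe` also accepts `--under-replicated-partitions` to filter for these directly. As an illustration of reading the output above, here is a minimal `awk` sketch of the same check, assuming the tab-separated `Key: value` layout shown and comma-separated replica lists; the function name `under_replicated` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print "topic partition" for every partition whose ISR is smaller than its
# replica list. Assumes tab-separated "Key: value" fields as in the example.
under_replicated() {
  awk -F'\t' '
    /Partition: / {
      topic = ""; part = ""; nrep = 0; nisr = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Topic: /)     topic = substr($i, 8)
        if ($i ~ /^Partition: /) part  = substr($i, 12)
        if ($i ~ /^Replicas: /)  nrep  = split(substr($i, 11), r, ",")
        if ($i ~ /^Isr: /)       nisr  = split(substr($i, 6), s, ",")
      }
      if (nisr < nrep) print topic, part
    }'
}

# Usage: kafka-topics.sh --describe --zookeeper localhost:2181 --topic beacon | under_replicated
```

The topic summary line (`Topic:beacon PartitionCount:6 ...`) is skipped because it contains `PartitionCount:` rather than `Partition: `.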