Rajkumar Singh (rajkrrsingh)

@rajkrrsingh
rajkrrsingh / Druid_Batch_Mode_Ingestion.md
Last active July 1, 2018 18:23
Quick-start guide to ingesting data into Druid using batch mode on the HDP platform.

Source: http://druid.io/docs/latest/tutorials/tutorial-batch.html
ENV: HDP-2.6.4

pageview.json

{"time": "2015-09-01T00:00:00Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
{"time": "2015-09-01T01:00:00Z", "url": "/", "user": "bob", "latencyMs": 11}
{"time": "2015-09-01T01:30:00Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}

index task json
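The task spec itself is cut off in the preview. Following the linked tutorial, a minimal native index task for this data might look like the sketch below (file locations, granularities, and the Overlord host are assumptions):

cat > pageviews-index.json <<'EOF'
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "pageviews",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {"column": "time", "format": "auto"},
          "dimensionsSpec": {"dimensions": ["url", "user"]}
        }
      },
      "metricsSpec": [
        {"type": "count", "name": "views"},
        {"type": "doubleSum", "name": "latencyMs", "fieldName": "latencyMs"}
      ],
      "granularitySpec": {"type": "uniform", "segmentGranularity": "DAY", "queryGranularity": "NONE", "intervals": ["2015-09-01/2015-09-02"]}
    },
    "ioConfig": {"type": "index", "firehose": {"type": "local", "baseDir": "/tmp", "filter": "pageview.json"}}
  }
}
EOF

# submit the task to the Overlord (default port 8090)
curl -X POST -H 'Content-Type: application/json' -d @pageviews-index.json http://localhost:8090/druid/indexer/v1/task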

@rajkrrsingh
rajkrrsingh / hive_druid_integration.md
Last active July 4, 2019 10:44
Hive-Druid integration: a quick test creating a Druid table from a Hive table
Generate data for the Hive table:
echo "generating sample data for hive table"
echo {-1..-181451}hours | xargs -n1 date +"%Y-%m-%d %H:%M:%S" -d >> /tmp/dates.data
echo {-1..-18145}minutes | xargs -n1 date +"%Y-%m-%d %H:%M:%S" -d >> /tmp/dates.data
echo {-1..-1825}days | xargs -n1 date +"%Y-%m-%d %H:%M:%S" -d >> /tmp/dates.data
cat /tmp/dates.data | while read LINE ; do echo $LINE,"user"$((1 + RANDOM % 10000)),$((1 + RANDOM % 1000)) >> /tmp/hive_user_table.data; done

create hive table
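The DDL is truncated in the preview. A sketch of what follows, assuming the comma-delimited data generated above (column names and the beeline URL are assumptions):

beeline -u "jdbc:hive2://localhost:10000/" <<'EOF'
-- plain Hive table over the generated CSV data
CREATE TABLE hive_user_table (`timecolumn` timestamp, `username` string, `credit_rating` int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/hive_user_table.data' INTO TABLE hive_user_table;

-- Druid-backed table: the storage handler requires a timestamp column named __time
CREATE TABLE druid_user_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS SELECT `timecolumn` AS `__time`, `username`, `credit_rating` FROM hive_user_table;
EOF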

@rajkrrsingh
rajkrrsingh / Hive_Compaction_Failing.md
Created June 27, 2018 23:56
hive compaction failing with FileAlreadyExistsException

ENV

HDP263

Exception

Client
ERROR [Thread-123]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:123,dbname:hive_acid,tableName:hive_acid_table,partName:hive_acid_part=part_name,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. Marking failed to avoid repeated failures
java.io.IOException: Minor compactor job failed for Hadoop JobId:job_XXXXXX_XXXX
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
    at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
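The resolution is not shown in the preview; as a diagnostic starting point, the compaction queue can be inspected and a new major compaction queued for the affected partition (names below follow the log line above; the beeline URL is an assumption):

# list past and queued compactions with their states
beeline -u "jdbc:hive2://localhost:10000/" -e "SHOW COMPACTIONS;"

# re-queue a major compaction for the partition named in the error
beeline -u "jdbc:hive2://localhost:10000/" -e \
  "ALTER TABLE hive_acid.hive_acid_table PARTITION (hive_acid_part='part_name') COMPACT 'major';"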
@rajkrrsingh
rajkrrsingh / Hive_Metastore_Event_listners.md
Created January 17, 2018 22:20
how to create your own metastore event listener

Create a DROP TABLE listener that gets triggered when a DROP TABLE event happens.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.DropTableEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DropTableListner extends MetaStoreEventListener {
  private static final Logger LOG = LoggerFactory.getLogger(DropTableListner.class);

  public DropTableListner(Configuration config) { super(config); }

  @Override // called by the metastore whenever a table is dropped; logging here is a minimal example body
  public void onDropTable(DropTableEvent tableEvent) throws MetaException {
    LOG.info("table {} dropped", tableEvent.getTable().getTableName());
  }
}
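To wire the listener into the metastore (a sketch; the jar name and HDP paths are assumptions), place the jar on the metastore classpath and register the class in hive-site.xml:

# assumed jar name; build it from the class above
cp drop-table-listner.jar /usr/hdp/current/hive-metastore/lib/

# in hive-site.xml (or Ambari > Hive > Custom hive-site), then restart the metastore:
#   hive.metastore.event.listeners=DropTableListner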
@rajkrrsingh
rajkrrsingh / Druid_installation_HDP_2.6.2.md
Last active October 26, 2017 11:09
steps to install Druid on HDP 2.6.2 on CentOS 7

During the installation I hit several issues where the install failed for various reasons; I have documented some of the hurdles I faced and how I overcame them. There is an issue with the Superset installation when you select SQLite as the storage, so select MySQL or Postgres as the Superset storage in the Ambari installation wizard.

Issue1: Requires: openblas-devel

Druid broker installation failed with the following exception:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install superset_2_6_2_0_205' returned 1. Error: Package: superset_2_6_2_0_205-0.15.0.2.6.2.0-205.x86_64 (HDP-2.6)
           Requires: openblas-devel
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
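One way past this dependency error (assuming CentOS 7, where openblas-devel ships in EPEL rather than the base repos) is to install it manually and retry the failed step in Ambari:

yum install -y epel-release        # enables the EPEL repository
yum install -y openblas-devel
# then click Retry on the failed install step in the Ambari wizard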
@rajkrrsingh
rajkrrsingh / Kafka-Metrics.md
Last active July 23, 2022 12:21
Monitoring Kafka Broker JMX using jolokia JVM Agent
Download the Jolokia JVM agent from the following location:
https://jolokia.org/download.html
wget http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.7/jolokia-jvm-1.3.7-agent.jar

mv jolokia-jvm-1.3.7-agent.jar agent.jar
Here is a small shell script to get the metrics you are interested in:
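The script itself is truncated in the preview; a minimal version might look like the sketch below (the MBean name is one example; the Jolokia JVM agent listens on port 8778 by default):

#!/bin/bash
# attach the Jolokia agent to the running Kafka broker (one-time)
KAFKA_PID=$(pgrep -f kafka.Kafka)
java -jar agent.jar start "$KAFKA_PID"

# read a broker metric over Jolokia's REST API
curl -s "http://127.0.0.1:8778/jolokia/read/kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"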
@rajkrrsingh
rajkrrsingh / Kafka-MirrorMaker-Set-Up.md
Created September 5, 2017 09:14
Kafka Mirror Maker - from source non-kerberized cluster to kerberized cluster


Env:

source cluster:
HDP242
un-secure
hostname: rksnode1

destination cluster:
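The destination details and the invocation are cut off in the preview. End to end, a MirrorMaker run typically looks like the sketch below (destination hostname, topic pattern, and property contents are assumptions; PLAINTEXTSASL is HDP's name for SASL_PLAINTEXT):

# consumer.properties -> source (non-kerberized) cluster
#   bootstrap.servers=rksnode1:6667
#   security.protocol=PLAINTEXT
# producer.properties -> destination (kerberized) cluster
#   bootstrap.servers=<dest-broker>:6667
#   security.protocol=PLAINTEXTSASL

kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/$(hostname -f)
/usr/hdp/current/kafka-broker/bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist 'test.*'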
@rajkrrsingh
rajkrrsingh / Custom_UDF_with_LLAP.md
Last active January 25, 2018 20:44
steps to add custom udf to LLAP

Creating and running temporary functions is discouraged when running queries on LLAP for security reasons: since many users share the same LLAP instances, a temporary function can create conflicts. You can still create temporary functions using add jar together with hive.llap.execution.mode=auto.

With the exclusive LLAP execution mode (hive.llap.execution.mode=only) you will run into a ClassNotFoundException; hive.llap.execution.mode=auto allows some parts of the query (the map tasks) to run in Tez containers instead.

Here are the steps to create a custom permanent function in LLAP (tested on HDP-2.6.0):

  1. create a jar for the UDF function (in this case a simple UDF; the remaining steps are sketched below):
git clone https://github.com/rajkrrsingh/SampleCode
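The later steps are truncated in the preview; under the usual permanent-function flow they would be roughly as follows (jar name, UDF class, and the HiveServer2 Interactive URL are assumptions):

# build the UDF jar and stage it on HDFS so every LLAP daemon can fetch it
cd SampleCode && mvn -q package          # assumes a Maven project
hdfs dfs -mkdir -p /tmp/udf
hdfs dfs -put target/sample-udf.jar /tmp/udf/

# register a permanent function backed by the HDFS jar
beeline -u "jdbc:hive2://llap-host:10500/" -e \
  "CREATE FUNCTION myudf AS 'com.example.MyUDF' USING JAR 'hdfs:///tmp/udf/sample-udf.jar';"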
@rajkrrsingh
rajkrrsingh / SparkKafkaIntegration.md
Last active December 18, 2019 09:25
Spark Kafka consumer in a secure (Kerberos) environment

Sample Application

using direct stream
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object SparkKafkaConsumer2 {
  def main(args: Array[String]): Unit = {
    // broker, topic, and security settings below are placeholders
    val ssc = new StreamingContext(new SparkConf().setAppName("SparkKafkaConsumer2"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:6667", "security.protocol" -> "PLAINTEXTSASL")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set("test"))
    stream.map(_._2).print()   // payloads only
    ssc.start(); ssc.awaitTermination()
  }
}
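Against a Kerberized Kafka the job is typically submitted with a JAAS configuration and keytab shipped to the driver and executors; a sketch (file and jar names are assumptions):

spark-submit --master yarn --deploy-mode client \
  --files kafka_client_jaas.conf,kafka.service.keytab \
  --driver-java-options "-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
  --class SparkKafkaConsumer2 spark-kafka-consumer.jar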
@rajkrrsingh
rajkrrsingh / Spark LLAP Setup for Thrift server.md
Last active May 27, 2017 15:51
configuration required to set up Spark-LLAP

ENV HDP-2.6.0.3-8

Download spark-llap assembly jar from http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/

Add the following to Custom spark-thrift-sparkconf:

spark_thrift_cmd_opts --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.executor.extraClassPath /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
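After restarting the Spark Thrift Server with these settings, connectivity can be sanity-checked from Beeline (10015 is HDP's default Spark Thrift Server port; host and user are assumptions):

beeline -u "jdbc:hive2://localhost:10015/" -n hive -e "show tables;"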