Rajkumar Singh rajkrrsingh

@rajkrrsingh
rajkrrsingh / kafkaproducerscala
Created December 13, 2016 18:20
simple Kafka producer using Scala
mkdir kafkaproducerscala
cd kafkaproducerscala/
mkdir -p src/main/scala
cd src/main/scala
vim KafkaProducerScala.scala
object KafkaProducerScala extends App {
import java.util.Properties
@rajkrrsingh
rajkrrsingh / Storm_AutoHDFS_configuration.md
Last active May 18, 2017 12:07
steps to configure Storm AutoHDFS

Add these configurations to the custom storm-site:

nimbus.autocredential.plugins.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
nimbus.credential.renewers.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
hdfs.keytab.file  /etc/security/keytabs/hdfs.headless.keytab
hdfs.kerberos.principal hdfs-s253_kerb@LAB.HORTONWORKS.NET
nimbus.credential.renewers.freq.secs 518400

nimbus.childopts -Xmx1024m _JAAS_PLACEHOLDER -javaagent:/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8649,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/conf/jmxetric-conf.xml:/etc/hadoop/conf/hdfs-site.xml:/etc/hadoop/conf/core-site.xml:/etc/hbase/conf/hbase-site.xml,process=Nimbus_JVM
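In storm-site these entries are YAML, and the two plugin/renewer settings are lists. A sketch of how the key/value pairs above would look in that form (values copied from this gist; adjust the keytab path and principal for your realm):

```yaml
nimbus.autocredential.plugins.classes:
  - "org.apache.storm.hdfs.common.security.AutoHDFS"
nimbus.credential.renewers.classes:
  - "org.apache.storm.hdfs.common.security.AutoHDFS"
hdfs.keytab.file: "/etc/security/keytabs/hdfs.headless.keytab"
hdfs.kerberos.principal: "hdfs-s253_kerb@LAB.HORTONWORKS.NET"
nimbus.credential.renewers.freq.secs: 518400
```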
@rajkrrsingh
rajkrrsingh / kerberos_installation_on_hdp_centos7.md
Last active May 21, 2017 14:25
sample steps to set up a KDC before installing Kerberos through Ambari on a Hortonworks cluster

ENV

#### OS centos7
#### REALM EXAMPLE.COM (update accordingly)
#### AS and KDC are running on hostname rks253secure.hdp.local (update accordingly)

install the required packages:

yum install -y krb5-server krb5-workstation pam_krb5
cd  /var/kerberos/krb5kdc
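Before creating the KDC database, /etc/krb5.conf needs the realm wired in. A minimal fragment (not the full file) using the REALM and KDC host stated in the ENV section above; update both to match your environment:

```ini
; /etc/krb5.conf (fragment) -- realm and host taken from the ENV section above
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = rks253secure.hdp.local
    admin_server = rks253secure.hdp.local
  }

[domain_realm]
  .hdp.local = EXAMPLE.COM
  hdp.local = EXAMPLE.COM
```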
@rajkrrsingh
rajkrrsingh / Spark LLAP Setup for Thrift server.md
Last active May 27, 2017 15:51
configuration required to set up Spark-LLAP

ENV HDP-2.6.0.3-8

Download spark-llap assembly jar from http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/

Add the following to the custom spark-thrift-sparkconf:

spark_thrift_cmd_opts --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.executor.extraClassPath /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
@rajkrrsingh
rajkrrsingh / kafka-python-producer
Last active September 11, 2017 09:32
sample Kafka producer using Python
yum install -y python-pip
pip install kafka-python
# kafka producer sample code
vim kafka_producer.py
from kafka import KafkaProducer
from kafka.errors import KafkaError
producer = KafkaProducer(bootstrap_servers=['rkk1.hdp.local:6667'])
topic = "kafkatopic"
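The preview stops before anything is sent. kafka-python's KafkaProducer accepts a value_serializer that turns each value into bytes before sending; here is a minimal, broker-free sketch of such a serializer (the actual send is left commented out because it needs a live broker; topic and bootstrap server are the ones from this gist):

```python
import json

def to_bytes(value):
    # candidate value_serializer: dict -> UTF-8 encoded JSON bytes
    return json.dumps(value).encode("utf-8")

# producer = KafkaProducer(bootstrap_servers=['rkk1.hdp.local:6667'],
#                          value_serializer=to_bytes)
# producer.send("kafkatopic", {"id": 1})

print(to_bytes({"id": 1}))  # b'{"id": 1}'
```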
@rajkrrsingh
rajkrrsingh / KafkaProducerWithCallBack
Created December 10, 2016 04:24
sample Kafka async producer demonstrating a callback
import com.google.common.io.Resources;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
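The preview shows only the imports of this Java gist. The callback idea in Kafka's Java client: send() returns immediately, and a Callback's onCompletion(RecordMetadata, Exception) fires once the broker acknowledges. A plain-Python sketch of the same pattern, with an in-process fake "send" standing in for the broker round trip (every name here is illustrative, none of it is a Kafka API):

```python
from concurrent.futures import ThreadPoolExecutor

acks = []

def fake_send(record):
    # stand-in for the async network send; returns fake record metadata
    return ("kafkatopic", 0, record)

def on_completion(future):
    # analogous to Kafka's Callback: runs once the "ack" arrives
    acks.append(future.result())

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fake_send, "msg-1")
    future.add_done_callback(on_completion)
# leaving the with-block waits for the send, so the callback has fired
print(acks)  # [('kafkatopic', 0, 'msg-1')]
```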
@rajkrrsingh
rajkrrsingh / Druid_installation_HDP_2.6.2.md
Last active October 26, 2017 11:09
steps to install Druid on HDP 2.6.2 on CentOS 7

During installation I hit a few issues that caused it to fail for various reasons; I have documented some of the hurdles I faced and how I overcame them. There is an issue with the Superset installation when you select SQLite as the storage, so please select MySQL or PostgreSQL as the Superset storage in the Ambari installation wizard.

Issue 1: Requires: openblas-devel

Druid broker installation failed with the following exception:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install superset_2_6_2_0_205' returned 1. Error: Package: superset_2_6_2_0_205-0.15.0.2.6.2.0-205.x86_64 (HDP-2.6)
           Requires: openblas-devel
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
@rajkrrsingh
rajkrrsingh / SparkDFJoinUsingBroadcast
Created December 10, 2016 12:28
Apache Spark sample program to join two Hive tables using a broadcast variable
// Hive Tables
hive> select * from customer;
OK
1 Ramesh 32 Ahmedabad 000
2 Khilan 25 Delhi 1500
3 kaushik 23 Kota 2000
4 Chaitali 25 Mumbai 6500
5 Hardik 27 Bhopal 8500
6 Komal 22 MP 4500
Time taken: 0.568 seconds, Fetched: 6 row(s)
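The Spark code of this gist is truncated in the preview. The idea behind a broadcast join: the small table (customer, above) is shipped whole to every executor, and the large side does a local hash lookup instead of a shuffle. A plain-Python sketch of that lookup, using rows from the table above for the small side (the orders side is invented for illustration):

```python
# small side: customer rows from the hive output above, keyed by id
# (this dict plays the role of the broadcast variable)
customers = {
    1: ("Ramesh", "Ahmedabad"),
    2: ("Khilan", "Delhi"),
    3: ("kaushik", "Kota"),
}

# large side: hypothetical orders (customer_id, amount)
orders = [(1, 250), (3, 120), (1, 80)]

# map-side join: each "task" probes the broadcast dict, no shuffle needed
joined = [(cid, customers[cid][0], amount)
          for cid, amount in orders if cid in customers]
print(joined)  # [(1, 'Ramesh', 250), (3, 'kaushik', 120), (1, 'Ramesh', 80)]
```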
@rajkrrsingh
rajkrrsingh / Custom_UDF_with_LLAP.md
Last active January 25, 2018 20:44
steps to add a custom UDF to LLAP

Creating and running temporary functions is discouraged when running queries on LLAP for security reasons: because many users share the same LLAP instances, it can create conflicts. You can still create temporary functions using add jar together with hive.llap.execution.mode=auto.

With exclusive LLAP execution mode (hive.llap.execution.mode=only) you will run into a ClassNotFoundException; hive.llap.execution.mode=auto allows part of the query (the map tasks) to run in Tez containers.

Here are the steps to create a custom permanent function in LLAP (steps tested on HDP-260):

  1. Create a jar for the UDF function (in this case I am using a simple UDF):
git clone https://github.com/rajkrrsingh/SampleCode
@rajkrrsingh
rajkrrsingh / Tez-Split-Calculation.md
Last active January 31, 2018 17:48
how Tez initial parallelism works (split calculation)

Split generation in Tez:

2017-02-16 15:56:48,725 [INFO] [InputInitializer {Map 1} #0] |dag.RootInputInitializerManager|: Starting InputInitializer for Input: sample_07 on vertex vertex_1486830296338_0025_1_00 [Map 1]

invoke org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator#initialize

2017-02-16 15:56:48,729 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: InputInitializer {Map 1} #0 | initialize realInputFormatName : org.apache.hadoop.hive.ql.io.HiveInputFormat

2017-02-16 15:56:48,738 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: InputInitializer {Map 1} #0 | initialize inputFormat org.apache.hadoop.hive.ql.io.HiveInputFormat@293c29b7
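After HiveSplitGenerator collects the raw splits, Tez groups them: a desired group count is derived from available capacity, and the bytes per group are clamped between tez.grouping.min-size and tez.grouping.max-size before the final split count falls out. A simplified sketch of that clamping (a rough model of the grouping arithmetic, not Tez's exact code):

```python
import math

def grouped_split_count(total_bytes, desired_groups, min_size, max_size):
    # bytes each group would get if we honored the desired parallelism
    per_group = total_bytes / desired_groups
    # clamp to the configured grouping window (min-size / max-size)
    per_group = max(min_size, min(max_size, per_group))
    # number of grouped splits actually produced
    return math.ceil(total_bytes / per_group)

# 8 GiB input, 64 desired tasks, 50 MiB min / 1 GiB max: desired wins
print(grouped_split_count(8 * 1024**3, 64, 50 * 1024**2, 1024**3))    # 64
# same input but 1024 desired tasks: min-size clamp caps the parallelism
print(grouped_split_count(8 * 1024**3, 1024, 50 * 1024**2, 1024**3))  # 164
```

The second call shows why tiny files do not yield thousands of tasks: once the per-group size hits the floor, the split count is driven by data volume, not by the requested parallelism.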