Rajkumar Singh rajkrrsingh

@rajkrrsingh
rajkrrsingh / kafkaproducerscala
Created December 13, 2016 18:20
simple Kafka producer using Scala
mkdir kafkaproducerscala
cd kafkaproducerscala/
mkdir -p src/main/scala
cd src/main/scala
vim KafkaProducerScala.scala
object KafkaProducerScala extends App {
import java.util.Properties
@rajkrrsingh
rajkrrsingh / Storm_AutoHDFS_configuration.md
Last active May 18, 2017 12:07
steps to configure Storm AutoHDFS

Add these configurations to the custom storm-site:

nimbus.autocredential.plugins.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
nimbus.credential.renewers.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
hdfs.keytab.file  /etc/security/keytabs/hdfs.headless.keytab
hdfs.kerberos.principal hdfs-s253_kerb@LAB.HORTONWORKS.NET
nimbus.credential.renewers.freq.secs 518400

nimbus.childopts -Xmx1024m _JAAS_PLACEHOLDER -javaagent:/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8649,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/conf/jmxetric-conf.xml:/etc/hadoop/conf/hdfs-site.xml:/etc/hadoop/conf/core-site.xml:/etc/hbase/conf/hbase-site.xml,process=Nimbus_JVM
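In storm-site these entries are YAML, and the two plugin/renewer settings are lists. A sketch of how the key/value pairs above would look in that form (values copied from this gist; adjust the keytab path and principal for your realm):

```yaml
nimbus.autocredential.plugins.classes:
  - "org.apache.storm.hdfs.common.security.AutoHDFS"
nimbus.credential.renewers.classes:
  - "org.apache.storm.hdfs.common.security.AutoHDFS"
hdfs.keytab.file: "/etc/security/keytabs/hdfs.headless.keytab"
hdfs.kerberos.principal: "hdfs-s253_kerb@LAB.HORTONWORKS.NET"
nimbus.credential.renewers.freq.secs: 518400
```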
@rajkrrsingh
rajkrrsingh / kerberos_installation_on_hdp_centos7.md
Last active May 21, 2017 14:25
sample steps to set up a KDC before installing Kerberos through Ambari on a Hortonworks cluster

ENV

#### OS centos7
#### REALM EXAMPLE.COM (update accordingly)
#### AS and KDC are running on hostname rks253secure.hdp.local (update accordingly)

install the required packages:

yum install -y krb5-server krb5-workstation pam_krb5
cd  /var/kerberos/krb5kdc
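Before creating the KDC database, /etc/krb5.conf needs the realm wired in. A minimal fragment (not the full file) using the REALM and KDC host stated in the ENV section above; update both to match your environment:

```ini
; /etc/krb5.conf (fragment) -- realm and host taken from the ENV section above
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = rks253secure.hdp.local
    admin_server = rks253secure.hdp.local
  }

[domain_realm]
  .hdp.local = EXAMPLE.COM
  hdp.local = EXAMPLE.COM
```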
@rajkrrsingh
rajkrrsingh / Spark LLAP Setup for Thrift server.md
Last active May 27, 2017 15:51
configuration required to set up Spark-LLAP

ENV HDP-2.6.0.3-8

Download spark-llap assembly jar from http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/

Add the following to the custom spark-thrift-sparkconf:

spark_thrift_cmd_opts --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.executor.extraClassPath /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
@rajkrrsingh
rajkrrsingh / kafka-python-producer
Last active September 11, 2017 09:32
sample Kafka producer using Python
yum install -y python-pip
pip install kafka-python
# kafka producer sample code
vim kafka_producer.py
from kafka import KafkaProducer
from kafka.errors import KafkaError
producer = KafkaProducer(bootstrap_servers=['rkk1.hdp.local:6667'])
topic = "kafkatopic"
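The preview stops before anything is sent. kafka-python's KafkaProducer accepts a value_serializer that turns each value into bytes before sending; here is a minimal, broker-free sketch of such a serializer (the actual send is left commented out because it needs a live broker; topic and bootstrap server are the ones from this gist):

```python
import json

def to_bytes(value):
    # candidate value_serializer: dict -> UTF-8 encoded JSON bytes
    return json.dumps(value).encode("utf-8")

# producer = KafkaProducer(bootstrap_servers=['rkk1.hdp.local:6667'],
#                          value_serializer=to_bytes)
# producer.send("kafkatopic", {"id": 1})

print(to_bytes({"id": 1}))  # b'{"id": 1}'
```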
@rajkrrsingh
rajkrrsingh / KafkaProducerWithCallBack
Created December 10, 2016 04:24
sample Kafka async producer demonstrating a callback
import com.google.common.io.Resources;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
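The preview shows only the imports of this Java gist. The callback idea in Kafka's Java client: send() returns immediately, and a Callback's onCompletion(RecordMetadata, Exception) fires once the broker acknowledges. A plain-Python sketch of the same pattern, with an in-process fake "send" standing in for the broker round trip (every name here is illustrative, none of it is a Kafka API):

```python
from concurrent.futures import ThreadPoolExecutor

acks = []

def fake_send(record):
    # stand-in for the async network send; returns fake record metadata
    return ("kafkatopic", 0, record)

def on_completion(future):
    # analogous to Kafka's Callback: runs once the "ack" arrives
    acks.append(future.result())

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fake_send, "msg-1")
    future.add_done_callback(on_completion)
# leaving the with-block waits for the send, so the callback has fired
print(acks)  # [('kafkatopic', 0, 'msg-1')]
```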
@rajkrrsingh
rajkrrsingh / Druid_installation_HDP_2.6.2.md
Last active October 26, 2017 11:09
steps to install Druid on HDP 2.6.2 on CentOS 7

During installation I hit a few issues that caused it to fail for various reasons; I have documented some of the hurdles I faced and how I overcame them. There is an issue with the Superset installation when you select SQLite as the storage, so please select MySQL or PostgreSQL as the Superset storage in the Ambari installation wizard.

Issue 1: Requires: openblas-devel

Druid broker installation failed with the following exception:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install superset_2_6_2_0_205' returned 1. Error: Package: superset_2_6_2_0_205-0.15.0.2.6.2.0-205.x86_64 (HDP-2.6)
           Requires: openblas-devel
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
@rajkrrsingh
rajkrrsingh / SparkDFJoinUsingBroadcast
Created December 10, 2016 12:28
Apache Spark sample program to join two Hive tables using a broadcast variable
// Hive Tables
hive> select * from customer;
OK
1 Ramesh 32 Ahmedabad 000
2 Khilan 25 Delhi 1500
3 kaushik 23 Kota 2000
4 Chaitali 25 Mumbai 6500
5 Hardik 27 Bhopal 8500
6 Komal 22 MP 4500
Time taken: 0.568 seconds, Fetched: 6 row(s)
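The Spark code of this gist is truncated in the preview. The idea behind a broadcast join: the small table (customer, above) is shipped whole to every executor, and the large side does a local hash lookup instead of a shuffle. A plain-Python sketch of that lookup, using rows from the table above for the small side (the orders side is invented for illustration):

```python
# small side: customer rows from the hive output above, keyed by id
# (this dict plays the role of the broadcast variable)
customers = {
    1: ("Ramesh", "Ahmedabad"),
    2: ("Khilan", "Delhi"),
    3: ("kaushik", "Kota"),
}

# large side: hypothetical orders (customer_id, amount)
orders = [(1, 250), (3, 120), (1, 80)]

# map-side join: each "task" probes the broadcast dict, no shuffle needed
joined = [(cid, customers[cid][0], amount)
          for cid, amount in orders if cid in customers]
print(joined)  # [(1, 'Ramesh', 250), (3, 'kaushik', 120), (1, 'Ramesh', 80)]
```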
@rajkrrsingh
rajkrrsingh / Custom_UDF_with_LLAP.md
Last active January 25, 2018 20:44
steps to add a custom UDF to LLAP

Creating and running temporary functions is discouraged when running queries on LLAP for security reasons: because many users share the same LLAP instances, it can create conflicts. You can still create temporary functions using add jar together with hive.llap.execution.mode=auto.

With exclusive LLAP execution mode (hive.llap.execution.mode=only) you will run into a ClassNotFoundException; hive.llap.execution.mode=auto allows part of the query (the map tasks) to run in Tez containers.

Here are the steps to create a custom permanent function in LLAP (steps tested on HDP-260):

  1. Create a jar for the UDF function (in this case I am using a simple UDF):
git clone https://github.com/rajkrrsingh/SampleCode
@rajkrrsingh
rajkrrsingh / Tez-Split-Calculation.md
Last active January 31, 2018 17:48
how Tez initial parallelism works (split calculation)

Split generation in Tez:

2017-02-16 15:56:48,725 [INFO] [InputInitializer {Map 1} #0] |dag.RootInputInitializerManager|: Starting InputInitializer for Input: sample_07 on vertex vertex_1486830296338_0025_1_00 [Map 1]

invoke org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator#initialize

2017-02-16 15:56:48,729 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: InputInitializer {Map 1} #0 | initialize realInputFormatName : org.apache.hadoop.hive.ql.io.HiveInputFormat

2017-02-16 15:56:48,738 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: InputInitializer {Map 1} #0 | initialize inputFormat org.apache.hadoop.hive.ql.io.HiveInputFormat@293c29b7
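After HiveSplitGenerator collects the raw splits, Tez groups them: a desired group count is derived from available capacity, and the bytes per group are clamped between tez.grouping.min-size and tez.grouping.max-size before the final split count falls out. A simplified sketch of that clamping (a rough model of the grouping arithmetic, not Tez's exact code):

```python
import math

def grouped_split_count(total_bytes, desired_groups, min_size, max_size):
    # bytes each group would get if we honored the desired parallelism
    per_group = total_bytes / desired_groups
    # clamp to the configured grouping window (min-size / max-size)
    per_group = max(min_size, min(max_size, per_group))
    # number of grouped splits actually produced
    return math.ceil(total_bytes / per_group)

# 8 GiB input, 64 desired tasks, 50 MiB min / 1 GiB max: desired wins
print(grouped_split_count(8 * 1024**3, 64, 50 * 1024**2, 1024**3))    # 64
# same input but 1024 desired tasks: min-size clamp caps the parallelism
print(grouped_split_count(8 * 1024**3, 1024, 50 * 1024**2, 1024**3))  # 164
```

The second call shows why tiny files do not yield thousands of tasks: once the per-group size hits the floor, the split count is driven by data volume, not by the requested parallelism.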