@jamesrajendran
Created May 7, 2017 04:56
sudo ssh 127200813data00.eap.g4ihos.itcs.hpecorp.net
cd /usr/hdp/2.5.0.0-1245/kafka/bin/
./kafka-topics.sh --create --zookeeper 127200813master.eap.g4ihos.itcs.hpecorp.net:2181,127200813data02.eap.g4ihos.itcs.hpecorp.net:2181,127200813data01.eap.g4ihos.itcs.hpecorp.net:2181,127200813data00.eap.g4ihos.itcs.hpecorp.net:2181 --replication-factor 1 --partitions 1 --topic test
./kafka-topics.sh --list --zookeeper 127200813master.eap.g4ihos.itcs.hpecorp.net:2181,127200813data02.eap.g4ihos.itcs.hpecorp.net:2181,127200813data01.eap.g4ihos.itcs.hpecorp.net:2181,127200813data00.eap.g4ihos.itcs.hpecorp.net:2181
./kafka-console-producer.sh --broker-list 127200813data00.eap.g4ihos.itcs.hpecorp.net:9092 --topic test
./kafka-console-consumer.sh --bootstrap-server 127200813data00.eap.g4ihos.itcs.hpecorp.net:9092 --topic test --from-beginning
--------------
./kafka-topics.sh --create --zookeeper 127200813data02.eap.g4ihos.itcs.hpecorp.net:2181 --replication-factor 1 --partitions 1 --topic test1
./kafka-console-producer.sh --broker-list 127200813data00.eap.g4ihos.itcs.hpecorp.net:9092 --topic test1
./kafka-console-consumer.sh --bootstrap-server 127200813data00.eap.g4ihos.itcs.hpecorp.net:9092 --topic test1 --from-beginning
./kafka-topics.sh --describe --zookeeper 127200813data02.eap.g4ihos.itcs.hpecorp.net:2181 --topic test1
./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --zookeeper 127200813master.eap.g4ihos.itcs.hpecorp.net:2181,127200813data02.eap.g4ihos.itcs.hpecorp.net:2181,127200813data01.eap.g4ihos.itcs.hpecorp.net:2181,127200813data00.eap.g4ihos.itcs.hpecorp.net:2181
spark-shell --jars kafka-spark-consumer-1.0.10.jar
------------------concepts--------------------
important criteria - fault tolerance, data consistency, a simpler API, and low end-to-end latency
Durable (exactly-once) - can replay any message or set of messages given the necessary selection criteria
Reliable (at-least-once) - can replay an already received message - e.g. JMS, RabbitMQ, Amazon Kinesis
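A common way to get effectively-once processing on top of an at-least-once channel is to make the consumer idempotent. This is a minimal, self-contained sketch (not any broker's actual API): messages carry an offset, redelivered offsets are skipped, so a replayed message is applied only once.

```java
import java.util.*;

public class DeliverySemanticsSketch {
    // At-least-once delivery may hand us the same message twice (e.g. after a
    // failure and replay); deduplicating by offset makes processing idempotent.
    public static List<String> process(List<Map.Entry<Long, String>> delivered) {
        Set<Long> seenOffsets = new HashSet<>();
        List<String> applied = new ArrayList<>();
        for (Map.Entry<Long, String> msg : delivered) {
            if (seenOffsets.add(msg.getKey())) {  // false if offset already seen
                applied.add(msg.getValue());
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // offset 1 is delivered twice (at-least-once), but applied only once
        List<Map.Entry<Long, String>> delivered = List.of(
            Map.entry(0L, "a"), Map.entry(1L, "b"),
            Map.entry(1L, "b"), Map.entry(2L, "c"));
        System.out.println(process(delivered));  // [a, b, c]
    }
}
```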
Spark Streaming - high performance due to RDD micro-batching
Apache Storm is supported by Hortonworks and MapR only
Spark Streaming has a higher-level API, similar to Spark batch, with SQL support
partitions - the number of partitions per topic is important: within a consumer group there is no point having more consumers than partitions, because each partition is consumed by exactly one consumer in the group, so any extra consumers sit idle.
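To make the partition/consumer limit concrete, here is a toy round-robin assignment (not Kafka's actual assignor, just an illustration): with 3 partitions and 4 consumers, the fourth consumer receives no partitions.

```java
import java.util.*;

public class PartitionAssignmentSketch {
    // Toy round-robin assignment: each partition goes to exactly one consumer;
    // with more consumers than partitions, the extras get an empty list.
    public static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 partitions, 4 consumers: consumer "c3" stays idle
        System.out.println(assign(3, List.of("c0", "c1", "c2", "c3")));
        // {c0=[0], c1=[1], c2=[2], c3=[]}
    }
}
```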
Rebalancing: happens when consumers join or leave a group, or when failures occur.
-----------parallelize input DStreams-----------------
multiple InputDStreams
val numInputDStreams = 5
val kafkaDStreams = (1 to numInputDStreams).map { _ => KafkaUtils.createStream(...) }
--
multiple threads in one inputDstream
val consumerThreadsPerInputDstream = 3
val topics = Map("zerg.hydra" -> consumerThreadsPerInputDstream)
val stream = KafkaUtils.createStream(ssc, kafkaParams, topics, ...)
---------------kafka fraud receive data and respond synchronously??--------------
------------------reading http request parameters-------------------
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // read query/form parameters from the request
    String param1 = request.getParameter("param1");
    String param2 = request.getParameter("param2");
}
------------------rebalance-----------------------------
when consumer threads change dynamically (user-initiated or due to system failures)
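The effect of a rebalance can be sketched with the same kind of toy assignment (again, not Kafka's real protocol): when a group member leaves, its partitions are redistributed over the remaining members.

```java
import java.util.*;

public class RebalanceSketch {
    // Toy round-robin assignment over the current group membership.
    static Map<String, List<Integer>> assign(int partitions, List<String> members) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String m : members) out.put(m, new ArrayList<>());
        for (int p = 0; p < partitions; p++)
            out.get(members.get(p % members.size())).add(p);
        return out;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(List.of("c0", "c1", "c2"));
        System.out.println("before: " + assign(4, group));  // {c0=[0, 3], c1=[1], c2=[2]}
        group.remove("c1");                                 // c1 fails or leaves the group
        System.out.println("after:  " + assign(4, group));  // {c0=[0, 2], c2=[1, 3]}
    }
}
```

In real Kafka consumers, application code hooks into this via a rebalance listener (e.g. to commit offsets before partitions are revoked).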