This note covers setting up a MongoDB -> Kafka -> Spark Streaming pipeline for rapid prototyping.
PySpark is used for fast turnaround. PySpark's Kafka receiver relies on ZooKeeper, so the topic is created through ZooKeeper and consumer offsets are stored there as well.
$> brew install zookeeper kafka apache-spark jupyter
$> brew services start zookeeper
$> brew services start kafka
$> kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic checklist
Check that the partitions were created.
$> kafka-topics --describe --zookeeper localhost:2181 --topic checklist
Topic:checklist PartitionCount:5 ReplicationFactor:1 Configs:
Topic: checklist Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: checklist Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: checklist Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: checklist Partition: 3 Leader: 0 Replicas: 0 Isr: 0
Topic: checklist Partition: 4 Leader: 0 Replicas: 0 Isr: 0
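With the topic in place, something has to feed it from MongoDB. A minimal producer sketch, assuming pymongo and kafka-python (neither is pinned by this note) and placeholder database/collection names:

```python
import json


def to_message(doc):
    """Serialize a MongoDB document to JSON bytes for Kafka.

    The ObjectId in _id is not JSON-serializable, so it is dropped here;
    keys are sorted so the payload is deterministic.
    """
    doc = dict(doc)
    doc.pop("_id", None)
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def run():
    # Local imports: pymongo and kafka-python are assumptions of this sketch.
    from pymongo import MongoClient
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    # "prototyping" / "checklist" are placeholder names, not from the note.
    coll = MongoClient("localhost", 27017)["prototyping"]["checklist"]

    for doc in coll.find():
        producer.send("checklist", to_message(doc))
    producer.flush()


# run()  # uncomment with mongod and Kafka running locally
```

The producer's default partitioner will spread messages across the 5 partitions created above.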
Once the Spark Streaming code runs, it creates a new consumer group.
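A minimal consumer sketch of that Spark side, assuming Spark 1.x/2.x with the spark-streaming-kafka-0-8 artifact on the classpath (the group id checklist-group and the thread count are choices mirroring the topic above, not taken from the note):

```python
import json


def parse_record(value):
    """Decode one Kafka message value (a JSON string) into a dict; None if malformed."""
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        return None


def main():
    # pyspark imports are kept local so the helper above is importable without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="checklist-stream")
    ssc = StreamingContext(sc, batchDuration=5)

    # createStream is the ZooKeeper-based receiver, which is why the group
    # shows up under kafka-consumer-groups --zookeeper below.
    stream = KafkaUtils.createStream(
        ssc,
        zkQuorum="localhost:2181",
        groupId="checklist-group",
        topics={"checklist": 5},  # 5 receiver threads, one per partition
    )

    # Messages arrive as (key, value) pairs; keep only well-formed JSON values.
    stream.map(lambda kv: parse_record(kv[1])) \
          .filter(lambda d: d is not None) \
          .pprint()

    ssc.start()
    ssc.awaitTermination()


# main()  # uncomment to start the stream against a running broker
```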
$> kafka-consumer-groups --zookeeper localhost:2181 --describe --group checklist-group
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
checklist 0 25094 25094 0 checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist 1 25387 25387 0 checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist 2 24989 24989 0 checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist 3 24887 24887 0 checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist 4 25138 25138 0 checklist-group_Rydeen.local-1517642255383-8ec43f30
To replay the topic from the beginning, reset the group's offsets. Note: with --bootstrap-server this acts on offsets committed to Kafka; offsets kept in ZooKeeper by the old consumer live under /consumers instead. Without --execute the command is only a dry run that prints the planned offsets; add --execute (with the Spark job stopped, since the group must be inactive) to apply it.
$> kafka-consumer-groups --bootstrap-server localhost:9092 --reset-offsets --group checklist-group --all-topics --to-earliest --execute
To purge the topic between test runs, temporarily drop the retention to one second, give the broker a minute to delete the old segments, then remove the override so the default retention applies again.
$> kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --add-config retention.ms=1000 --entity-name checklist
$> kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --delete-config retention.ms --entity-name checklist
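To confirm the purge worked, compare earliest and latest offsets per partition; when they match, the log is empty. A sketch using the GetOffsetShell tool that ships with Kafka (assuming the Homebrew kafka-run-class wrapper; --time -2 is earliest, -1 is latest):

```shell
# Earliest offset per partition:
kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic checklist --time -2

# Latest offset per partition:
kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic checklist --time -1
```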