Skip to content

Instantly share code, notes, and snippets.

@akiatoji
Last active July 17, 2022 17:06
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save akiatoji/1cd7519f2d45e182ac9889fe01ca5f1d to your computer and use it in GitHub Desktop.
Save akiatoji/1cd7519f2d45e182ac9889fe01ca5f1d to your computer and use it in GitHub Desktop.

Dev setup for Kafka and Spark Streaming

This note is for setting up rapid prototyping of MongoDB -> Kafka -> Spark Streaming pipeline.

Pyspark is being used for fast turn around. Pyspark uses Kafka connector that uses ZooKeeper so the topic has to be created with offsets stored in ZooKeeper.

Prerequisites

$> brew install zookeeper kafka apache-spark jupyter
$> brew services start zookeeper
$> brew services start kafka

Create Kafka topic

$> kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic checklist

Check that the partition is created.

$> kafka-topics --describe --zookeeper localhost:2181 --topic checklist 

Topic:checklist	PartitionCount:5	ReplicationFactor:1	Configs:
	Topic: checklist	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 3	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 4	Leader: 0	Replicas: 0	Isr: 0

Status of consumer group

Once Spark streaming code is run, it creates a new consumer group.

$> kafka-consumer-groups  --zookeeper localhost:2181 --describe --group checklist-group

Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).


TOPIC                          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG        CONSUMER-ID
checklist                 0          25094           25094           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 1          25387           25387           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 2          24989           24989           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 3          24887           24887           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 4          25138           25138           0          checklist-group_Rydeen.local-1517642255383-8ec43f30

Reset consumer offset

kafka-consumer-groups --bootstrap-server localhost:9092 --reset-offsets --group checklist-group --all-topics --to-earliest

Purging kafka topics

kafka-configs --zookeeper localhost:2181  --alter --entity-type topics --add-config retention.ms=1000  --entity-name checklist

kafka-configs --zookeeper localhost:2181  --alter --entity-type topics --delete-config retention.ms  --entity-name checklist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment