- Install Kafka using Homebrew
$ brew install kafka
- Start ZooKeeper
$ zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
- Start Kafka
$ kafka-server-start /usr/local/etc/kafka/server.properties
$ kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
$ kafka-console-producer --broker-list localhost:9092 --topic test
You'll see something like this
kafka-console-producer --broker-list localhost:9092 --topic test
>1
>2
>3
>4
Each line you type (here, I typed numbers in sequence) is written to the Kafka topic test when you press Enter
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
You'll see something like below
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
1
2
3
4
- Install Confluent Python client
$ pip install confluent-kafka
A message is the single unit of data that can be sent or received.
For Kafka, any message is just a byte array. We can provide additional structure to the message by defining a schema. A few available options are XML, JSON, Avro, Protobuf, etc.
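The byte-array point can be shown with a round trip: the broker never interprets the payload, so producer and consumer must agree on how to encode and decode it. A small sketch using an informal JSON "schema" (the record fields here are made up for illustration):

```python
import json

# Kafka stores and forwards only bytes; the schema is an agreement between
# producer and consumer on how to interpret them.
record = {"user_id": 42, "action": "click"}

# Producer side: serialize the structured record to a byte array.
payload = json.dumps(record).encode("utf-8")
assert isinstance(payload, bytes)

# Consumer side: decode the byte array back into a structured record.
decoded = json.loads(payload.decode("utf-8"))
print(decoded["action"])  # -> click
```

Avro or Protobuf work the same way in principle, but with a formal schema definition and a compact binary encoding instead of JSON text.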
A producer is an external process or application that produces messages.
A consumer is an external process or application that consumes messages.
A topic provides a way of categorizing the data that is being sent.
A topic can be further broken down into partitions. Each partition acts as a separate commit log. Order of messages is guaranteed only within the same partition. If no partition is specified and the topic has multiple partitions, messages will be written in round-robin fashion.
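The two routing behaviors above can be sketched in a few lines. This is a simplified illustration, not Kafka's actual partitioner: Kafka hashes keys with murmur2, while crc32 here is just a stand-in for a stable hash.

```python
import itertools
import zlib

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def partition_for(key):
    # Keyed messages always map to the same partition, which is what
    # preserves per-key ordering (real Kafka uses murmur2, not crc32).
    if key is not None:
        return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
    # Keyless messages are spread round-robin across partitions.
    return next(_round_robin)

# Same key -> same partition, every time.
assert partition_for("user-42") == partition_for("user-42")

# No key -> partitions taken in turn: 0, 1, 2, 0, ...
print([partition_for(None) for _ in range(4)])  # -> [0, 1, 2, 0]
```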
Producing and delivering messages to Kafka one at a time creates a lot of overhead. A batch is a collection of messages produced for the same topic and partition, and it can be compressed.
A consumer group consumes messages from a topic. Each consumer in the group reads messages from one or more partitions within the topic. If you want the same message to be processed by two consumers, they must belong to separate consumer groups.
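A toy model makes the group semantics concrete: within one group each partition has exactly one owner, while a second group independently receives every message again. The partition contents and consumer names below are invented for the example, and the round-robin assignment is a simplification of Kafka's actual assignor strategies.

```python
# Each partition maps to the messages stored in it (made-up data).
partitions = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}

def assign(partitions, consumers):
    # Round-robin partition assignment: one owner per partition per group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

group_a = assign(partitions, ["a1", "a2"])  # two consumers share the partitions
group_b = assign(partitions, ["b1"])        # one consumer owns everything

def consumed(assignment):
    # All messages delivered to a group, across its consumers.
    return sorted(m for parts in assignment.values() for p in parts for m in partitions[p])

# Each group sees the full set of messages exactly once per group.
print(consumed(group_a))  # -> ['a', 'b', 'c', 'd', 'e']
print(consumed(group_b))  # -> ['a', 'b', 'c', 'd', 'e']
```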
Kafka by default provides retention of messages based on age or size. The default retention period is 7 days. TODO: Default size for retention?
A replica is in-sync if it is no more than N messages or T seconds (default 10) behind the leader.
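Both knobs live in the broker's server.properties. An illustrative fragment, with values matching the defaults described above (exact defaults vary by Kafka version, so check the configuration reference for yours):

```properties
# Age-based retention: delete log segments older than 7 days (168 hours).
log.retention.hours=168
# Size-based retention per partition; -1 disables the size limit.
log.retention.bytes=-1
# A follower more than this many milliseconds behind the leader
# is dropped from the in-sync replica set.
replica.lag.time.max.ms=10000
```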
A single Kafka server within a cluster is called a broker. A broker is responsible for:
- Receiving messages from producers
- Assigning offsets
- Committing messages to disk
- Responding to consumers' fetch requests and serving messages
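The offset-assignment and fetch-serving responsibilities above can be sketched as a toy per-partition commit log. This is only a mental model, not how a broker is actually implemented (real brokers persist segments to disk, handle replication, and much more):

```python
# Toy model of a broker's per-partition commit log: appending assigns the
# next offset, and fetches serve messages from a given offset onward.
class PartitionLog:
    def __init__(self):
        self._log = []

    def append(self, message):
        # Offsets are assigned in arrival order and never reused.
        self._log.append(message)
        return len(self._log) - 1  # offset of the committed message

    def fetch(self, offset):
        # Consumers fetch from an offset of their choosing; earlier
        # messages stay in the log until retention removes them.
        return self._log[offset:]

log = PartitionLog()
offsets = [log.append(m) for m in ["1", "2", "3", "4"]]
print(offsets)       # -> [0, 1, 2, 3]
print(log.fetch(2))  # -> ['3', '4']
```

This is also why a consumer restarting with --from-beginning sees every message again: fetching is just reading from offset 0.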
One broker from the cluster acts as the "cluster controller" (TODO: what happens when the cluster controller fails?), responsible for:
- Assigning partitions to brokers
- Monitoring for broker failures
If two messages are produced by the same producer to the same partition, then Kafka guarantees the order of those two messages.
- Kafka Connect
- Kafka Connectors
- Kafka Streams
- Use Kafka connector
- Introduction to Apache Kafka by James Ward
- Installation Steps
- Good intro to Kafka
- Apache Kafka Explained (Comprehensive Overview)