@giri-sh
Last active July 22, 2021 11:26
Kafka Fundamentals Notes
What is this gist about?
-- Apache Kafka Fundamentals
Why do we need Apache Kafka?
-- The amount of data produced in the world every day is huge, currently estimated at 2.5 quintillion bytes.
-- Because of this volume, we need some kind of queuing system that can buffer and process the data for our systems.
Types of Queuing Systems?
-- Point-to-Point (P2P)
-- Publisher-Subscriber
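To make the difference concrete, here is a toy in-memory sketch in Python. It is not Kafka code, and the class names `PointToPointQueue` and `PubSubTopic` are made up for illustration. In a point-to-point queue each message goes to exactly one receiver, while in publish-subscribe every subscriber gets its own copy.

```python
from collections import deque


class PointToPointQueue:
    """Toy P2P queue: each message is delivered to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other receiver can see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Toy pub-sub topic: every subscriber receives every message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []

    def publish(self, message):
        # Each subscriber gets its own copy of the message.
        for inbox in self._inboxes.values():
            inbox.append(message)

    def inbox(self, name):
        return list(self._inboxes[name])
```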
What is Apache Kafka?
-- Kafka is a distributed, reliable and performant streaming platform.
-- Kafka works on Publisher-Subscriber model.
-- Kafka has the capability of handling the continuous stream of data.
-- Kafka supports the transfer of huge data or requests between systems.
-- Kafka persists the data that is published, so it stays available even after it has been consumed.
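What is ZooKeeper?
-- Cluster management system for Kafka.
-- Also acts as an orchestrator for Kafka.
-- A group of ZooKeeper servers working together is called a ZooKeeper ensemble.
-- ZooKeeper is needed to -
---- Elect leaders (the cluster controller, which in turn picks a leader for each topic partition)
---- Resolve deadlock issues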
Kafka Cluster - Collection of brokers.
Broker - Independent instance of the Kafka service, typically running on its own machine or VM. Any broker can act as a bootstrap server, i.e. the first contact point a client uses to discover the rest of the cluster.
Topic - Named stream of messages based on a commit log architecture. Multiple topics can be created in a Kafka cluster.
Partitions - Topics are divided into partitions to allow parallel processing. Messages are appended to a partition, and each message gets an incrementing offset.
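The relationship between topics, partitions and offsets can be sketched with a toy append-only log. This is an illustrative model only, not how Kafka stores data on disk; `Partition` and `Topic` are hypothetical names.

```python
class Partition:
    """Toy append-only log; a message's offset is its position in the log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        offset = len(self._log)  # the next offset is the current log length
        self._log.append(message)
        return offset

    def read(self, offset):
        return self._log[offset]


class Topic:
    """Toy topic: a fixed number of independent partitions."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]
```

Note that offsets are counted per partition: two partitions of the same topic each start at offset 0.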
What are the guarantees that Kafka provides?
-- Ordering is guaranteed within a partition (not across partitions of a topic).
-- Data is retained for 7 days by default. This is configurable.
-- Partition to broker assignment is automatic.
What is a Producer?
-- System that generates the data.
-- Uses the producer API to write data to a Kafka cluster.
-- Can attach a key to each message. Messages with the same key are sent to the same partition.
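A minimal sketch of key-based partition selection: hash the key and take it modulo the number of partitions. The function name `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in; Kafka's default Java partitioner uses a different hash function (murmur2), so the partition numbers here will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # crc32 is a stable stand-in hash for illustration, not Kafka's actual hash.
    return zlib.crc32(key) % num_partitions
```

Because the mapping is deterministic, all messages for one key (say, one user ID) land in one partition and therefore keep their relative order.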
What are Acknowledgements?
-- Response that the producer waits for to confirm that the data it produced has been safely stored in the Kafka cluster.
-- 3 modes of acknowledgements.
---- 0 - Fire and forget. The producer does not wait for any acknowledgement.
---- 1 - Get an acknowledgement from the partition leader only.
---- All - Get an acknowledgement once the leader and all in-sync replicas (ISR) have the message.
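The three modes can be summarised as a small decision function. This is a toy model under simplifying assumptions (it ignores timeouts, retries and broker-side settings); `is_acknowledged` and its parameters are hypothetical names, not part of any Kafka client.

```python
def is_acknowledged(acks, leader_wrote, followers_wrote):
    """Return True if the producer would treat the send as successful.

    acks            -- "0", "1" or "all"
    leader_wrote    -- whether the partition leader stored the message
    followers_wrote -- one bool per in-sync follower replica
    """
    if acks == "0":
        return True  # fire and forget: the producer never waits
    if acks == "1":
        return leader_wrote  # only the leader's write matters
    if acks == "all":
        # the leader and every in-sync follower must have the message
        return leader_wrote and all(followers_wrote)
    raise ValueError(f"unknown acks mode: {acks!r}")
```

The trade-off is latency versus durability: "0" is fastest but can lose data silently, while "all" is slowest but survives the loss of the leader.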
What is a consumer?
-- System that reads data from topics.
-- Multiple consumer groups can consume the same topic or partition independently of each other.
What are consumer groups?
-- Group of consumers that is created to achieve a common goal.
-- The goal can be, for example, to save the data to a DB or to perform an alerting operation.
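Within one consumer group, each partition is read by only one consumer, so the partitions are split among the group members. A simple round-robin split can be sketched as below. Kafka ships several assignment strategies, so treat this as an illustration rather than the actual algorithm; `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

If a group has more consumers than there are partitions, the extra consumers get nothing to read, which is why the partition count limits the parallelism of a group.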
What are Delivery Semantics?
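-- Guarantee about how many times a message is delivered. It depends on when consumers mark (commit) a message in a topic as consumed.
-- 3 modes of delivery semantics -
---- At most once
---- At least once
---- Exactly once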
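The difference between at-most-once and at-least-once delivery comes down to whether the consumer commits its offset before or after processing a message. The toy simulation below (hypothetical function `consume`, no real Kafka involved) lets a consumer crash between the two steps to show the effect.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Process `log` starting at the committed offset.

    commit_first -- True: commit, then process (at most once)
                    False: process, then commit (at least once)
    crash_at     -- offset at which the consumer dies between its two steps
    Returns (processed_messages, committed_offset).
    """
    processed = []
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1           # step 1: commit
            if offset == crash_at:
                return processed, committed  # crash: message is never processed
            processed.append(log[offset])    # step 2: process
        else:
            processed.append(log[offset])    # step 1: process
            if offset == crash_at:
                return processed, committed  # crash: message will be re-read
            committed = offset + 1           # step 2: commit
        offset += 1
    return processed, committed
```

With `commit_first=True`, a crash loses the in-flight message; with `commit_first=False`, the message is processed again after restart. Exactly-once needs extra machinery (for example idempotent processing or transactions) and is not modelled here.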
Replication Factor -
-- The number of copies of each partition of a topic that are kept on different brokers in the cluster.
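As a rough illustration, replicas of a partition can be spread over distinct brokers as below. Real Kafka placement is more involved, so this is only a sketch under simplified assumptions; `place_replicas` is a hypothetical helper, and the first broker in each list plays the role of the leader.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Return {partition: [broker, ...]} with `replication_factor` distinct brokers each."""
    if replication_factor > len(brokers):
        # each copy must live on a different broker
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return placement
```

With a replication factor of 2 on three brokers, any single broker can fail and every partition still has one copy left.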
Offset - Incremental integer ID assigned to each message within a partition.