giri-sh/kafka_fundamentals_notes

## kafka_fundamentals_notes
What is this gist about?
-- Apache Kafka Fundamentals


Why do we need Apache Kafka?
-- The amount of data produced in the world every day is huge, commonly estimated at about 2.5 quintillion bytes.
-- Because of this volume, we need some kind of queuing system that can buffer and process the data for our systems.


Types of Queuing Systems?
-- Point-to-Point (P2P)
-- Publisher-Subscriber


What is Apache Kafka?
-- Kafka is a distributed, reliable and performant streaming platform.
-- Kafka works on the Publisher-Subscriber model.
-- Kafka can handle a continuous stream of data.
-- Kafka supports the transfer of large volumes of data or requests between systems.
-- Kafka stores published data durably. Messages stay available after they are consumed, until the retention period expires.


What is Zookeeper?
-- Cluster management system for Kafka.
-- Also acts as an orchestrator for Kafka.
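-- Runs as a Zookeeper ensemble, i.e. a group of Zookeeper servers.
-- Zookeeper is needed to -
---- Elect leaders (the controller broker, which in turn assigns partition leaders)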
---- Resolve deadlock issues


Kafka Cluster - Collection of brokers.
Broker - Independent instance of the Kafka service. Each broker typically runs on its own machine or VM. The brokers that a client first connects to in order to discover the cluster are called bootstrap servers.
Topic - Named stream of messages, based on a commit log architecture. Multiple topics can be created in a Kafka cluster.
Partitions - Topics are divided into partitions, which allow parallel processing. Messages are stored in a partition in arrival order, each with an incremental offset.
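
To make the commit log idea concrete, here is a minimal sketch of a partition as an append-only list. It is a toy model, not Kafka code: the `Partition` class and its `append` and `read` methods are hypothetical names chosen for illustration.

```python
class Partition:
    """Toy append-only log: each message gets the next integer offset."""

    def __init__(self):
        self._log = []

    def append(self, message):
        offset = len(self._log)  # offsets start at 0 and grow by 1
        self._log.append(message)
        return offset

    def read(self, offset, max_messages=10):
        """Return (offset, message) pairs starting at the given offset."""
        end = min(offset + max_messages, len(self._log))
        return [(i, self._log[i]) for i in range(offset, end)]


partition = Partition()
first = partition.append("order-created")
second = partition.append("order-paid")
print(first, second)      # 0 1
print(partition.read(1))  # [(1, 'order-paid')]
```

Messages are never modified in place; a reader only needs to remember the offset it wants to continue from.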


What are the guarantees that Kafka provides?
-- Ordering is guaranteed within a partition (not across the partitions of a topic).
-- Data is retained for 7 days by default. This is configurable.
--
-- Partition to broker assignment is automatic
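
As a rough illustration of time-based retention, the sketch below drops messages older than 7 days. It is a simplification under stated assumptions: `purge_expired` and the `(timestamp_ms, message)` tuples are hypothetical, and a real broker removes whole log segments rather than individual messages.

```python
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 7 days in milliseconds


def purge_expired(log, now_ms, retention_ms=RETENTION_MS):
    """Keep only (timestamp_ms, message) entries still inside the retention window."""
    return [(ts, msg) for ts, msg in log if now_ms - ts <= retention_ms]


log = [(0, "old"), (500_000_000, "recent")]
kept = purge_expired(log, now_ms=700_000_000)
print(RETENTION_MS)  # 604800000
print(kept)          # [(500000000, 'recent')]
```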


What is a Producer?
-- System that generates the data.
-- Uses the producer API to write data to a Kafka cluster.
-- Can attach a key to each message. Messages with the same key are sent to the same partition.
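
The role of the key can be sketched as hashing the key and taking the result modulo the number of partitions, so that equal keys always map to the same partition. This is an illustration only: `choose_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in for whatever hash the real client uses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (toy model)."""
    # crc32 is a stand-in hash for this sketch, not the one Kafka's client uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


p1 = choose_partition("customer-42", 3)
p2 = choose_partition("customer-42", 3)
print(p1 == p2)  # True: the same key always lands on the same partition
```

Because ordering holds only within a partition, choosing a key such as a customer ID is how a producer keeps related messages in order.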


What are Acknowledgements?
-- Response that the producer waits for to confirm that the data it produced has been safely stored in the Kafka cluster.
-- 3 modes of acknowledgements.
---- 0 - Fire and forget. The producer does not wait for any acknowledgement.
---- 1 - Get an acknowledgement from the partition leader only.
---- All - Get an acknowledgement once the leader and all in-sync replicas (ISR) have the message.
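
The three modes can be compared with a small decision function. This is a toy model of when a producer would treat a send as successful, not client code: `send_succeeds` and its parameters are hypothetical, and only the values `0`, `1` and `all` come from the notes above.

```python
def send_succeeds(acks, leader_wrote, isr_wrote):
    """Toy model: does the producer treat the send as successful?

    acks         -- "0", "1" or "all"
    leader_wrote -- True if the partition leader stored the message
    isr_wrote    -- list of booleans, one per in-sync follower replica
    """
    if acks == "0":
        return True  # fire and forget: the producer never waits for a response
    if acks == "1":
        return leader_wrote  # only the leader has to confirm
    if acks == "all":
        return leader_wrote and all(isr_wrote)  # leader plus every in-sync replica
    raise ValueError(f"unknown acks mode: {acks!r}")


print(send_succeeds("0", False, []))              # True
print(send_succeeds("1", True, [False]))          # True
print(send_succeeds("all", True, [True, False]))  # False
```

The trade-off is latency against durability: `0` is fastest but can lose data silently, while `all` waits the longest and gives the strongest guarantee.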


What is a consumer?
-- System that reads data from topics.
-- Multiple consumer groups can consume the same topic or partition independently.


What are consumer groups?
-- Group of consumers that work together toward a common goal.
-- The goal can be, for example, to save the data to a DB or to perform an alerting operation.
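
Within a group, each partition is read by only one consumer, so the members split the partitions between them. The sketch below shows one simple way to do that split, round-robin. It is an assumption-laden toy: `assign_partitions` is a hypothetical helper, and Kafka's own assignment strategies may distribute partitions differently.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin (toy model)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b", "consumer-c"])
print(assignment)
# {'consumer-a': [0, 3], 'consumer-b': [1], 'consumer-c': [2]}
```

If a group has more consumers than partitions, the extra consumers in this model receive an empty list and sit idle.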


What are Delivery Semantics?
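-- Define how consumers mark messages in a topic as consumed (by committing offsets), and therefore how many times a message may be processed.
-- 3 modes of delivery semantics -
---- At most once
---- At least once
---- Exactly once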
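
Two commonly described delivery semantics, at-most-once and at-least-once, differ only in whether the consumer commits its offset before or after processing a message. The simulation below is a minimal sketch of that difference with a single simulated crash; `consume`, `commit_before_processing` and `crash_at` are hypothetical names, and no real consumer API is used.

```python
def consume(messages, commit_before_processing, crash_at):
    """Toy consumer that crashes once at offset crash_at, then restarts
    from the last committed offset. Returns the messages it processed."""
    committed = 0
    processed = []
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_before_processing:
            committed = offset + 1  # at-most-once: commit first
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crashed before processing: the message is lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])  # at-least-once: process first
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crashed before committing: the message is re-read
            committed = offset + 1
    return processed


at_most_once = consume(["a", "b", "c"], commit_before_processing=True, crash_at=1)
at_least_once = consume(["a", "b", "c"], commit_before_processing=False, crash_at=1)
print(at_most_once)   # ['a', 'c']
print(at_least_once)  # ['a', 'b', 'b', 'c']
```

Committing first can lose a message, while processing first can duplicate one, which is why at-least-once consumers are usually written to tolerate repeated messages.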


Replication Factor -
-- Number of copies of each topic partition that Kafka keeps across the brokers in a cluster.
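
To show what a replication factor means in practice, the sketch below places each partition's copies on distinct brokers. It is a simplified illustration, not Kafka's placement algorithm: `place_replicas` and the broker IDs are hypothetical, and treating the first replica in each list as the leader is an assumption of this sketch.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Assign each partition to replication_factor distinct brokers (toy model)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Start at a different broker per partition and walk around the ring.
        placement[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return placement


placement = place_replicas(3, [101, 102, 103], 2)
print(placement)
# {0: [101, 102], 1: [102, 103], 2: [103, 101]}
```

With a replication factor of 2, losing any single broker in this example still leaves one copy of every partition.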


Offset - Incremental integer ID assigned to each message within a partition.