@fabsta
Created October 8, 2019 19:15

Kafka Best practices

Components - Producers

FAQ overview

The most useful config options are:

  • producer.type (default sync)
  • request.required.acks (default 0)
  • compression.codec (default none)
  • batch.num.messages (default 200)

All producer configs: http://kafka.apache.org/documentation.html#producerconfigs
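The four producer options above can be captured as a small generator for a `producer.properties` file. This is a minimal sketch, not a client: the defaults mirror the list, the override values are only examples, and the helper name `render_producer_properties` is hypothetical.

```python
# Minimal sketch: render the old (0.8-era) producer settings listed above
# as .properties text. Defaults mirror the list; overrides are examples.

PRODUCER_DEFAULTS = {
    "producer.type": "sync",         # "async" batches sends in the background
    "request.required.acks": "0",    # 0 = don't wait, 1 = leader ack, -1 = all in-sync replicas
    "compression.codec": "none",     # or "gzip" / "snappy"
    "batch.num.messages": "200",     # messages per batch in async mode
}


def render_producer_properties(overrides=None):
    """Merge overrides onto the defaults and return .properties text."""
    config = dict(PRODUCER_DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in config:
            # This sketch only knows the four options discussed above.
            raise KeyError(f"option not covered by this sketch: {key}")
        config[key] = str(value)
    return "".join(f"{key}={value}\n" for key, value in sorted(config.items()))


if __name__ == "__main__":
    # Example: async producer that waits for the leader's ack and uses gzip.
    print(render_producer_properties({
        "producer.type": "async",
        "request.required.acks": 1,
        "compression.codec": "gzip",
    }), end="")
```

Rejecting unknown keys keeps typos (for example `request.required.ack`) from silently producing a config the producer would ignore.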

Components - Consumers

Excerpt from FAQs

  • Should I choose multiple group ids or a single one for the consumers?
  • Can I predict the results of the consumer rebalance?
  • How can I rewind the offset in the consumer?
  • How to improve the throughput of a remote consumer?
  • I don't want my consumer's offsets to be committed automatically. Can I manually manage my consumer's offsets?
  • What is the relationship between fetch.wait.max.ms and socket.timeout.ms on the consumer?
  • How do I get exactly-once messaging from Kafka?
  • Why can't I specify the number of streams parallelism per topic map using wildcard stream as I use static stream handler?
  • How to consume large messages?
  • How do we migrate to committing offsets to Kafka (rather than Zookeeper) in 0.8.2?

Broker config options

The most useful are:

  • broker.id: each broker needs a unique id

  • log.dirs: where the log data is stored (default /tmp/kafka-logs)

  • zookeeper.connect: Zookeeper connection string

  • host.name: broker hostname

  • num.partitions: default = 1

  • auto.create.topics.enable: default = true

  • default.replication.factor: default = 1
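A minimal sketch of per-broker configuration, assuming a hypothetical helper `broker_configs` and example hostnames: each broker gets its own `server.properties` text with a unique `broker.id`. It uses the broker's property name `log.dirs` and a `default.replication.factor` default of 1.

```python
# Minimal sketch: one server.properties per broker, each with a unique broker.id.
# Hostnames and the Zookeeper string in the example are hypothetical.

BROKER_DEFAULTS = {
    "log.dirs": "/tmp/kafka-logs",
    "num.partitions": "1",
    "auto.create.topics.enable": "true",
    "default.replication.factor": "1",
}


def broker_configs(hostnames, zookeeper_connect, overrides=None):
    """Return {hostname: properties text}, using each host's index as broker.id."""
    if len(set(hostnames)) != len(hostnames):
        raise ValueError("hostnames must be distinct")
    configs = {}
    for broker_id, host in enumerate(hostnames):
        props = dict(BROKER_DEFAULTS)
        props.update(overrides or {})
        # Set last so overrides cannot break the unique-id rule.
        props["broker.id"] = str(broker_id)
        props["host.name"] = host
        props["zookeeper.connect"] = zookeeper_connect
        configs[host] = "".join(f"{k}={v}\n" for k, v in sorted(props.items()))
    return configs


if __name__ == "__main__":
    cluster = broker_configs(
        ["kafka1.example.com", "kafka2.example.com"],
        "zk1.example.com:2181,zk2.example.com:2181",
        overrides={"auto.create.topics.enable": "false",
                   "default.replication.factor": "2"},
    )
    for host, text in cluster.items():
        print(f"# {host}\n{text}")
```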

Components - Brokers

Excerpt from FAQs

  • How does Kafka depend on Zookeeper?
  • How many topics can I have?
  • How to reduce churns in ISR? When does a broker leave the ISR ?
  • How do I accurately get offsets of messages for a certain timestamp using OffsetRequest?
  • Can I add new brokers dynamically to a cluster?
  • How do I choose the number of partitions for a topic?

Troubleshooting

Common cause

Consumer lag (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign, slide 50) is a consumer problem, caused by:

  • the consumer being too slow, spending too much time in GC, or losing its connection to ZK
  • a bug or design flaw
  • an operational mistake
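Consumer lag for a partition is the distance between the partition's log end offset and the consumer group's committed offset. The sketch below computes it from plain dictionaries; fetching the real offsets from the brokers and from Zookeeper or Kafka is left out, the helper names are hypothetical, and treating a missing committed offset as 0 is an assumption (with retention, the earliest available offset may be higher).

```python
# Minimal sketch: per-partition consumer lag = log end offset - committed offset.
# Offsets are plain dicts keyed by (topic, partition).

def consumer_lag(log_end_offsets, committed_offsets):
    """Return {(topic, partition): lag}; no committed offset counts as 0 here."""
    return {
        tp: max(end - committed_offsets.get(tp, 0), 0)
        for tp, end in log_end_offsets.items()
    }


def lagging_partitions(lag, threshold):
    """Partitions whose lag exceeds the threshold, largest lag first."""
    return sorted((tp for tp, n in lag.items() if n > threshold),
                  key=lambda tp: lag[tp], reverse=True)


if __name__ == "__main__":
    ends = {("clicks", 0): 1500, ("clicks", 1): 980, ("clicks", 2): 2000}
    committed = {("clicks", 0): 1490, ("clicks", 1): 980, ("clicks", 2): 200}
    lag = consumer_lag(ends, committed)
    print(lag)  # {('clicks', 0): 10, ('clicks', 1): 0, ('clicks', 2): 1800}
    print(lagging_partitions(lag, threshold=100))  # [('clicks', 2)]
```

A lag that keeps growing on every partition points at a consumer that is too slow overall; lag concentrated on a few partitions is more likely a stuck or repeatedly rebalancing consumer instance.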

Brokers

FAQs from https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Brokers.

  • Why do I see error "Should not set log end offset on partition" in the broker log?
  • Why does controlled shutdown fail?
  • Why can't my consumers/producers connect to the brokers?
  • How to replace a failed broker?
  • After bouncing a broker, why do I see LeaderNotAvailable or NotLeaderForPartition exceptions on startup?
  • Why do I see lots of Leader not local exceptions on the broker during controlled shutdown?
  • Why partition leaders migrate themselves some times?

Producers

FAQs from https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Producers

  • I am using the ZK-based producer in 0.7 and I see data only produced on some of the brokers, but not all, why?
  • Why are my brokers not receiving producer sent messages?
  • Why is data not evenly distributed among partitions when a partitioning key is not specified?
  • Why do I get QueueFullException in my producer when running in async mode?

Consumers

FAQs from https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Consumers

  • Why does my consumer never get any data?
  • Why does my consumer get InvalidMessageSizeException?
  • Why some of the consumers in a consumer group never receive any message?
  • Why are there many rebalances in my consumer log?
  • My consumer seems to have stopped, why?
  • Why messages are delayed in my consumer?

Monitoring tools

Tools to, e.g., check consumer offsets, dump logs, and export Zookeeper offsets (link):

Visualising using Kibana: https://www.elastic.co/blog/logstash-kafka-intro

Auditing Kafka: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign (slide 55)

Every producer writes to a "stats" topic how many messages it produced.
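A minimal sketch of that counting scheme, assuming 10-minute audit windows, a hypothetical record layout, and an in-memory dictionary standing in for Kafka topics: producers publish per-window counts to the "stats" topic, and an auditor compares them with what consumers report having read.

```python
# Minimal sketch of count-based auditing. The window size, record fields and
# in-memory "topics" are assumptions, not part of any Kafka API.
from collections import Counter

WINDOW_SECONDS = 600


def window_start(timestamp):
    """Start of the audit window containing a Unix timestamp (seconds)."""
    ts = int(timestamp)
    return ts - ts % WINDOW_SECONDS


class AuditedProducer:
    """Wraps a send(topic, message) callable and counts messages per window."""

    def __init__(self, name, send):
        self.name = name
        self.send = send
        self.counts = Counter()

    def produce(self, topic, message, timestamp):
        self.send(topic, message)
        self.counts[(topic, window_start(timestamp))] += 1

    def flush_audit(self, stats_topic="stats"):
        """Publish one count record per (topic, window), then reset."""
        for (topic, window), count in sorted(self.counts.items()):
            self.send(stats_topic, {"producer": self.name, "topic": topic,
                                    "window": window, "count": count})
        self.counts.clear()


def audit(stats_records, consumed_counts):
    """Return {(topic, window): produced - consumed} for every mismatch."""
    produced = Counter()
    for record in stats_records:
        produced[(record["topic"], record["window"])] += record["count"]
    keys = sorted(set(produced) | set(consumed_counts))
    return {key: produced[key] - consumed_counts.get(key, 0)
            for key in keys
            if produced[key] != consumed_counts.get(key, 0)}


if __name__ == "__main__":
    topics = {}  # in-memory stand-in for Kafka topics

    def send(topic, message):
        topics.setdefault(topic, []).append(message)

    producer = AuditedProducer("web-1", send)
    for ts in (0, 30, 700):
        producer.produce("clicks", b"event", ts)
    producer.flush_audit()
    # The downstream consumer saw only 1 of the 2 messages in window 0.
    print(audit(topics["stats"], {("clicks", 0): 1, ("clicks", 600): 1}))  # {('clicks', 0): 1}
```

Summing counts over all producers before comparing means the auditor does not need to know how many producers exist, only which windows they reported.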

Topics

Partition

  • keep time-related information in the same partition: Kafka only guarantees message ordering within a partition
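Related messages are usually routed to one partition by giving them the same key and hashing it. The sketch uses `zlib.crc32` as a stand-in hash; this is not Kafka's partitioner (the Java client's default partitioner uses murmur2), so real partition numbers will differ, but the property shown is the same: same key, same partition, preserved order. The helper names are hypothetical.

```python
# Minimal sketch of keyed partitioning with a stand-in hash (not Kafka's).
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a key to a partition."""
    return zlib.crc32(key) % num_partitions


def assign(events, num_partitions):
    """Append (key, value) events to per-partition logs in send order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    events = [(b"sensor-7", "t=1"), (b"sensor-3", "t=1"),
              (b"sensor-7", "t=2"), (b"sensor-7", "t=3")]
    logs = assign(events, num_partitions=4)
    p = partition_for(b"sensor-7", 4)
    print([v for k, v in logs[p] if k == b"sensor-7"])  # ['t=1', 't=2', 't=3']
```

Because the mapping depends on the partition count, adding partitions to a topic changes where existing keys land; ordering is only preserved for messages written under the same partition count.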

Operations

https://kafka.apache.org/documentation.html#operations covers:

  • mirroring data between clusters
  • graceful shutdown
  • balancing leadership
  • expanding cluster
  • increasing replication factor
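The operations documentation describes increasing the replication factor by supplying a reassignment JSON file that lists the extra replicas for each partition. A minimal sketch of generating such a file, assuming round-robin placement of the new replicas and a hypothetical helper name:

```python
# Minimal sketch: build partition reassignment JSON that raises a topic's
# replication factor. Placement strategy and names are assumptions.
import json


def increase_replication_plan(topic, current, brokers, new_factor):
    """current: {partition: [broker ids]}; returns reassignment JSON text.

    Existing replicas stay first in each list (so the preferred leader is
    unchanged); extra replicas are picked round-robin from brokers that do
    not already hold the partition.
    """
    brokers = sorted(set(brokers))
    if new_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    partitions = []
    cursor = 0
    for partition in sorted(current):
        replicas = list(current[partition])
        while len(replicas) < new_factor:
            candidate = brokers[cursor % len(brokers)]
            cursor += 1
            if candidate not in replicas:
                replicas.append(candidate)
        partitions.append({"topic": topic, "partition": partition,
                           "replicas": replicas})
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    # Topic "foo" currently has 3 partitions with one replica each.
    print(increase_replication_plan("foo", {0: [1], 1: [2], 2: [3]},
                                    brokers=[1, 2, 3], new_factor=2))
```

The output is the kind of file `bin/kafka-reassign-partitions.sh` accepts via `--reassignment-json-file` together with `--execute`; check the operations documentation for the connection flags your Kafka version expects.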

Testing

Integration testing

See the integration testing section of https://github.com/miguno/kafka-storm-starter

Performance tuning

General

  • often not much tuning required!
  • maybe tune:
    • num.io.threads: should be >= number of disks (start with == number of disks)
    • num.network.threads: adjust based on the number of producers and consumers, plus the replication factor
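A small helper for the `num.io.threads` starting point, assuming (hypothetically) that each comma-separated entry in `log.dirs` sits on its own physical disk. The function names are illustrative only.

```python
# Minimal sketch: derive a num.io.threads starting value from log.dirs,
# assuming one log directory per physical disk.

def io_threads_start(log_dirs: str) -> int:
    """Starting value for num.io.threads: one thread per disk."""
    dirs = [d.strip() for d in log_dirs.split(",") if d.strip()]
    if not dirs:
        raise ValueError("log.dirs is empty")
    return len(dirs)


def check_io_threads(num_io_threads: int, log_dirs: str) -> bool:
    """True if num.io.threads is at least the number of disks."""
    return num_io_threads >= io_threads_start(log_dirs)


if __name__ == "__main__":
    dirs = "/data1/kafka,/data2/kafka,/data3/kafka"
    print(io_threads_start(dirs))     # 3
    print(check_io_threads(2, dirs))  # False
```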

Kernel tuning

  • lots of RAM
    • gives the OS a large page cache, which avoids disk I/O

Disk throughput

  • longer commit intervals
  • increase the flush interval (LinkedIn: 120 s, default: 30 s)

Java/JVM tuning

  • minimize GC pauses
  • use of the Oracle JDK solves some problems
    • it uses the new G1 (garbage-first) collector

References

User guide

http://www.confluent.io/blog/stream-data-platform-1/
http://www.confluent.io/blog/stream-data-platform-2/

Extensive best practices guide for Storm & Kafka

https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html

FAQs

https://cwiki.apache.org/confluence/display/KAFKA/FAQ

Kafka ecosystem

Connectors & Hadoop integration: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Written with StackEdit.
