zookeeper.properties
dataDir={ new_data_dir_instead_of_tmp }/data/zookeeper
server.properties
log.dirs={ new_log_dir_instead_of_tmp }/data/kafka/logs
num.partitions=3 // default number of partitions for newly created topics
producer retry:
retry.backoff.ms = 100 // default: wait 100 ms before retrying a failed send
delivery.timeout.ms = 120000 // 120 000 ms = 2 minutes; after 2 min the producer stops retrying the failed message
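A minimal sketch of the two retry settings above as a java.util.Properties object, the form a Java producer would take. Plain string keys are used so nothing beyond the JDK is needed; the class and method names are made up for illustration. The attempt estimate assumes a fixed backoff and ignores request time, so treat it as a rough upper bound only.

```java
import java.util.Properties;

public class ProducerRetrySketch {
    // Builds the retry-related producer settings from the notes above.
    public static Properties retryProps() {
        Properties props = new Properties();
        props.setProperty("retry.backoff.ms", "100");       // wait 100 ms between attempts
        props.setProperty("delivery.timeout.ms", "120000"); // stop retrying after 2 minutes
        return props;
    }

    // Rough upper bound on attempts that fit into the delivery window,
    // assuming a fixed backoff and zero request time.
    public static long maxAttemptsWithinTimeout(Properties props) {
        long backoffMs = Long.parseLong(props.getProperty("retry.backoff.ms"));
        long timeoutMs = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return timeoutMs / backoffMs;
    }

    public static void main(String[] args) {
        System.out.println(maxAttemptsWithinTimeout(retryProps())); // prints 1200
    }
}
```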
idempotent producers (safe producer)
enable.idempotence=true (producer level) + min.insync.replicas=2 (broker/topic level)
enable.idempotence=true implies: acks=all, retries=MAX_INT, max.in.flight.requests.per.connection=5
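A sketch of the safe-producer settings on the Java side, with the implied values spelled out so the intent is visible in code. The bootstrap address and the class name are assumptions for illustration; min.insync.replicas is a broker/topic setting and is not part of the producer properties.

```java
import java.util.Properties;

public class SafeProducerSketch {
    // Producer-side settings for an idempotent ("safe") producer.
    public static Properties safeProducerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.setProperty("enable.idempotence", "true");
        // Implied by enable.idempotence=true per the notes; set explicitly for readability.
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        // Prints every configured key=value pair.
        safeProducerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```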
message compression at producer level doesn't require any configuration change in the brokers or in the consumers
compression.type = 'none' | 'gzip' | 'lz4' | 'snappy' | 'zstd' # snappy or lz4 is usually a good choice for the speed / compression ratio trade-off
linger.ms and batch.size at producer level
linger.ms=5 # wait up to 5 ms to increase the chances of messages being sent together in a batch
batch.size=16kb | 32kb | 64kb # the value is given in bytes (default 16384 = 16 KB); too big is a memory waste
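A sketch of the compression and batching settings together, again as plain java.util.Properties. The choice of snappy and 32 KB is just one combination from the options listed above, and the class name is hypothetical; the point is that batch.size has to be written in bytes.

```java
import java.util.Properties;

public class BatchingProducerSketch {
    // Throughput-oriented producer settings: compress and batch before sending.
    public static Properties batchingProps() {
        Properties props = new Properties();
        props.setProperty("compression.type", "snappy");            // compression happens on the producer
        props.setProperty("linger.ms", "5");                        // wait up to 5 ms to fill a batch
        props.setProperty("batch.size", String.valueOf(32 * 1024)); // 32 KB, expressed in bytes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchingProps().getProperty("batch.size")); // prints 32768
    }
}
```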
idempotent consumer (at least once) at consumer level
# kafka generic id
id = record.topic() + "_" + record.partition() + "_" + record.offset();
# or pass uniq id in message
id = message.id
enable.auto.commit = false & manually commit offsets
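A minimal sketch of the dedup idea above: build the topic_partition_offset id and skip anything already seen. SimpleRecord is a hypothetical stand-in for Kafka's ConsumerRecord so the example runs with the JDK only, and the in-memory set is an assumption; a real consumer would more likely rely on a unique key in the target datastore and commit offsets after processing.

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentConsumerSketch {
    // Hypothetical stand-in for ConsumerRecord, holding only what the id needs.
    public static final class SimpleRecord {
        final String topic;
        final int partition;
        final long offset;
        final String value;

        public SimpleRecord(String topic, int partition, long offset, String value) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    // Ids of records that were already processed (in memory only, for illustration).
    private final Set<String> processedIds = new HashSet<>();

    // Kafka-generic id: unique per record within a cluster.
    public static String recordId(SimpleRecord r) {
        return r.topic + "_" + r.partition + "_" + r.offset;
    }

    // Returns true if the record is new, false if it is a redelivered duplicate.
    public boolean handle(SimpleRecord r) {
        // Set.add returns false when the id is already present.
        // Real processing (e.g. a database upsert keyed by the id) would go here.
        return processedIds.add(recordId(r));
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        SimpleRecord r = new SimpleRecord("orders", 0, 42L, "payload");
        System.out.println(consumer.handle(r)); // prints true
        System.out.println(consumer.handle(r)); // prints false (duplicate delivery)
    }
}
```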
consumer offset behaviour
offsets.retention.minutes # broker setting, default is 7 days; committed offsets of a consumer group that stays inactive longer than this are lost, so make sure to set this to a high value
data retention period # ? set to a high value, default is 7 days (log.retention.hours=168)
log cleanup policy: delete
log.retention.hours=168 # default is 1 week
log.retention.bytes=-1
or
log.retention.hours=17520 # 2 years: set the time limit very high so the size limit applies first
log.retention.bytes=524288000 # 500MB
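A quick arithmetic check of the retention values above, as a sketch; the class and helper names are made up for illustration. It assumes 500MB is meant as 500 * 1024 * 1024 bytes, which is what 524288000 works out to.

```java
public class RetentionMathSketch {
    // Converts a retention time in hours to whole days.
    public static long hoursToDays(long hours) {
        return hours / 24;
    }

    // Converts a size in MiB (1024 * 1024 bytes) to bytes.
    public static long mebibytesToBytes(long mib) {
        return mib * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(hoursToDays(168));      // prints 7
        System.out.println(hoursToDays(17520));    // prints 730
        System.out.println(mebibytesToBytes(500)); // prints 524288000
    }
}
```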