@jkreps
Last active January 21, 2024 11:02
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
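For readability, the same invocation with the positional arguments broken out (the variable names are mine, not the tool's; the arguments appear to be topic, record count, record size in bytes, and target throughput, where -1 means unthrottled, followed by producer properties):

TOPIC=test7          # topic to produce to
RECORDS=50000000     # total records to send
RECORD_SIZE=100      # bytes per record
THROUGHPUT=-1        # target records/sec; -1 = no throttle
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
  "$TOPIC" "$RECORDS" "$RECORD_SIZE" "$THROUGHPUT" \
  acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 \
  buffer.memory=67108864 batch.size=8196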
Single-thread, async 3x replication
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test6 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Single-thread, sync 3x replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000
Three Producers, 3x async replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Throughput Versus Stored Data
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Effect of message size
# Send a fixed ~1 GB of data at each message size so the runs are comparable.
for i in 10 100 1000 10000 100000; do
  echo ""
  echo $i
  bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test $((1000*1024*1024/$i)) $i -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=128000
done
Consumer
Consumer throughput
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
3 Consumers
On three servers, run:
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
End-to-end Latency
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency esv4-hcl198.grid.linkedin.com:9092 esv4-hcl197.grid.linkedin.com:2181 test 5000
Producer and consumer
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
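To exercise both at once, one option is to start the producer in the background and the consumer in the foreground (a minimal sketch; the gist itself doesn't say how the two runs were coordinated):

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 &
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
wait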
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The port the socket server listens on
port=9092
# Hostname the broker will bind to and advertise to producers and consumers.
# If not set, the server will bind to all interfaces and advertise the value returned
# from java.net.InetAddress.getCanonicalHostName().
#host.name=localhost
# The number of threads handling network requests
num.network.threads=4
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=1048576
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=1048576
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# The directory under which to store log files
log.dirs=/grid/a/dfs-data/kafka-logs,/grid/b/dfs-data/kafka-logs,/grid/c/dfs-data/kafka-logs,/grid/d/dfs-data/kafka-logs,/grid/e/dfs-data/kafka-logs,/grid/f/dfs-data/kafka-logs
# The number of logical partitions per topic per server. More partitions allow greater parallelism
# for consumption, but also mean more files.
num.partitions=8
############################# Log Flush Policy #############################
# The following configurations control the flush of data to disk. This is the most
# important performance knob in kafka.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data is at greater risk of loss in the event of a crash.
# 2. Latency: Data is not made available to consumers until it is flushed (which adds latency).
# 3. Throughput: The flush is generally the most expensive operation.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
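# For example, the flush policy keys look like this (illustrative values; both
# keys ship commented out in the stock server.properties):
#log.flush.interval.messages=10000
#log.flush.interval.ms=1000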
# Per-topic overrides for log.flush.interval.ms
#log.flush.intervals.ms.per.topic=topic1:1000, topic2:3000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria is met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=536870912
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.cleanup.interval.mins=1
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma-separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=esv4-hcl197.grid.linkedin.com:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=1000000
# metrics reporter properties
kafka.metrics.polling.interval.secs=5
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.dir=/tmp/kafka_metrics
# Disable csv reporting by default.
kafka.csv.metrics.reporter.enabled=false
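# Presumably raised far above the default so that followers aren't dropped from
# the ISR while the benchmark drives maximum throughput.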
replica.lag.max.messages=10000000
@jianghaitao

It would be nice if the topics created in the setup (test and test-rep-one) were explained in terms of how they are used in the later tests; this is not obvious. Also, how was the batch size (8K vs. 64K) chosen? Finally, on line 21, how do we know 3 threads were used?

@ssinganamalla

In 0.9 you can run the benchmark tests out of the box, without building first. Note the difference in the package name for ProducerPerformance:

bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic first --num-records 50000000 --record-size 100 --throughput -1 --producer-props acks=1 bootstrap.servers=localhost:9092 buffer.memory=67108864 batch.size=8196

@glory5

glory5 commented Apr 1, 2016

@ssinganamalla will you please provide me a script for a local machine benchmark test?

@ToniYang

ToniYang commented Jun 14, 2016

The Kafka version is the latest:
Error: Could not find or load main class test7

Maybe the commands in this file don't work with the latest Kafka version. Does the file need an update?

@gaurav6351

gaurav6351 commented Nov 15, 2016

Can we pass a file instead of a message size for the producer perf test, so it uses my real-time data?

@dskarbek

Would be great to have an updated version of this for the latest version of Kafka. I've got kafka_2.11-0.10.20.0, and bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency can't find the class. Has it been renamed? Do I have to do something to get it? A previous comment mentioned running gradlew from the top directory, but that isn't present either. I see the consumer and producer performance test scripts in the bin directory, but I want to run the end-to-end performance test and there doesn't seem to be a script for that.

@csaroff

csaroff commented Oct 6, 2017

After what seems like way too long, I figured it out. It looks like it's actually named EndToEndLatency. The above command should be

bin/kafka-run-class.sh kafka.tools.EndToEndLatency

in the root of the project. Strangely, neither command worked for me in kafka 0.8.1. I kept getting the "Could not find or load main class" error.

Just in case anyone else is having trouble running the latency tests, try version 0.11.0.1(kafka_2.11-0.11.0.1.tgz) and use class EndToEndLatency instead of TestEndToEndLatency.
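A minimal invocation sketch for those versions (the broker address and values are placeholders; as far as I can tell the argument order is broker list, topic, message count, producer acks, and message size in bytes):

bin/kafka-run-class.sh kafka.tools.EndToEndLatency localhost:9092 test 10000 1 100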

@Hallomann93

Where do I find the source of the consumer test? In kafka-consumer-perf-test.sh the class is kafka.tools.ConsumerPerformance, but I only find it as a compiled .class file, not in the sources. I would like to look at the internals, but can't locate the .java; it isn't in the same package as org.apache.kafka.tools.ProducerPerformance.

@Jane870103

how can we change the number of producers?

@zhijiay

zhijiay commented Jan 26, 2018

"how can we change the number of producers?"

Just run it on three different machines.

Details are in this blog post: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
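For example, a minimal sketch (the host names and Kafka install path are placeholders):

for host in producer1 producer2 producer3; do
  # Start one producer per machine, all in parallel.
  ssh "$host" "cd /opt/kafka && bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196" &
done
wait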

@nahdukesaba

@jkreps could you please update the commands for the latest Kafka release?

@kirthirk

kirthirk commented Mar 14, 2019

I am not able to connect to esv4-hcl197.grid.linkedin.com:2181, can someone please help me with that?
FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkException: Unable to connect to esv4-hcl197.grid.linkedin.com:2181

@avilashs

What is the advantage of running ZooKeeper on a different machine?

@hanrell

hanrell commented Sep 11, 2019

What is the advantage of running ZooKeeper on a different machine?

@arlixu

arlixu commented Oct 22, 2019

What is the advantage of running ZooKeeper on a different machine?

I run my ZK on the same machines as Kafka, although I'm afraid it's a risky decision. I did this because it saves three machines.
Some key points:
1. Kafka's data transfer can saturate the NIC, which makes the ZK connections time out.
2. If ZK uses the same data disk as Kafka, ZK will block on I/O while Kafka is busy reading and writing.
So when I run ZK on the same three machines as Kafka, I put the ZK data dir on an independent disk, such as the OS disk, which is usually an SSD.
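For example, in zoo.cfg (the path is a placeholder):

# keep the ZooKeeper transaction log and snapshots off the Kafka data disks
dataDir=/ssd/zookeeper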

@daixiang0

daixiang0 commented Jan 8, 2020

For a recent version (tested with 2.3.0):

  1. Clone the Kafka source code, then run
./gradlew jarAll -x signArchives -x test -x javadoc -x scaladoc
  2. Run the test
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test-rep-one --num-records 50000000 --record-size 100 --throughput=-1 --producer.config ./test.conf

test.conf:

[root@dx-app2 kafka]# cat test.conf 

# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092

# the default batch size in bytes when batching multiple records sent to a partition
batch.size=8196

# the total bytes of memory the producer can use to buffer records waiting to be sent to the server
buffer.memory=67108864

@Vera-hy

Vera-hy commented Mar 5, 2020

@daixiang0
Hi, I can't understand the first step. What do you mean by "clone kafka source code"?

@Java4all

Java4all commented Mar 6, 2020

I think the answer is simple: for example, run a 'git clone ...location' command, where ...location is the Kafka source on GitHub.

@jsrz

jsrz commented Aug 6, 2020

Is this test setup running zookeeper on 3 nodes or just 1? It's not clear from this file.

@Avi-bot-art

I believe the zookeeper switch on 'kafka-consumer-perf-test' has been deprecated for 'kafka-consumer-perf-test.bat'. (I am running remotely from the Kafka path CLI.)
Running the command below from the CLI, I am getting this error:

kafka-consumer-perf-test --topic test --bootstrap-server test:XXXX --messages 10 --threads 1 --consumer.config C:*****\consumer.properties --group test --timeout 100000 --print-metrics

Exception in thread "main" java.util.IllegalFormatConversionException: f != java.lang.Integer
at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2806)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2753)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2940)
at scala.collection.immutable.StringLike.format(StringLike.scala:354)
at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
at scala.collection.immutable.StringOps.format(StringOps.scala:33)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3(ToolsUtils.scala:60)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3$adapted(ToolsUtils.scala:58)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.utils.ToolsUtils$.printMetrics(ToolsUtils.scala:58)
at kafka.tools.ConsumerPerformance$.main(ConsumerPerformance.scala:82)
at kafka.tools.ConsumerPerformance.main(ConsumerPerformance.scala)

@happyfeetw

happyfeetw commented Oct 14, 2020

@daixiang0
Hi, I can't understand the first step. What do you mean by saying "clone kafka source code"?

I think cloning the source code means building Kafka from source. If you've already installed it, just go and run the shell script. :)

@le-cardinal

le-cardinal commented Mar 6, 2023

Hello

Single-thread, sync 3x replication

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000

Three Producers, 3x async replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

Sorry, but what is the difference between sync and async replication? I was thinking sync mode corresponds to acks=1 and async corresponds to acks=all, but the command lines make me think something different?

Thanks
