
I ran 20+ distinct tests.

Each test involved me:

  1. setting up two EC2 instances
  2. setting up the networking between them (right AZ, right subnet, right VPC, peering the VPCs, right account)
  3. sending 25GB-100GB of data through them (`dd if=/dev/zero bs=1024 count=$[1024*1024 * 25] | nc 18.199.177.171 7000`) - see the sketch below
  4. waiting 1-2 days for the traffic to show up on the bill for that region

Rinse and repeat for each test. Some tests were repeated a few times to confirm the results. There were cases where, e.g., I'd send 25GB and AWS would bill me for just 15GB.
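
For reference, here is roughly what the transfer step looked like end to end. The receiving side is a sketch (a plain netcat listener discarding to /dev/null) - exact nc flags vary by netcat flavor:

```bash
# On the receiving instance: listen on port 7000 and discard whatever arrives.
# (Some netcat flavors want `nc -l -p 7000` instead.)
nc -l 7000 > /dev/null

# On the sending instance: stream ~25GB of zeroes at the receiver.
# bs=1024 bytes * (1024 * 1024 * 25) blocks ≈ 25GB of transfer.
dd if=/dev/zero bs=1024 count=$((1024 * 1024 * 25)) | nc 18.199.177.171 7000
```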

stanislavkozlovski / rc4_test_failures_system_test_run_1.md
Created February 14, 2024 19:49
{'kafkatest.benchmarks.core.benchmark_test.Benchmark#test_consumer_throughputArguments:{  "compression_type": "none",  "interbroker_security_protocol": "PLAINTEXT",  "security_protocol": "SSL",  "tls_version": "TLSv1.2"}': 1,
 'kafkatest.benchmarks.core.benchmark_test.Benchmark#test_consumer_throughputArguments:{  "compression_type": "none",  "interbroker_security_protocol": "PLAINTEXT",  "security_protocol": "SSL",  "tls_version": "TLSv1.3"}': 1,
 'kafkatest.benchmarks.core.benchmark_test.Benchmark#test_consumer_throughputArguments:{  "compression_type": "snappy",  "interbroker_security_protocol": "PLAINTEXT",  "security_protocol": "SSL",  "tls_version": "TLSv1.2"}': 1,
 'kafkatest.benchmarks.core.benchmark_test.Benchmark#test_consumer_throughputArguments:{  "compression_type": "snappy",  "interbroker_security_protocol": "PLAINTEXT",  "security_protocol": "SSL",  "tls_version": "TLSv1.3"}': 1,
 'kafkatest.benchmarks.core.benchmark_test.Benchmark#test_end_to_end_latencyArguments:{  "compression_type": "no

Hey all,

I've heard various complaints that builds in trunk are taking too long, some as much as 8 hours (the timeout) - and this is making it harder for us to meet the code freeze deadline.

I took it upon myself to gather some data in Gradle Enterprise to see if there are any outlier tests causing this slowness. Turns out there are. This particular build - https://ge.apache.org/s/un2hv7n6j374k/ - took 10 hours and 29 minutes in total.

Here are the top offending tests:

  • ✅ - PASS
  • 🟧 - FLAKY
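
As a side note, a suspect like the TimeWindowedKStreamIntegrationTest run below can be re-run locally a few times to tell "flaky" apart from "consistently slow". A rough sketch (module and test-filter syntax assumed from the standard Kafka Gradle setup):

```bash
# Re-run a single suspect test class in the streams module.
# cleanTest forces the test task to actually execute again on repeat runs.
./gradlew streams:cleanTest streams:test \
  --tests "org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest"
```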
un2hv7n6j374k-org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest-shouldAggregateWindowedWithNoGrace[ON_WINDOW_UPDATE_false]-1-output.txt
metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/tmp/kafka-7862265964045152878: EMPTY})
[2023-12-06 05:14:14,945] INFO [LocalTieredStorage Id=0] Creating directory: [/tmp/kafka-remote-tier-deletesegmentsbyretentionsizetest18303542483502926029/kafka-tiered-storage] (org.apache.kafka.server.log.remote.storage.LocalTieredStorage:289)
[2023-12-06 05:14:14,946] INFO [LocalTieredStorage Id=0] Created local tiered storage manager [0]:[kafka-tiered-storage] (org.apache.kafka.server.log.remote.storage.LocalTieredStorage:301)
[2023-12-06 05:14:14,946] INFO Started configuring topic-based RLMM with configs: {remote.log.metadata.topic.replication.factor=1, remote.log.metadata.topic.num.partitions=5, remote.log.metadata.common.client.bootstrap.servers=localhost:36617, broker.id=0, remote.log.metadata.initialization.retry.interval.ms=300, remote.log.metadata.common.client.security.protocol=PLAINTEXT, cluster.id=Si-GEbyrSJeNRePZuz8ysQ, log.dir=/tmp/kafka-9703822247831753795} (org.apache.ka
[2023-12-06 05:14:34,969] INFO [LocalTieredStorage Id=0] Creating directory: [/tmp/kafka-remote-tier-transactionswithtieredstoretest7406968791210700929/kafka-tiered-storage] (org.apache.kafka.server.log.remote.storage.LocalTieredStorage:289)
[2023-12-06 05:14:34,969] INFO [LocalTieredStorage Id=0] Created local tiered storage manager [0]:[kafka-tiered-storage] (org.apache.kafka.server.log.remote.storage.LocalTieredStorage:301)
[2023-12-06 05:14:34,969] INFO Started configuring topic-based RLMM with configs: {remote.log.metadata.topic.replication.factor=3, remote.log.metadata.topic.num.partitions=3, remote.log.metadata.common.client.bootstrap.servers=localhost:36181, broker.id=0, remote.log.metadata.initialization.retry.interval.ms=300, remote.log.metadata.common.client.security.protocol=PLAINTEXT, cluster.id=9Ott-KxpTMqMRChhvk2mQg, log.dir=/tmp/kafka-3584143055344816110} (org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager:358)
[2023-12-06 05:14:34,970] INFO Successfully con
metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/tmp/kafka-10894890209183283184: EMPTY})
[2023-12-06 03:13:26,603] WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory:309)
[2023-12-06 04:03:28,062] WARN Client session timed out, have not heard from server in 39992ms for session id 0x0 (org.apache.zookeeper.ClientCnxn:1256)
metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/tmp/kafka-11154148799385091495: EMPTY})
[2023-12-06 05:14:49,465] WARN [QuorumController id=1000] Performing controller activation. The metadata log appears to be empty. Appending 1 bootstrap record(s) in metadata transaction at metadata.version 3.7-IV3 from bootstrap source 'test harness'. Setting the ZK migration state to NONE since this is a de-novo KRaft cluster. (org.apache.kafka.controller.QuorumController:108)
[2023-12-06 05:14:53,973] WARN [QuorumController id=1000] Performing controller activation. Loaded ZK migration state of NONE. (org.apache.kafka.controller.QuorumController:204)
[2023-12-06 05:14:59,814] ERROR Unexpected error handling org.apache.kafka.server.AssignmentsManager$DispatchEvent@17fa1fff (org.apache.kafka.server.AssignmentsManager:117)
java.lang.IllegalStateException: Cannot enqueue a request if the request thread is not running
at kafka.server.NodeToControllerRequestThread.enqueue(NodeToControllerCh