I also built a tuner program and collected these numbers. I seeded the stream with 100,000 messages of either 50 bytes or 8000 bytes. I ran 10 rounds of every test type, randomizing the order of the types within each round, so each type ended up reading 1,000,000 messages. This was run against a cluster, and the stream is R3.
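For anyone who wants to reproduce the method, here is a rough sketch of the round structure and of how the msg/ms and ms/msg columns are derived. It is not the actual tuner: `run_rounds`, `summarize`, and the `run_test` callback are hypothetical names, and `run_test` is assumed to return the elapsed milliseconds for reading one full seed of messages.

```python
import random

ROUNDS = 10  # from the description above: 10 rounds per test type


def run_rounds(test_types, run_test, rounds=ROUNDS, rng=None):
    """Run every test type once per round, in a fresh random order each round.

    run_test(name) is a hypothetical callback that performs one read of the
    whole seeded stream and returns the elapsed time in milliseconds.
    Returns the total elapsed ms per test type.
    """
    rng = rng or random.Random()
    totals = {name: 0.0 for name in test_types}
    for _ in range(rounds):
        order = list(test_types)
        rng.shuffle(order)  # randomize order so no type always runs first
        for name in order:
            totals[name] += run_test(name)
    return totals


def summarize(elapsed_ms, messages):
    """Derive the two throughput columns from total elapsed time."""
    return {
        "msg_per_ms": round(messages / elapsed_ms, 2),
        "ms_per_msg": round(elapsed_ms / messages, 4),
    }
```

For example, `summarize(9599, 1_000_000)` gives 104.18 msg/ms and 0.0096 ms/msg, which matches the Fetch / batch 50 / 50 byte row.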
My takeaway is that bigger batches do better on these long continuous pulls. I didn't include batches of 1 or 10 because they were so much slower. For the reader, I realized that with smaller batches I need to repull sooner, but with bigger batches a repull threshold of 80% seems good enough.
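To make the repull idea concrete, here is a minimal sketch of picking a repull point from the batch size. Only the 80% figure for bigger batches comes from the numbers above. The cutoff between "small" and "big" (`small_batch_limit`) and the earlier threshold for small batches (`small_percent`) are placeholder assumptions, and `repull_point` is a hypothetical helper, not part of any client API.

```python
def repull_point(batch_size, small_batch_limit=100, small_percent=50, large_percent=80):
    """Return how many messages of the current batch should have arrived
    before the next pull is issued.

    large_percent=80 reflects the "80% seems good enough" observation.
    small_batch_limit and small_percent are assumptions for illustration only.
    """
    percent = large_percent if batch_size > small_batch_limit else small_percent
    # Integer arithmetic avoids float rounding; always wait for at least 1 message.
    return max(1, batch_size * percent // 100)
```

With these placeholder values, a batch of 250 repulls after 200 messages, while a batch of 50 repulls after 25.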
Results for 50 byte and 8000 byte message sizes (1,000,000 messages read per row):
Type | Batch | 50 bytes: elapsed | 50 bytes: msg/ms | 50 bytes: ms/msg | 8000 bytes: elapsed | 8000 bytes: msg/ms | 8000 bytes: ms/msg |
---|---|---|---|---|---|---|---|
Fetch | 50 | 9599 ms | 104.18 | 0.0096 | 16417 ms | 60.91 | 0.0164 |
Fetch | 100 | 8322 ms | 120.16 | 0.0083 | 15656 ms | 63.87 | 0.0157 |
Fetch | 250 |