In some "peekLock" receive scenarios, once rhea's incoming delivery buffer is full (capacity 2048, i.e. 2048 outstanding unsettled deliveries), no new messages are received.
The drain request triggered by the timeout then hangs forever, and users have to force-exit their application in this scenario.
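For context, something like the sketch below reproduces the symptom with @azure/service-bus v7 (the connection string and queue name are placeholders, and it assumes an unpartitioned queue that already holds well over 2048 messages):

```ts
import { ServiceBusClient } from "@azure/service-bus";

// Placeholders - not taken from the original report.
const connectionString = "<service-bus-connection-string>";
const queueName = "<unpartitioned-queue-with-3000+-messages>";

async function main(): Promise<void> {
  const client = new ServiceBusClient(connectionString);
  // peekLock keeps every message unsettled until we explicitly
  // complete/abandon/defer/dead-letter it.
  const receiver = client.createReceiver(queueName, { receiveMode: "peekLock" });

  // Ask for more messages than rhea's incoming delivery buffer can hold (2048).
  // Nothing is settled here, so the buffer fills up, at most 2048 messages are
  // buffered, and on an unpartitioned queue the drain issued at the timeout
  // never completes - this call hangs instead of returning.
  const messages = await receiver.receiveMessages(3000, { maxWaitTimeInMs: 60_000 });
  console.log(`received ${messages.length} messages`); // never reached in the bad case

  await receiver.close();
  await client.close();
}

main().catch(console.error);
```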
Interestingly, the above behaviour is only seen when receiving from unpartitioned queues.
Drain requests work as expected when receiving from partitioned queues and the call returns with zero messages.
Note: this isn't a problem as long as users settle the messages they receive.
We have given the service team a repro, but they couldn't find anything in the partitioned vs. unpartitioned difference that would explain it.
Regardless of what the service does, it would be better to solve this problem pre-emptively on the client side: never let the circular buffer fill up entirely.
If the buffer is about to fill up, the batching receiver should return the messages it has already collected, and the streaming receiver should notify the user through processError.
RHEA says - "If autoaccept is disabled on a receiver, app should ensure that it accepts/releases/rejects the messages received."
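For reference, this is roughly how a receiver ends up in that position with rhea (a sketch only; host/port/queue are placeholders, and the real SDK goes through rhea-promise and core-amqp rather than calling rhea directly like this):

```ts
import * as rhea from "rhea";

const container = rhea.create_container();

container.on("connection_open", (context: rhea.EventContext) => {
  // autoaccept: false puts settlement entirely in the application's hands
  // (peekLock); credit_window: 0 disables rhea's automatic flow so the SDK
  // controls credit itself.
  context.connection.open_receiver({
    source: "my-queue",
    autoaccept: false,
    credit_window: 0,
  });
});

container.on("receiver_open", (context: rhea.EventContext) => {
  context.receiver!.add_credit(10);
});

container.on("message", (context: rhea.EventContext) => {
  // Until the delivery is settled, it occupies a slot in rhea's incoming
  // circular buffer; 2048 unsettled deliveries and the buffer is full.
  context.delivery!.accept();
});

container.connect({ host: "localhost", port: 5672 });
```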
Following are the options we could go with to address the problem pre-emptively.

When autoaccept = false:
- Keep a count of all the received messages; whenever a message is settled, decrease the count.
- If the count reaches 2048:
  - Batching -> return with the already collected messages
  - New batching call -> return 0 messages
  - Streaming -> raise an error on processError
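A rough sketch of what this count-based tracking could look like inside the SDK (the class name, hooks, and threshold constant below are made up for illustration, not existing SDK code):

```ts
// Hypothetical helper - not an existing class in @azure/service-bus or rhea.
const INCOMING_DELIVERY_BUFFER_SIZE = 2048; // rhea's circular buffer capacity

class OutstandingDeliveryCounter {
  private count = 0;

  /** Called for every message handed to us while autoaccept is false. */
  onMessageReceived(): void {
    this.count++;
  }

  /** Called from the settlement path (complete/abandon/defer/dead-letter). */
  onMessageSettled(): void {
    this.count = Math.max(0, this.count - 1);
  }

  /** True once the buffer would be full and receiving more is unsafe. */
  isAtCapacity(): boolean {
    return this.count >= INCOMING_DELIVERY_BUFFER_SIZE;
  }
}

// How the receivers might react (pseudocode-level):
// - batching receiver: if isAtCapacity(), resolve with the messages collected
//   so far (possibly zero for a fresh receiveMessages call);
// - streaming receiver: if isAtCapacity(), surface an error through the user's
//   processError callback instead of silently stalling.
```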
Enhanced Option 1:
- Allow configuring the circular buffer size at the rhea level.
Rhea should provide us with a new buffer_overflow event.
- Upon buffer_overflow:
  - Batching -> return with the already collected messages
  - New batching call -> return 0 messages
  - Streaming -> raise an error on processError
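If rhea did expose such an event, consuming it might look roughly like this (note that "buffer_overflow" does not exist in rhea today; this is purely the shape being proposed):

```ts
import * as rhea from "rhea";

// Purely illustrative: "buffer_overflow" is the event this option proposes,
// not something rhea currently emits, hence the cast.
function watchForOverflow(receiver: rhea.Receiver, onOverflow: () => void): void {
  receiver.on("buffer_overflow" as any, () => {
    // Batching receiver: resolve immediately with whatever was collected.
    // Streaming receiver: report through processError so the user knows
    // message flow has stopped until they settle some deliveries.
    onOverflow();
  });
}
```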
Tweaking Option 1:
- Rhea should allow the SB SDK to read the current buffer size.
- If the size reaches capacity:
  - Batching -> return with the already collected messages
  - New batching call -> return 0 messages
  - Streaming -> raise an error on processError
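If rhea instead exposed the buffer's current occupancy (again hypothetical; the property name below is made up), the SDK could gate each receive on it:

```ts
import * as rhea from "rhea";

const INCOMING_DELIVERY_BUFFER_SIZE = 2048;

// Hypothetical accessor: rhea does not expose this today; this option asks the
// rhea maintainers to surface the circular buffer's current size.
function getOutstandingDeliveries(receiver: rhea.Receiver): number {
  return (receiver as any).incomingDeliveryCount ?? 0;
}

function hasRoomForMore(receiver: rhea.Receiver): boolean {
  return getOutstandingDeliveries(receiver) < INCOMING_DELIVERY_BUFFER_SIZE;
}

// Checked before adding credit / before dispatching each message:
// - batching receiver: no room -> return the messages collected so far;
// - streaming receiver: no room -> raise an error via processError.
```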
Delivery Manager Map keyed by delivery.id:
- Gets populated only when autoaccept is false.
- The onMessageSettled trigger removes the delivery.id from the Map.
- If the Map size reaches 2048:
  - Batching -> return with the already collected messages
  - New batching call -> return 0 messages
  - Streaming -> raise an error on processError
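A sketch of this map-based variant (DeliveryManager is a made-up name; the real settlement hook would live wherever the SDK already processes dispositions):

```ts
import type { Delivery } from "rhea";

// Hypothetical tracker, only active when autoaccept is false.
class DeliveryManager {
  private readonly outstanding = new Map<number, Delivery>();

  add(delivery: Delivery): void {
    this.outstanding.set(delivery.id, delivery);
  }

  // Called from the settlement path (the onMessageSettled trigger above).
  remove(delivery: Delivery): void {
    this.outstanding.delete(delivery.id);
  }

  get size(): number {
    return this.outstanding.size;
  }

  isAtCapacity(): boolean {
    return this.outstanding.size >= 2048;
  }
}
```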
Enhanced Option 5:
- Allow configuring the circular buffer size at the rhea level.
Options 2 and 6 are both about making the circular buffer size configurable; whichever approach we pick, that change should happen independently. Options 1 and 5 are equivalent (maintaining all the delivery ids vs. just a count).
So we really only have to pick between Options 3, 4 and 5:
- Option 3 - Rhea provides us with a new buffer_overflow event
- Option 4 - Rhea allows accessing the buffer size at the SB SDK level
- Option 5 - Tracking the count at SB SDK
I personally like Option 4: unlike Option 5, it doesn't require us to maintain a copy of the buffer state, and it should need less convincing at rhea than Option 3.
Another idea: in peekLock mode, only ever issue a maximum of 2047 credits on the link instead of the full maxMessageCount or maxConcurrentCalls;
that way the buffer can never be completely full.
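A sketch of that credit cap, assuming we can see how many deliveries are still unsettled (the names and the 2047 constant are illustrative, not a final design; the unsettled count would come from whichever tracking option above we pick):

```ts
// Hypothetical cap: one slot below the 2048-entry buffer so a drain can
// always complete.
const MAX_OUTSTANDING_DELIVERIES = 2047;

function creditToAdd(requested: number, unsettledCount: number): number {
  // Never let (outstanding deliveries + issued credit) reach the buffer size,
  // regardless of the maxMessageCount / maxConcurrentCalls the user asked for.
  return Math.max(0, Math.min(requested, MAX_OUTSTANDING_DELIVERIES - unsettledCount));
}

// e.g. receiver.addCredit(creditToAdd(maxMessageCount, tracker.size));
```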
With the approach of checking the buffer size for every message (from my investigation while testing Option 4), I do stop receiving after 2047 messages,
but since there were still credits on the link, more messages were already in flight even though we had stopped receiving.
I could see this by receiving the messages again: the delivery count was 1 for 2050 messages, as opposed to the 2047 I actually had in hand.
If the credit initialization itself enforces the cap, the problem is avoided entirely (as expected).
I can't tell if you're saying that the rhea team (well, one person) has told us that option #6 is not an option, but it's clearly the "best" one (i.e., it should be fine to let the customer determine what their appropriate limit is when it's purely a client-side limitation).