Message ordering on Windows Azure Service Bus Queues
One of the services offered by Windows Azure Service Bus is queues, which enable persistent messaging between applications or services. Service Bus queues operate on a brokered messaging scheme in which the service acts as an intermediary between producers and consumers. Messages are stored by the service before being passed on to consumers, which decouples the communicating parties in time and, where such characteristics are desired, in space.
On how to use Service Bus Queues the website states the following:
Queues offer First In, First Out (FIFO) message delivery to one or more competing consumers. That is, messages are typically received and processed by the receivers in the order in which they were added to the queue, and each message is received and processed by only one message consumer.
Another article adds the following under the "Additional Information" in the "Foundational Capabilities" section:
The guaranteed FIFO pattern in Service Bus Queues requires the use of messaging sessions. In the event that the application crashes while processing a message received in the Peek & Lock mode, the next time a queue receiver accepts a messaging session, it will start with the failed message after its time-to-live (TTL) period expires.
In this blog post we will explore a few scenarios where queued messages may be consumed out of order.
Failures and locks
By default, messages are consumed in PeekLock mode, i.e. the message is locked and made unavailable to other consumers until it is marked as completed or its lock expires, whichever occurs first. Until a message is marked as complete, it remains in storage. Should the lock expire before the message is marked as complete, the message is again made available for consumption.
In addition to the crashing consumer scenario described in the previous section, the message lock may also expire when message processing takes too long, e.g. due to computational complexity or network partitioning. The resulting behaviour is that while any one message that passes through the queue is guaranteed to be processed at least once, it may also, perhaps inadvertently, be processed more than once.
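The lock-expiry behaviour described above can be illustrated with a small in-memory model. This is only a sketch on a logical clock: ToyPeekLockQueue, Delivery and the lock tokens are hypothetical names invented for the illustration, not the Service Bus client API, and the 30-unit lock duration is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delivery:
    """A message together with the lock under which it was handed out."""
    sequence_number: int
    body: str
    lock_token: int
    locked_until: float


class ToyPeekLockQueue:
    """Hypothetical in-memory stand-in for a non-partitioned queue."""

    def __init__(self, lock_duration: float) -> None:
        self.lock_duration = lock_duration
        self._stored: List[Delivery] = []
        self._next_sequence_number = 1
        self._next_lock_token = 1

    def send(self, body: str) -> None:
        # A stored message starts unlocked (token 0, lock already expired).
        self._stored.append(Delivery(self._next_sequence_number, body, 0, 0.0))
        self._next_sequence_number += 1

    def receive(self, now: float) -> Optional[Delivery]:
        # Hand out the oldest message whose lock is absent or has expired.
        for stored in self._stored:
            if stored.locked_until <= now:
                stored.lock_token = self._next_lock_token
                stored.locked_until = now + self.lock_duration
                self._next_lock_token += 1
                # Return a copy so the consumer keeps the token it was given.
                return Delivery(stored.sequence_number, stored.body,
                                stored.lock_token, stored.locked_until)
        return None

    def complete(self, delivery: Delivery, now: float) -> bool:
        # Completion succeeds only for the current, unexpired lock holder.
        for stored in self._stored:
            if stored.sequence_number == delivery.sequence_number:
                if (stored.lock_token == delivery.lock_token
                        and now < stored.locked_until):
                    self._stored.remove(stored)
                    return True
                return False
        return False  # the message was already completed by someone else


processed = []
queue = ToyPeekLockQueue(lock_duration=30)
queue.send("A")
queue.send("B")

slow = queue.receive(now=0)     # consumer 1 locks A, then stalls
fast = queue.receive(now=1)     # consumer 2 gets B because A is locked
processed.append(fast.body)
queue.complete(fast, now=2)

retry = queue.receive(now=31)   # A's lock has expired, so A is handed out again
processed.append(retry.body)
queue.complete(retry, now=32)

processed.append(slow.body)     # consumer 1 finally finishes its work on A
late_complete = queue.complete(slow, now=35)
```

In this run B is processed before A, and A is processed twice; the stalled consumer only finds out when its late attempt to complete the message is rejected.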
It is possible to mitigate the issue, though never fully prevent it from occurring, by extending the message lock through the RenewLock method. Upon receiving a message, it is possible to determine when the message lock expires by examining the LockedUntilUtc property. Should the typical message processing operation require more time to complete than the duration of the message lock, it would be preferable to divide the operation into smaller pieces and extend the message lock where necessary.
If at-most-once messaging semantics are desired instead, the ReceiveAndDelete receive mode may be used. While this does guarantee FIFO message order for non-partitioned queues, there is a risk of losing messages, as they are removed from storage immediately after being received by a consumer.
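A rough sketch of the "smaller pieces plus renewal" idea follows. LockedMessage, renew_lock and process_in_chunks are hypothetical stand-ins that only mirror the roles of RenewLock and LockedUntilUtc; time is a logical clock, and the safety margin is an assumed tuning knob rather than anything Service Bus prescribes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LockedMessage:
    """Hypothetical stand-in for a message received in PeekLock mode."""
    lock_duration: float
    locked_until: float
    renewals: int = 0

    def renew_lock(self, now: float) -> None:
        # Renewal is only possible while the lock is still held.
        if now >= self.locked_until:
            raise RuntimeError("lock already lost")
        self.locked_until = now + self.lock_duration
        self.renewals += 1


def process_in_chunks(message: LockedMessage, chunk_costs: Iterable[float],
                      start: float, safety_margin: float) -> float:
    """Run the chunks in order, renewing the lock whenever the remaining
    lock time would not cover the next chunk plus the safety margin.
    Returns the logical time at which the last chunk finished."""
    now = start
    for cost in chunk_costs:
        if message.locked_until - now < cost + safety_margin:
            message.renew_lock(now)
        now += cost  # pretend the chunk took `cost` time units
    return now


message = LockedMessage(lock_duration=30, locked_until=30)
finished = process_in_chunks(message, chunk_costs=[20, 20, 20],
                             start=0, safety_margin=5)
```

Here 60 units of work fit under a 30-unit lock with two renewals. A single chunk that costs more than the lock duration would still outlive its lock, which is why renewal mitigates the problem rather than preventing it.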
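The trade-off can be shown with a toy model in which receiving a message removes it from the queue straight away. The function name and the crash_on parameter are made up for this sketch; a plain deque stands in for the queue, and a "crash" is simulated by skipping the work.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


def drain_receive_and_delete(queue: Deque[str],
                             crash_on: Set[str]) -> Tuple[List[str], List[str]]:
    """Consume every message with at-most-once semantics.

    Returns (processed, lost), both in the order the messages were received.
    """
    processed: List[str] = []
    lost: List[str] = []
    while queue:
        body = queue.popleft()  # the message is gone from storage from here on
        if body in crash_on:
            lost.append(body)   # consumer died before doing the work
            continue
        processed.append(body)
    return processed, lost


processed, lost = drain_receive_and_delete(deque(["A", "B", "C"]), crash_on={"B"})
```

The surviving messages keep their FIFO order, but B is never processed by anyone.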
Abandoning and deferring processing
Abandoning a message is only possible when it has been received in PeekLock mode. Upon abandonment the message is made available to any consumer of the queue. Conceptually, this can be considered equivalent to explicitly releasing the message lock on a PeekLock-ed message.
A deferred message, however, is not implicitly made available to consumers again and can only be retrieved by explicitly requesting it by its SequenceNumber. Consequently, a strategy for managing the sequence numbers of deferred messages needs to be in place before deferred processing is used.
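The difference between the two operations can be sketched with another toy model. ToyDeferringQueue and its methods are hypothetical names, not the Service Bus API; in particular, putting an abandoned message back at its original position is a modelling assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Message:
    sequence_number: int
    body: str


class ToyDeferringQueue:
    """Hypothetical in-memory queue supporting abandon and defer."""

    def __init__(self, bodies: Iterable[str]) -> None:
        self._active: List[Message] = [
            Message(i, body) for i, body in enumerate(bodies, start=1)
        ]
        self._locked: Dict[int, Message] = {}
        self._deferred: Dict[int, Message] = {}

    def receive(self) -> Optional[Message]:
        if not self._active:
            return None
        message = self._active.pop(0)
        self._locked[message.sequence_number] = message
        return message

    def abandon(self, message: Message) -> None:
        # Release the lock; the message becomes receivable again.
        self._locked.pop(message.sequence_number)
        self._active.append(message)
        self._active.sort(key=lambda m: m.sequence_number)

    def defer(self, message: Message) -> None:
        # Set the message aside; a plain receive() will never return it.
        self._locked.pop(message.sequence_number)
        self._deferred[message.sequence_number] = message

    def receive_deferred(self, sequence_number: int) -> Message:
        return self._deferred.pop(sequence_number)


queue = ToyDeferringQueue(["A", "B", "C"])

first = queue.receive()
queue.abandon(first)             # A is immediately receivable again
again = queue.receive()

queue.defer(again)               # A is set aside...
remembered = [again.sequence_number]  # ...so the consumer must keep its number

order = []
while (message := queue.receive()) is not None:
    order.append(message.body)
for sequence_number in remembered:
    order.append(queue.receive_deferred(sequence_number).body)
```

If the consumer lost the contents of remembered, A would stay parked in the queue with no way of receiving it through a normal receive call, which is why the bookkeeping strategy matters.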
Lastly, with the announcement of partitioned queues another source of (unintended) message re-ordering was introduced.
As stated in the linked blog post, a queue with partitioning enabled has its messages distributed across n unique message brokers, where n, from the developer's perspective, appears indeterminate. When a message is requested from a partitioned queue, Service Bus picks one of the non-empty partitions at random and retrieves the next message from it, which allows messages to be consumed out of order.
Additionally, there is a subtle difference in how message re-ordering manifests itself on the consumer side. For a non-partitioned queue, provided there is synchronization between consumers, it is theoretically possible to detect messages being processed out of order by examining the SequenceNumber property, as sequence numbers are guaranteed by the message broker to be in ascending order. However, as Service Bus distributes the messages of a partitioned queue across n unique message brokers, ascending SequenceNumber values are no longer guaranteed and messages may appear out of order irrespective of the behaviour of the consumers.
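A minimal simulation of the random partition pick is given below. ToyPartitionedQueue is a hypothetical model: the round-robin placement on send, the three partitions and the seeded random generator are all assumptions made for the sketch and say nothing about how Service Bus actually assigns messages to brokers.

```python
import random
from collections import deque
from typing import Deque, List, Optional


class ToyPartitionedQueue:
    """Hypothetical model: n FIFO partitions, random pick on receive."""

    def __init__(self, partition_count: int, rng: random.Random) -> None:
        self._partitions: List[Deque[int]] = [
            deque() for _ in range(partition_count)
        ]
        self._rng = rng
        self._sent = 0

    def send(self, body: int) -> None:
        # Round-robin placement is an assumption of this sketch.
        self._partitions[self._sent % len(self._partitions)].append(body)
        self._sent += 1

    def receive(self) -> Optional[int]:
        # Pick any non-empty partition at random and take its oldest message.
        non_empty = [p for p in self._partitions if p]
        if not non_empty:
            return None
        return self._rng.choice(non_empty).popleft()


queue = ToyPartitionedQueue(partition_count=3, rng=random.Random(7))
for body in range(9):
    queue.send(body)

received = []
while (body := queue.receive()) is not None:
    received.append(body)
```

Every message is delivered exactly once and each partition on its own stays FIFO, yet the overall order of received depends on the random picks and generally differs from the send order.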
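For the non-partitioned case, the detection idea amounts to tracking the highest sequence number seen so far. The helper below is a hypothetical sketch that assumes the consumers already share a single, synchronized log of the sequence numbers they processed.

```python
from typing import Iterable, List


def out_of_order(sequence_numbers: Iterable[int]) -> List[int]:
    """Return the sequence numbers that were processed after a higher
    sequence number had already been seen."""
    highest = None
    late: List[int] = []
    for number in sequence_numbers:
        if highest is not None and number < highest:
            late.append(number)
        else:
            highest = number
    return late


late = out_of_order([1, 2, 4, 3, 5])
in_order = out_of_order([1, 2, 3])
```

With a partitioned queue the same check would flag messages even when every consumer behaves correctly, since the ascending guarantee no longer holds.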
While Service Bus Queues may at a glance appear to offer FIFO characteristics, their default behaviour, for the reasons described in this blog post, does not guarantee FIFO message order. If message order is paramount, it is possible to retain FIFO order for non-partitioned queues through the use of the ReceiveAndDelete receive mode; however, its trade-offs should be carefully considered beforehand.
Some readers would also be correct to notice sessions have been left out of this post entirely -- sessions will instead be the topic of my next post.