The explanation below is taken from an internal email exchange, and while we already have product documentation about this topic, it also seemed worth sharing. The email context stated that the customer asked for strict First-In-First-Out (FIFO) assurances, but without any specifics about their scenario. Scenarios for hard FIFO requirements are not very common, I might add:
"It’s usually helpful to ask the customer exactly why they want true in-order delivery, i.e. FIFO, and then look at the use-case.
Service Bus has a specific feature for helping with order assurances, namely Sessions.
Order preservation requires a grouping criterion for the sequence you need ordered, and it needs a mechanism that ensures that messages are delivered to a receiver in that order. That includes not only that the sequence sorting is observed, but also that no messages go missing because they’ve been “stolen” by a competing consumer.
For messages that originate from devices, useful grouping criteria are the device-id, or the identifier of a component, or the identifier of an upstream system or other context behind that device. The exact relative arrival order of messages that come from separate devices is hardly meaningful. If you truly care about time-correlated events across contexts (sensor fusion etc.), you’ll do that in a database or analytics context somewhere downstream. The chosen grouping criterion becomes the SessionId in Service Bus. It becomes the PartitionKey in Event Hubs. In IoT Hub, the choice is made for you.
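To make the grouping idea concrete, here is a minimal sketch in plain Python. It does not use any Service Bus or Event Hubs API; the `device_id` and `seq` fields, the device names, and the `group_by_context` helper are all assumptions made for illustration. It only shows that order matters within a context, not across contexts.

```python
from collections import defaultdict


def group_by_context(messages, key="device_id"):
    """Split an interleaved stream into per-context sequences.

    Arrival order is preserved inside each group; nothing is said about
    the relative order of messages that belong to different groups.
    """
    groups = defaultdict(list)
    for msg in messages:
        groups[msg[key]].append(msg)
    return dict(groups)


# Hypothetical interleaved commands from two devices.
stream = [
    {"device_id": "pump-1", "seq": 1, "body": "ON"},
    {"device_id": "fan-7", "seq": 1, "body": "ON"},
    {"device_id": "pump-1", "seq": 2, "body": "OFF"},
    {"device_id": "fan-7", "seq": 2, "body": "OFF"},
]

# The grouping key plays the role the SessionId or PartitionKey would play.
sessions = group_by_context(stream)
pump_sequence = [m["seq"] for m in sessions["pump-1"]]
```

Each resulting list is one ordered sequence; the grouping key is what you would hand to the broker as the session or partition identifier.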
Once a client has sent messages related to a context in a particular order, handling those messages in that exact order requires exclusive control over that sequence by the processing receiver. In Event Hubs, that’s trivial because the client controls the cursor (reading offset). Messages sit in an Event Hub partition in the exact order in which they were written into the log and can be picked up in that order by any client at their own pace. Customers’ FIFO requirements, however, often come coupled with the desire to process messages just once if possible, meaning you then need a real queue to coordinate that.
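The client-owned cursor can be pictured with a toy append-only log. This is a sketch only, not the Event Hubs API: the `PartitionLog` class and its `append` and `read_from` methods are hypothetical names. The point is that the log keeps no per-reader state, so every reader sees the same order and nothing coordinates "process once" between them.

```python
class PartitionLog:
    """Append-only log standing in for a single partition (illustrative only)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset, max_count=10):
        """Return events starting at a caller-supplied offset.

        The log does not remember who has read what; the cursor lives
        entirely with the reader.
        """
        return self._events[offset:offset + max_count]


log = PartitionLog()
for body in ["a", "b", "c", "d"]:
    log.append(body)

# Two independent readers, each advancing its own cursor at its own pace.
fast_cursor, slow_cursor = 0, 0

fast_batch = log.read_from(fast_cursor, max_count=4)
fast_cursor += len(fast_batch)

slow_batch = log.read_from(slow_cursor, max_count=2)
slow_cursor += len(slow_batch)
```

Both readers observe the same sequence, and both will see every event, which is why this model alone does not give you competing-consumer semantics.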
Sessions solve that problem in Service Bus queues and topic subscriptions by giving a receiver exclusive control over all present and future messages with the same SessionId. The messages get demultiplexed by SessionId, and even if there are 50 concurrent receivers, and the receiver having a lock on session “1234” is busy, if the next message at the top of the queue is for session “1234”, no one else gets it. That means it’s a deliberate bottleneck to make sure that the overall queue sequence and the session contexts in that queue are processed exactly in order.
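The session lock can be sketched with a small in-memory model. This is not the Service Bus client API: `SessionQueue`, `accept_session`, `receive`, and `release` are hypothetical names chosen for illustration, and lock expiry and renewal are left out entirely. The sketch shows only the exclusivity rule, where messages for a locked session go to the lock holder and to no one else.

```python
from collections import OrderedDict, deque


class SessionQueue:
    """Toy model of a session-aware queue (illustrative only)."""

    def __init__(self):
        self._sessions = OrderedDict()  # session_id -> deque of bodies, in arrival order
        self._locks = {}                # session_id -> name of the receiver holding the lock

    def send(self, session_id, body):
        self._sessions.setdefault(session_id, deque()).append(body)

    def accept_session(self, receiver):
        """Lock the first unlocked session that has messages; return its id or None."""
        for session_id, messages in self._sessions.items():
            if messages and session_id not in self._locks:
                self._locks[session_id] = receiver
                return session_id
        return None

    def receive(self, receiver, session_id):
        """Hand out the next message of a session, but only to the lock holder."""
        if self._locks.get(session_id) != receiver:
            raise PermissionError(f"{receiver} does not hold session {session_id}")
        messages = self._sessions[session_id]
        return messages.popleft() if messages else None

    def release(self, receiver, session_id):
        """Give up the lock so another receiver can accept the session."""
        if self._locks.get(session_id) == receiver:
            del self._locks[session_id]


q = SessionQueue()
q.send("1234", "step-1")
q.send("5678", "hello")
q.send("1234", "step-2")

a_session = q.accept_session("receiver-A")  # locks the first session, "1234"
b_session = q.accept_session("receiver-B")  # "1234" is taken, so B gets "5678"
first = q.receive("receiver-A", a_session)
second = q.receive("receiver-A", a_session)
```

Even though "step-2" sits behind another session's message in the queue, it is handed only to the receiver holding session "1234", and in arrival order.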
If the message handler just stashes the input message into a database, it’s usually better to let the database index do the ordering.
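As a small illustration of letting the database do the ordering, the following sketch uses Python's built-in `sqlite3` module. The `readings` table, its columns, and the sender-assigned `seq` value are assumptions for the example; any store with an ordered index on a sequence field would work along the same lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The composite primary key doubles as the index that defines the order.
conn.execute(
    "CREATE TABLE readings ("
    "device_id TEXT, seq INTEGER, body TEXT, "
    "PRIMARY KEY (device_id, seq))"
)

# Messages arrive in arbitrary order; the handler just stores them.
arrivals = [("pump-1", 3, "ON"), ("pump-1", 1, "ON"), ("pump-1", 2, "OFF")]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", arrivals)

# Readers get the sequence back in order, regardless of arrival order.
ordered = conn.execute(
    "SELECT seq, body FROM readings WHERE device_id = ? ORDER BY seq",
    ("pump-1",),
).fetchall()
```

This relies on the sender stamping each message with a sequence value, which is the same assumption the deferral approach makes.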
If the sequence contains state machine commands where order matters, there’s a further feature, namely Deferral, that can help with explicit management of the processing order.
Let’s say the origin context issues four commands in sequence: 1:ON, 2:OFF, 3:ON, 4:OFF. You would surely expect the end state to be “OFF”. Instead of only relying on the messaging infrastructure for passing the messages in that order, the origin context also explicitly says “this is step 2”. With Deferral, the messages can show up in any order, but if step 2 shows up before step 1, you can explicitly park step 2 in the queue as deferred until you’re done with step 1 and then resume it. The list of deferred messages (their sequence ids) is part of your state machine state, meaning that if the state machine waits for the next operation to run, it can just look up locally whether there’s already a message waiting for the next expected or one of the next expected steps.
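The ON/OFF example can be walked through with a toy model. Again, this is a sketch under assumptions and not the Service Bus API: `DeferringQueue`, its methods, the `step` and `sequence_number` fields, and the specific sequence numbers are all hypothetical. The messages are delivered out of order (2, 1, 4, 3), and the state machine keeps only a small map from step to the sequence number of the parked message.

```python
class DeferringQueue:
    """Toy queue with deferral (illustrative only): a deferred message stays
    in the queue and can be fetched again only by its sequence number."""

    def __init__(self, messages):
        self._active = list(messages)  # delivery order, possibly scrambled
        self._deferred = {}            # sequence_number -> message

    def receive(self):
        return self._active.pop(0) if self._active else None

    def defer(self, message):
        self._deferred[message["sequence_number"]] = message

    def receive_deferred(self, sequence_number):
        return self._deferred.pop(sequence_number)


def run_state_machine(queue):
    state = None
    next_step = 1
    parked = {}  # step -> sequence number of the deferred message (state machine state)

    def apply(message):
        nonlocal state, next_step
        state = message["command"]
        next_step += 1

    while (message := queue.receive()) is not None:
        if message["step"] != next_step:
            # Too early: park it in the queue and remember only where it is.
            queue.defer(message)
            parked[message["step"]] = message["sequence_number"]
            continue
        apply(message)
        # Resume any parked steps that have now become due.
        while next_step in parked:
            apply(queue.receive_deferred(parked.pop(next_step)))
    return state


# Steps 1:ON, 2:OFF, 3:ON, 4:OFF delivered in the order 2, 1, 4, 3.
scrambled = [
    {"sequence_number": 101, "step": 2, "command": "OFF"},
    {"sequence_number": 102, "step": 1, "command": "ON"},
    {"sequence_number": 103, "step": 4, "command": "OFF"},
    {"sequence_number": 104, "step": 3, "command": "ON"},
]
final_state = run_state_machine(DeferringQueue(scrambled))
```

The state machine never holds a copy of a parked message, only its sequence number, so the message itself stays under the queue's care until its step comes up.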
Deferral is useful because it doesn’t change the delivery state of the message and therefore it also keeps enforcing TTL, for instance. The message will also be safer in the queue than in an unreplicated in-memory context. As for TTL: if you can’t get step 1 done before the other steps have expired, you should not be able to execute them.
If Service Bus were to introduce a concept for flowing back the delivery state of a message to the sender, a report that processing has been completed would be anchored on the receiver calling “complete”. With deferral, that’s exactly when you picked up the message again and did the work. When you just stash a message copy into the state machine state, that happens when you stashed it away."
Clemens Vasters (clemensv@microsoft.com), Nov 15 2017