@kjnilsson
Last active June 24, 2022 20:35

Introduction

There are two types of consumer processing failures:

  1. Poison messages: messages that are badly formatted and either crash the consuming application or otherwise cannot be processed.

  2. Downstream dependency failures: the message is valid, but a downstream dependency such as a database isn't available. This category can be further subdivided into:

    a. All messages on the queue are destined for the same unavailable downstream.

    b. The downstream dependencies differ depending on the message (e.g. different databases or integration endpoints).

How to handle each type of failure

Type 1: These messages can never be successfully processed and thus need to be removed from the processing queue promptly. Either the application can publish the message to another exchange (and ack it on the source queue), or it can nack/reject the message without requesting a requeue. If dead-lettering is configured on the source queue, the message will then be moved to a dead-letter queue where it can be held for offline/manual analysis.
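
A minimal sketch of the two options, using Python with the pika client (the `dlx` exchange name, the `work` queue, and the JSON parsing step are assumptions for illustration, not part of the text above):

```python
import json
import pika

def on_message(channel, method, properties, body):
    try:
        event = json.loads(body)  # the parse/validate step; a failure here marks a poison message
    except ValueError:
        # Option A: republish to a dedicated exchange for later analysis, then ack the original.
        channel.basic_publish(exchange="dlx", routing_key=method.routing_key, body=body)
        channel.basic_ack(delivery_tag=method.delivery_tag)
        # Option B (instead of A): reject without requeue and let the queue's
        # dead-letter configuration move the message:
        # channel.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
        return
    # ... normal processing of `event` ...
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="work", on_message_callback=on_message)
channel.start_consuming()
```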

Type 2a: For this case it makes no sense to reject/requeue the message, as all that will happen is that the message will be re-delivered almost immediately (depending on prefetch) and fail to process again. Any other messages handled by the consumer will also fail, so the consuming application just ends up failing to process the same messages over and over. To avoid boiling the ocean, it is much better for the client application to enter its own periodic retry loop with the first message that failed, until the downstream dependency is available again. There is no point trying to process later messages: if the current message cannot be forwarded to the downstream, none of them can.

Caveat: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout
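
A sketch of such a retry loop, assuming `forward_to_downstream` and `DownstreamUnavailable` stand in for the application's own downstream call and failure signal; the interval and attempt cap are illustrative and should keep the total retry time well under the (default 30-minute) acknowledgement timeout linked above. Sleeping inside a callback can also delay client heartbeats, depending on the client library.

```python
import time

RETRY_INTERVAL = 5   # seconds between attempts (illustrative)
MAX_ATTEMPTS = 60    # ~5 minutes in total, well under the default ack timeout

class DownstreamUnavailable(Exception):
    """Hypothetical marker raised when the downstream dependency is down."""

def forward_to_downstream(body):
    """Placeholder for the real downstream call."""
    raise DownstreamUnavailable()

def on_message(channel, method, properties, body):
    for _ in range(MAX_ATTEMPTS):
        try:
            forward_to_downstream(body)
            channel.basic_ack(delivery_tag=method.delivery_tag)
            return
        except DownstreamUnavailable:
            time.sleep(RETRY_INTERVAL)
    # Still unavailable: hand the message back rather than risk hitting the ack timeout.
    channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
```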

Alternatively, the client application can simply unsubscribe/disconnect when it detects an unavailable downstream and re-subscribe when it detects that the downstream is available again.
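
Continuing the same sketch, the unsubscribe approach boils down to a nack with requeue followed by cancelling the consumer; the health check that triggers re-subscription is left as an assumption:

```python
def on_message(channel, method, properties, body):
    try:
        forward_to_downstream(body)           # hypothetical downstream call (see above)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except DownstreamUnavailable:
        # Return the message to the queue and stop consuming; the broker keeps
        # the messages safe while the downstream is down.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        channel.basic_cancel(consumer_tag)

consumer_tag = channel.basic_consume(queue="work", on_message_callback=on_message)

# Later, once some health check reports the downstream is available again:
# consumer_tag = channel.basic_consume(queue="work", on_message_callback=on_message)
```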

Type 2b: Here the queue contains a mixture of messages that, at any given time, may or may not be forwardable to their downstream dependency. In this case it makes sense to ask the broker to delay a message whose downstream is unavailable whilst continuing to process other messages from the queue. RabbitMQ provides a couple of options for this.

  1. Use the delayed message pattern, where the message is rejected (without requeue). The queue then dead-letters the message to another queue that has a message TTL policy configured, which in turn dead-letters the message back to the original processing queue once the TTL expires (see the configuration sketch after this list).
  2. Publish the message to a delayed-message exchange that routes it back to the original queue after some time has passed, then ack/reject it on the source queue.
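
A configuration sketch of option 1 with pika; the queue names (`work`, `work.wait`) and the 30-second TTL are illustrative, and the same arguments could instead be applied via a policy:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Main processing queue: rejected messages are dead-lettered to the wait queue.
channel.queue_declare(
    queue="work",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "",           # default exchange
        "x-dead-letter-routing-key": "work.wait",
    },
)

# Wait queue: it has no consumers; once the TTL expires, the message is
# dead-lettered back to the main queue.
channel.queue_declare(
    queue="work.wait",
    durable=True,
    arguments={
        "x-message-ttl": 30000,                 # 30 seconds (illustrative)
        "x-dead-letter-exchange": "",
        "x-dead-letter-routing-key": "work",
    },
)

# A consumer can then delay a message whose downstream is unavailable with:
# channel.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
```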

Bear in mind that most consuming apps can experience both type 1 and type 2 failures. Usually a consuming application will take a message, parse/transform it in some way, then forward it to the downstream destination. If the message fails to parse/transform, it is a poison message and you need to handle it as such (see Type 1). If forwarding to the downstream fails, you need to treat it as 2a or 2b.
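
Putting the pieces together, a consumer callback might distinguish the two failure modes roughly as follows; `transform`, `forward_to_downstream`, `DownstreamUnavailable`, and the `work.poison` queue are hypothetical placeholders, and the queues are assumed to be declared as in the sketch above:

```python
def on_message(channel, method, properties, body):
    try:
        payload = transform(body)                 # parse/transform step
    except Exception:
        # Type 1: poison message. Publish it for offline analysis and ack
        # it on the source queue.
        channel.basic_publish(exchange="", routing_key="work.poison", body=body)
        channel.basic_ack(delivery_tag=method.delivery_tag)
        return
    try:
        forward_to_downstream(payload)
    except DownstreamUnavailable:
        # Type 2b: reject without requeue so the dead-letter/TTL wait queue
        # returns the message to "work" after the delay.
        channel.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
        return
    channel.basic_ack(delivery_tag=method.delivery_tag)
```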

@yaronp68

There are two(three) types of consumer processing failures:

I would just say two. I think the variations can be discussed later (as you do).

The following are not super clear

Use the delayed message pattern where the message is rejected (without requeue). The queue then dead-letters the message to another queue that has a Message ttl policy configured that in turn will dead-letter the message back to the original processing queue after the TTL expires.
Ack/reject the message after publishing to the delayed-message exchange routing the message back to the original queue after some time has passed.

  • What is the delayed message pattern? Either link to it or add a diagram.
  • Add a diagram for number 2. The best would be to show a config example.

@dumbbell

Type 2a: (...) To avoid just boiling the ocean it is much better for the client application to enter its own periodic retry loop with the first message that failed until the downstream dependency is available again. For this case it makes no sense to try to process later messages as if the current message cannot be forwarded to the downstream none of them can.

The application could also reject the message and unsubscribe from the queue or disconnect from the broker entirely. This way, the application can do whatever it needs to with respect to the downstream downtime, but RabbitMQ remains responsible for message safety. The client application can reconnect/subscribe again when it is ready to process messages. It saves resources on both ends.

Otherwise, yes, this is a good document.

@kjnilsson
Author

The application could also reject the message and unsubscribe from the queue or disconnect from the broker entirely. This way, the application can do whatever it needs to with respect to the downstream downtime, but RabbitMQ remains responsible for message safety. The client application can reconnect/subscribe again when it is ready to process messages. It saves resources on both ends.

Absolutely! This is the better approach. It would require a slightly more complicated client application, but yes, I will mention both approaches. Either way, they don't bounce messages back and forth with the broker, which is the key property of handling that failure.

@dumbbell

In the world of Kubernetes & other "orchestrators", we could even imagine another component monitoring the downstream component and telling the consumer component to stop. Mind blowing, isn't it? :)
