@rajibkuet07
Last active April 21, 2024 13:30

Retry Mechanism in Kafka

  • Create a database table for the retry events with these columns:
    • id
    • topic — topic name of the failed message
    • partition — partition number of the failed message
    • message — the original message payload
    • hash — md5 of the unique identifier for the failed message
    • offset — offset of the failed message
    • counter — number of retries so far for this message
    • nextEventTime — DateTime; gradually increases based on the counter
    • published — true/false; whether a retry message has been produced to the topic again
    • handled — true/false; whether the retried message has been handled again
    • status — success/error; outcome of the event when handled again
  • When an error happens in any consumer, emit an event to the retry topic. The retry message consists of:
    • hash — md5 of the unique identifier for the failed message
    • topic — topic name of the failed message
    • partition — partition number of the failed message
    • message — actual payload of the failed message
    • error — error data or string describing why the message failed
    • fallback — optional; {topic: "fallback topic name", value: "fallback message value"}; if the max retry count is exceeded, produce this value to the fallback topic
    • configs — optional; per-event settings such as the initial retry interval, max retry time, and max retry count
  • The retry topic consumer stores a retry event row for the failed message in the database
  • Add an API endpoint so that the failed messages stored in the database can be replayed from outside. For example, a server-side cron job can hit this endpoint periodically to produce the failed messages again
  • In the API endpoint: if any events in the database have nextEventTime less than or equal to the current time, produce them to their topics again with the same stored messages. After producing the retry events, update the published status immediately to avoid the race condition of producing the same events multiple times
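The table schema and the replay endpoint's selection step can be sketched in TypeScript (the reference implementation linked below is TypeScript). The `RetryEvent` shape, `md5` helper, and `dueEvents` function are illustrative names, not taken from the actual code:

```typescript
import { createHash } from "crypto";

// Hypothetical shape of one row in the retry-events table described above.
interface RetryEvent {
  id: number;
  topic: string;
  partition: number;
  message: string;
  hash: string;        // md5 of the failed message's unique identifier
  offset: number;
  counter: number;     // retries so far for this message
  nextEventTime: Date; // when the event becomes due for republishing
  published: boolean;  // set once a retry message is produced again
  handled: boolean;    // set once the retried message is handled again
  status: "success" | "error" | null;
}

// md5 helper for the hash column.
function md5(uniqueId: string): string {
  return createHash("md5").update(uniqueId).digest("hex");
}

// Selection step of the replay endpoint: events that are due, not yet
// republished, and not yet handled. The endpoint would produce these and
// immediately set published = true to avoid the double-produce race.
function dueEvents(rows: RetryEvent[], now: Date): RetryEvent[] {
  return rows.filter(
    (r) => !r.published && !r.handled && r.nextEventTime <= now
  );
}
```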
  • Steps In Consumer That Could Fail
    • Consumer on Success
      • Check the db for a row with handled = false and hash = md5(the message's unique identifier)
      • If not found then do nothing
      • If found then update handled = true and status = success
    • Consumer on Fail
      • Emit event to the retry topic with specific data in the message
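The success and failure paths above can be sketched as follows, using an in-memory stand-in for the database; `RetryRow`, `markRetrySucceeded`, and `buildRetryMessage` are hypothetical names introduced for illustration:

```typescript
import { createHash } from "crypto";

// Minimal stand-in for a retry-events row (full schema is listed above).
interface RetryRow {
  hash: string;
  handled: boolean;
  status: "success" | "error" | null;
}

// Message shape emitted to the retry topic on failure, mirroring the notes.
interface RetryMessage {
  hash: string;
  topic: string;
  partition: number;
  message: string;
  error: string;
  fallback?: { topic: string; value: unknown };
  configs?: { initialIntervalMs?: number; maxRetryCount?: number };
}

// Consumer on success: if an unhandled retry row exists for this hash,
// close it out; otherwise do nothing.
function markRetrySucceeded(rows: RetryRow[], hash: string): void {
  const row = rows.find((r) => !r.handled && r.hash === hash);
  if (row) {
    row.handled = true;
    row.status = "success";
  }
}

// Consumer on failure: build the event to emit to the retry topic.
function buildRetryMessage(
  topic: string,
  partition: number,
  message: string,
  uniqueId: string,
  error: string
): RetryMessage {
  const hash = createHash("md5").update(uniqueId).digest("hex");
  return { hash, topic, partition, message, error };
}
```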
  • Steps In The Retry Topic Consumer
    • Check db for handled = false and hash = md5(unique identifier for the failed message)
    • If found, update the found row with handled = true and status = error
    • Add a new row with the next retry event if the counter does not exceed the max retry count
    • In the new row, set counter = (previous counter || 0) + 1 and nextEventTime = currentTime + 2^counter * initial interval (e.g. 60000 ms, from configs or env)
    • If the counter of the found row equals the max retry count from the configs or env, check whether the retry topic message carries fallback details (fallback = {topic: string, value: any})
    • If fallback details are found, emit an event to the fallback topic with the value from the fallback object
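The backoff and fallback decision above can be sketched as a pure function. The `MAX_RETRY_COUNT` of 5 and the 60000 ms initial interval are assumed example values that would normally come from configs or env:

```typescript
const INITIAL_INTERVAL_MS = 60_000; // initial retry interval (assumed; from configs/env)
const MAX_RETRY_COUNT = 5;          // max retry count (assumed; from configs/env)

// nextEventTime = currentTime + 2^counter * initial interval.
function nextEventTime(now: Date, counter: number): Date {
  return new Date(now.getTime() + Math.pow(2, counter) * INITIAL_INTERVAL_MS);
}

// What the retry-topic consumer does after closing out the found row.
type Decision =
  | { kind: "schedule"; counter: number; nextEventTime: Date } // insert a new retry row
  | { kind: "fallback"; topic: string; value: unknown }        // produce to the fallback topic
  | { kind: "drop" };                                          // no fallback given: give up

function decide(
  prevCounter: number,
  now: Date,
  fallback?: { topic: string; value: unknown }
): Decision {
  if (prevCounter < MAX_RETRY_COUNT) {
    const counter = prevCounter + 1; // (previous counter || 0) + 1
    return { kind: "schedule", counter, nextEventTime: nextEventTime(now, counter) };
  }
  return fallback
    ? { kind: "fallback", topic: fallback.topic, value: fallback.value }
    : { kind: "drop" };
}
```

The doubling interval means a message retried 5 times waits 2, 4, 8, 16, then 32 minutes between attempts before the fallback (if any) fires.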

help - https://github.com/uptool/gmb-site-backend/blob/master/src/global/kafka/retry-event-consumer.service.ts
