We have a process that spawns lots of lambdas, and we were worried it might hit a concurrency limit. We wanted to know what a step function would do in that situation. The AWS documentation says that poll-based services have a built-in retry mechanism when lambdas are throttled, so we were hoping that the subsequent steps would simply wait until there was enough capacity.
So I built a step function with three steps, each of which had a sleep in it, and set the reserved concurrency to 1 for the lambda in the middle step. Then I kicked off three executions of the step function at the same time.
The result was that the second instance of the step function failed on the throttled lambda with the error message: "Lambda.TooManyRequestsException"
There are retry options in step functions, but it is interesting to know that, on their own, they don't wait for a lambda slot to become available.
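For anyone who wants to try the retry options, here is a rough sketch of what a Task state with a Retry and a Catch for the throttling error could look like, built as a Python dict that can be dumped to JSON for the state machine definition. The function name "validate-csv" and the state names "NextStep" and "HandleThrottle" are made up for illustration, and the interval, backoff, and attempt values are arbitrary assumptions rather than recommendations. As far as I understand it, the Catch is only consulted once the retries are exhausted, so it can play a role similar to a DLQ.

```python
import json

# Error name reported by Step Functions when the Lambda invoke is throttled.
THROTTLE_ERROR = "Lambda.TooManyRequestsException"


def throttled_task_state(function_name, next_state, fallback_state,
                         interval_seconds=2, backoff_rate=2.0, max_attempts=6):
    """Build a Task state that retries on throttling, then falls back.

    All argument values are illustrative assumptions, not tuned settings.
    """
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        # Retry the throttled invoke with exponential backoff.
        "Retry": [{
            "ErrorEquals": [THROTTLE_ERROR],
            "IntervalSeconds": interval_seconds,
            "BackoffRate": backoff_rate,
            "MaxAttempts": max_attempts,
        }],
        # If the retries run out, route to a fallback state instead of
        # failing the whole execution; the error is kept under $.error.
        "Catch": [{
            "ErrorEquals": [THROTTLE_ERROR],
            "ResultPath": "$.error",
            "Next": fallback_state,
        }],
        "Next": next_state,
    }


def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry: interval * backoff_rate ** n."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


if __name__ == "__main__":
    # "validate-csv", "NextStep" and "HandleThrottle" are hypothetical names.
    state = throttled_task_state("validate-csv", "NextStep", "HandleThrottle")
    print(json.dumps(state, indent=2))
    print(sum(retry_delays(2, 2.0, 6)), "seconds of waiting before the Catch")
```

With these example values the waits are 2, 4, 8, 16, 32 and 64 seconds, so a throttled step gets a bit over two minutes for a slot to free up before the fallback state takes over. Whether that is long enough depends on how long the throttled lambda usually runs.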
Hi, I'm currently looking into using step functions to better orchestrate my serverless application and wanted to understand how they deal with lambda throttling. In particular, when an invocation fails all of its retries because the lambda is throttled, would this be captured by the step function, or will I still need a DLQ?
My current architecture is a system of SQS queues and lambdas that perform various validations on csv files before attempting to ingest the data into a database. Each lambda, depending on whether it passes or fails, sends the event to a subsequent SQS queue or DLQ, which is then polled by a different lambda.
I'd like to migrate all this logic into a state machine so I can better trace the files, but I am worried about the possibility of losing data due to throttling.
Any advice or suggestions on this would be great.