We have a process that spawns a lot of Lambdas, and we were worried it might hit a concurrency limit. We wanted to know what a step function would do in that situation. The AWS documentation says that poll-based services have a built-in retry mechanism when Lambdas are throttled, so we were hoping the subsequent steps would simply wait until there was enough capacity.
So I built a step function with three steps, each containing a sleep, and set the reserved concurrency to 1 on the Lambda for the middle step. Then I kicked off three executions of the step function at the same time.
The result was that the second instance of the step function failed on the throttled Lambda with the error "Lambda.TooManyRequestsException".
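For anyone who wants to reproduce this, here is a rough sketch of the setup using boto3-style clients. The helper name, the function name and the state machine ARN are placeholders I made up for illustration; the clients are passed in as arguments, so the snippet itself is not tied to any account.

```python
import json


def run_throttle_experiment(lambda_client, sfn_client, function_name,
                            state_machine_arn, executions=3):
    """Cap one Lambda at a single concurrent execution, then start several runs."""
    # Reserve exactly one concurrent execution for the middle-step function.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )

    # Start the executions back to back so they overlap on the middle step.
    execution_arns = []
    for i in range(executions):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"run": i}),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

With real clients this would be called with `boto3.client("lambda")` and `boto3.client("stepfunctions")`, plus your own function name and state machine ARN.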
Step functions do have retry options, but it is interesting to know that by default a throttled step fails rather than waiting for a Lambda slot to become available.
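As a sketch of what those retry options could look like, the snippet below builds a Task state with a `Retry` rule that matches the throttling error from the test. The interval, attempt count and backoff rate are assumptions picked for illustration, not tuned values, and the helper names and the ARN in the usage line are hypothetical.

```python
import json


def task_with_throttle_retry(function_arn, next_state):
    """Build a Task state that retries when the Lambda is throttled."""
    return {
        "Type": "Task",
        "Resource": function_arn,
        "Retry": [
            {
                # The error name seen in the failed execution.
                "ErrorEquals": ["Lambda.TooManyRequestsException"],
                "IntervalSeconds": 2,   # wait before the first retry (assumed)
                "MaxAttempts": 6,       # number of retries (assumed)
                "BackoffRate": 2.0,     # multiplier applied to each wait (assumed)
            }
        ],
        "Next": next_state,
    }


def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Wait before each retry: the interval multiplied by the backoff rate each time."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


if __name__ == "__main__":
    state = task_with_throttle_retry(
        "arn:aws:lambda:us-east-1:123456789012:function:middle-step",  # placeholder
        "FinalStep",
    )
    print(json.dumps(state, indent=2))
    print(sum(retry_delays(2, 6, 2.0)))
```

With these numbers the waits add up to about two minutes before the step gives up, so the values need to be sized against how long the throttled Lambda usually runs.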
Gotcha. I’m using step functions and starting to look into scaling, so I was just curious whether you found any best practices for running lots of concurrent step function executions.