We have a process that spawns a lot of Lambdas, and we were worried it might hit a concurrency limit. We wanted to know what a step function would do in that situation. The AWS documentation says that poll-based services have a built-in retry mechanism when Lambdas are throttled, so we were hoping the subsequent steps would simply wait until there was enough capacity.
So I built a step function with three steps, each with a sleep in it, and set the reserved concurrency to 1 on the Lambda behind the middle step. Then I kicked off three executions of the step function at the same time.
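For anyone who wants to repeat the experiment, here is a rough sketch of the setup in Python. It is not the exact script I ran. The function name and state machine ARN are placeholders, and the two clients are assumed to be ordinary boto3 "lambda" and "stepfunctions" clients passed in by the caller.

```python
import json


def throttle_and_launch(lambda_client, sfn_client, function_name,
                        state_machine_arn, count=3):
    """Cap one Lambda at a single concurrent execution, then start
    several executions of the state machine back to back."""
    # Reserved concurrency of 1 means only one copy of the function can run.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
    execution_arns = []
    for i in range(count):
        # No explicit execution name, so Step Functions generates a unique one.
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"run": i}),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

You would call it with something like throttle_and_launch(boto3.client("lambda"), boto3.client("stepfunctions"), "middle-step-function", "<your state machine ARN>"), where both names are made up for illustration.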
The result was that the second instance of the step function failed on the throttled Lambda with the error message: "Lambda.TooManyRequestsException"
There are retry options in step functions, but it is interesting to know that on their own they don't wait for a Lambda slot to become available.
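As a rough illustration of those retry options, here is what a Retry block on the throttled step could look like, written as a Python dict that mirrors the state machine JSON. The function name and the interval, attempt and backoff values are made up for the example, not what we used. The small helper just works out how long each retry would wait under those settings.

```python
import json

# The error name reported when the Lambda was throttled.
THROTTLE_ERROR = "Lambda.TooManyRequestsException"

# Hypothetical retry settings: first retry after 2s, doubling each time.
retry_policy = {
    "ErrorEquals": [THROTTLE_ERROR],
    "IntervalSeconds": 2,
    "MaxAttempts": 5,
    "BackoffRate": 2.0,
}

# Task state for the middle step; "middle-step-function" is a placeholder.
middle_step = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {"FunctionName": "middle-step-function", "Payload.$": "$"},
    "Retry": [retry_policy],
    "End": True,
}


def retry_delays(policy):
    """Seconds waited before each retry attempt under exponential backoff."""
    return [
        policy["IntervalSeconds"] * policy["BackoffRate"] ** attempt
        for attempt in range(policy["MaxAttempts"])
    ]


print(json.dumps(middle_step, indent=2))
print(retry_delays(retry_policy))
```

With these example values the step retries after 2, 4, 8, 16 and 32 seconds, so it gives up roughly a minute after the first throttle if no slot frees up in that time.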
I can say that this applies to SQS and SNS as well, which is why AWS recommends you always have multiple retries configured.
The documentation really isn't very clear on what's event-based and what's poll-based, or on how the retries work.
Is there anything specific you're interested in? We didn't end up using a step function for our process, but I have spent lots of time working with Lambdas, API GWs, queues and topics over the last 12 months.