Flink has a known deadlock case that can be caused by the combination of Flink's backpressure mechanism with iterations. It is more likely when there is heavy feedback load.
There is a proposal to resolve this issue, but it seems to have been abandoned.
The workaround, suggested by Gábor Hermann, is to throttle the messages going into the iteration.
Here is my implementation.
elementsPerSecond is a hyperparameter that needs to be tuned manually, and it seems to be related to the structure of the operator network.
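A minimal sketch of such a throttle, written in plain Java so it does not depend on a particular Flink version. The class name Throttler and its methods are hypothetical, not part of Flink. The assumption is that you call throttle() once per element in whichever operator feeds records into the iteration (for example inside a MapFunction placed before the iteration head), so that blocking there limits the input rate.

```java
/**
 * Hypothetical helper (not a Flink API) that caps how many elements per second
 * are let through. Call throttle() once per element before emitting it.
 */
final class Throttler {
    private final long intervalNanos;
    private long nextSlotNanos;

    Throttler(long elementsPerSecond) {
        if (elementsPerSecond <= 0) {
            throw new IllegalArgumentException("elementsPerSecond must be positive");
        }
        this.intervalNanos = 1_000_000_000L / elementsPerSecond;
        this.nextSlotNanos = System.nanoTime(); // the first element passes immediately
    }

    long intervalNanos() {
        return intervalNanos;
    }

    /** Blocks the calling thread until the next element may be emitted. */
    void throttle() throws InterruptedException {
        long now = System.nanoTime();
        // Sleep until this element's slot, re-checking the clock after each sleep.
        while (now - nextSlotNanos < 0) {
            long remaining = nextSlotNanos - now;
            Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
            now = System.nanoTime();
        }
        // The next slot is one interval after this element's release, so
        // consecutive elements are always at least intervalNanos apart.
        nextSlotNanos = now + intervalNanos;
    }
}
```

Scheduling each slot from the moment the previous element was released, rather than from a fixed start time, means the throttle never lets a burst through to catch up after an idle period: the effective rate can fall below elementsPerSecond but never exceed it.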
Put your network under high pressure, especially heavy feedback load. If a deadlock happens, halve elementsPerSecond and try again.
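The tuning procedure above amounts to a halving search. A sketch, assuming you have some way to run a stress test at a given rate and detect a deadlock (for instance, treating a job run that has not finished within a timeout as deadlocked). The runsWithoutDeadlock predicate stands in for that check and, like RateTuner itself, is hypothetical.

```java
import java.util.OptionalLong;
import java.util.function.LongPredicate;

final class RateTuner {
    private RateTuner() {}

    /**
     * Starts at initialRate and halves it until the stress test passes.
     * Returns empty if the rate drops below minRate without a passing run.
     * runsWithoutDeadlock is a hypothetical hook that runs the job under heavy
     * feedback load with the given elementsPerSecond.
     */
    static OptionalLong findSafeRate(long initialRate, long minRate,
                                     LongPredicate runsWithoutDeadlock) {
        if (minRate < 1) {
            // A zero floor would loop forever, since 0 / 2 == 0.
            throw new IllegalArgumentException("minRate must be at least 1");
        }
        for (long rate = initialRate; rate >= minRate; rate /= 2) {
            if (runsWithoutDeadlock.test(rate)) {
                return OptionalLong.of(rate);
            }
        }
        return OptionalLong.empty();
    }
}
```

For example, with an initial rate of 10,000 and a job that only survives at 3,000 elements per second or less, the search tries 10,000 and 5,000 and settles on 2,500.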
Hope it helps.