Unordered means that we don't care about the order of results from the evalMap action.
It allows for higher throughput because it never waits to start a new job. It holds N job permits (via a semaphore), and the moment one job finishes, it requests the next element from the stream and starts processing it.
When you use parEvalMap with ordered results, it only begins the next job once the oldest in-flight input's job is ready to emit.
This matters when individual elements can take a variable amount of time to complete - and that's the case here, because backfill can take more or less time depending on how many transactions are present within the time window.
Suppose we have 4 jobs we want to run with up to 2 at a time. Job 1 takes 60 seconds to complete, and all the rest take 10 seconds.
Using parEvalMap would mean the entire set of inputs would take ~70 seconds to complete:
- 00:00 start job 1, 2
- 00:10 job 2 completes. Its result cannot be emitted yet because job 2 comes after job 1, and no new job starts
- 01:00 job 1 completes. The results of job 1 and job 2 are emitted. Jobs 3 and 4 start
- 01:10 jobs 3 and 4 complete and emit their results in order
Whereas using Unordered, we finish in 60 seconds:
- 00:00 start job 1, 2
- 00:10 job 2 completes, and emits result. job 3 starts
- 00:20 job 3 completes, emits. job 4 starts
- 00:30 job 4 completes and emits
- 01:00 job 1 completes and emits
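The two timelines above can be reproduced with a small fs2 program. This is a minimal sketch, assuming fs2 3.x and cats-effect 3 are on the classpath; `runJob` and the object name are hypothetical stand-ins for the real backfill job, and the durations are scaled down from seconds to units of 10 ms (job 1 sleeps 600 ms, the others 100 ms).

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import scala.concurrent.duration._

object OrderedVsUnordered extends IOApp.Simple {

  // Hypothetical job: job 1 is slow, every other job is fast.
  def runJob(id: Int): IO[Int] =
    IO.sleep(if (id == 1) 600.millis else 100.millis).as(id)

  // Four inputs, processed with at most 2 jobs in flight.
  val jobs: Stream[IO, Int] = Stream(1, 2, 3, 4).covary[IO]

  // Runs a stream to completion and prints emission order and elapsed time.
  def report(label: String)(s: Stream[IO, Int]): IO[Unit] =
    s.compile.toList.timed.flatMap { case (elapsed, emitted) =>
      IO.println(s"$label: emitted $emitted in ${elapsed.toMillis} ms")
    }

  val run: IO[Unit] =
    report("parEvalMap")(jobs.parEvalMap(2)(runJob)) >>
      report("parEvalMapUnordered")(jobs.parEvalMapUnordered(2)(runJob))
}
```

Under these assumptions the ordered run should take roughly 700 ms and emit the jobs in input order, while the unordered run should take roughly 600 ms and emit job 1 last.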
The reason we can do this here, but cannot do it with Kafka or Kinesis, is that we don't care about the order of results. SQS message acknowledgement is "acknowledge this individual message", whereas Kafka/Kinesis acknowledgement is "confirm every message in order up to this message". So with SQS we can confirm out of order, and go faster.
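The difference between the two acknowledgement styles can be sketched with a toy model. This is not the SQS, Kafka, or Kinesis client API; `AckModels` and both functions are hypothetical, and messages are assumed to be numbered 1..n with `completed` holding the numbers of the jobs that have finished so far.

```scala
object AckModels {

  // Per-message acknowledgement (SQS-style): every finished job can be
  // acknowledged immediately, regardless of its neighbours.
  def individuallyAckable(completed: Set[Int]): Set[Int] = completed

  // Cumulative acknowledgement (Kafka/Kinesis-style): acknowledging message k
  // implies messages 1..k are all done, so only the contiguous finished
  // prefix is safe. Returns the highest acknowledgeable number, or 0 if none.
  def cumulativeAck(completed: Set[Int]): Int =
    Iterator.from(1).takeWhile(completed.contains).toList.lastOption.getOrElse(0)
}
```

With jobs 2, 3 and 4 finished and job 1 still running, `individuallyAckable(Set(2, 3, 4))` returns all three, while `cumulativeAck(Set(2, 3, 4))` returns 0: nothing can be confirmed until job 1 completes, so finishing out of order buys nothing under cumulative acknowledgement.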