@Daenyth
Created November 28, 2023 16:44
fs2 parEvalMap vs parEvalMapUnordered

Unordered means that we don't care about the order of results from the evalMap action.

It allows for higher throughput because it never waits to start a new job. It has N job permits (with a semaphore), and the moment one job finishes, it requests the next element from the stream and begins operation.

With parEvalMap, results are emitted in input order, which means it only begins the next job once the oldest running job has finished and is ready to emit.
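
To make the difference in emission order concrete, here is a minimal sketch. It assumes fs2 3.x and cats-effect 3 are on the classpath; `OrderDemo` and `job` are hypothetical names used only for illustration, and the sleep durations are arbitrary.

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import scala.concurrent.duration._

object OrderDemo extends IOApp.Simple {
  // Hypothetical job: sleeps n * 100 ms, so larger inputs finish later.
  def job(n: Int): IO[Int] = IO.sleep((n * 100).millis).as(n)

  val inputs: Stream[IO, Int] = Stream(3, 2, 1).covary[IO]

  // Results are emitted in input order, whatever order the jobs finish in.
  val ordered: IO[List[Int]] =
    inputs.parEvalMap(3)(job).compile.toList

  // Results are emitted in completion order.
  val unordered: IO[List[Int]] =
    inputs.parEvalMapUnordered(3)(job).compile.toList

  def run: IO[Unit] =
    for {
      o <- ordered
      u <- unordered
      _ <- IO.println(s"parEvalMap:          $o")
      _ <- IO.println(s"parEvalMapUnordered: $u")
    } yield ()
}
```

The ordered run yields the inputs in their original order (3, 2, 1). The unordered run should typically yield 1, 2, 3, since the shortest job finishes first, but that depends on scheduling and is not guaranteed.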

This matters when individual elements can take a variable amount of time to complete - and that's the case here, because backfill can take more or less time depending on how many transactions are present within the time window.

Suppose we have 4 jobs we want to run with up to 2 at a time. Job 1 takes 60 seconds to complete, and all the rest take 10 seconds. Using parEvalMap would mean the entire set of inputs would take ~70 seconds to complete.

  • 00:00 start job 1, 2
  • 00:10 job 2 completes. Its result cannot be emitted yet because job 2 comes after job 1, which is still running
  • 00:60 job 1 completes. The results of jobs 1 and 2 are emitted. Jobs 3 and 4 start
  • 00:70 jobs 3 and 4 complete and emit their results in order

Whereas using Unordered, we finish in 60 seconds:

  • 00:00 start job 1, 2
  • 00:10 job 2 completes, and emits result. job 3 starts
  • 00:20 job 3 completes, emits. job 4 starts
  • 00:30 job 4 completes and emits
  • 00:60 job 1 completes and emits
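
The four-job scenario above can be reproduced with a small timing sketch. It assumes fs2 3.x and cats-effect 3, and scales the durations down so that 1 second becomes 10 ms. `TimingDemo` and `job` are hypothetical names, and the measured times will only be approximate.

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import scala.concurrent.duration._

object TimingDemo extends IOApp.Simple {
  // Scaled scenario: job 1 takes 600 ms, jobs 2 to 4 take 100 ms each.
  def job(n: Int): IO[Int] =
    IO.sleep(if (n == 1) 600.millis else 100.millis).as(n)

  val jobs: Stream[IO, Int] = Stream(1, 2, 3, 4).covary[IO]

  // Each program returns (elapsed time, results in emission order).
  val ordered: IO[(FiniteDuration, List[Int])] =
    jobs.parEvalMap(2)(job).compile.toList.timed

  val unordered: IO[(FiniteDuration, List[Int])] =
    jobs.parEvalMapUnordered(2)(job).compile.toList.timed

  def run: IO[Unit] =
    for {
      o <- ordered
      u <- unordered
      _ <- IO.println(s"parEvalMap:          ${o._1.toMillis} ms, results ${o._2}")
      _ <- IO.println(s"parEvalMapUnordered: ${u._1.toMillis} ms, results ${u._2}")
    } yield ()
}
```

If the timelines above hold, the ordered run should take roughly 700 ms and the unordered run roughly 600 ms, with the unordered results arriving in completion order.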

We can use Unordered here, but could not with Kafka or Kinesis, because we don't care about the order of results. SQS acknowledgement means "acknowledge this individual message", whereas Kafka/Kinesis acknowledgement means "confirm every message in order up to this message". So with SQS we can confirm out of order and go faster.
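
As a rough sketch of per-message acknowledgement, the following acknowledges each message as soon as its own processing finishes. It assumes fs2 3.x and cats-effect 3. `Message`, `process`, and `ack` are hypothetical stand-ins rather than a real SQS client API, and `ack` only records the id in a `Ref` so the acknowledgement order can be inspected.

```scala
import cats.effect.{IO, IOApp, Ref}
import fs2.Stream
import scala.concurrent.duration._

object AckDemo extends IOApp.Simple {
  // Hypothetical message type; a real SQS client would supply its own.
  final case class Message(id: Int, workMillis: Int)

  // Hypothetical handler standing in for the backfill work.
  def process(m: Message): IO[Unit] = IO.sleep(m.workMillis.millis)

  // Hypothetical per-message acknowledgement: records the id instead of calling SQS.
  def ack(acked: Ref[IO, Vector[Int]])(m: Message): IO[Unit] =
    acked.update(_ :+ m.id)

  val messages: Stream[IO, Message] =
    Stream(Message(1, 600), Message(2, 100), Message(3, 100), Message(4, 100)).covary[IO]

  // Process up to 2 messages at a time; acknowledge each one once it has been processed.
  val ackOrder: IO[Vector[Int]] =
    Ref.of[IO, Vector[Int]](Vector.empty).flatMap { acked =>
      messages
        .parEvalMapUnordered(2)(m => process(m).as(m))
        .evalMap(ack(acked))
        .compile
        .drain *> acked.get
    }

  def run: IO[Unit] =
    ackOrder.flatMap(order => IO.println(s"acknowledged in order: $order"))
}
```

Because each acknowledgement stands alone, the slow message 1 being acknowledged last does not hold up the others. With an offset-style acknowledgement, confirming message 4 would implicitly confirm message 1 before it had finished.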
