@totten

totten/notes.md Secret

Last active August 3, 2019 07:56
Examining the design of CiviMail concurrency options

CiviMail includes several options that influence how it manages batching and concurrency. We can characterize the system based on the inputs (config options) and the outcomes (#cron invocations, #msgs sent in each cron run, #concurrent runs).

There are currently 5 inputs:

  1. Mailer Batch Limit (int, #recipients per cron run)
  2. Mailer Job Size (int, #recipients per parallel job)
  3. Mailer Throttle Time (int, time between message deliveries)
  4. Mailer CRON job limit (int, max number of concurrent jobs)
  5. Enable global server wide lock for CiviMail (bool)

The "batch limit" and "job size" are confusingly similar. In both cases, we take a list of recipients, break it down into smaller chunks, and execute each chunk separately. The implementations differ in how they relate to MailingJob records:

  • "Mailer Job Size" - Take the overall list of recipients and explicitly split them into multiple MailingJob records. For each MailingJob, there must be separate cron runs.
  • "Mailer Batch Limit" - Take a single MailingJob record. In the first cron run, deliver to the first page of recipients. In the second cron run, deliver to the second page of recipients. And so on, ad nauseam.
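To make the comparison concrete, here is a minimal Python sketch of the two mechanisms as interpreted above. It is not taken from the CiviMail code: the function names are hypothetical, and treating `0` as "no limit" is an assumption based on how the example configs in this note use it.

```python
def split_into_jobs(recipients, job_size):
    """'Mailer Job Size' as read above: pre-split the list into separate jobs."""
    if job_size <= 0:  # assumption: 0 means "no limit"
        return [list(recipients)]
    return [recipients[i:i + job_size]
            for i in range(0, len(recipients), job_size)]


def page_through_job(recipients, batch_limit):
    """'Mailer Batch Limit' as read above: one job, one page per cron run."""
    offset = 0
    while offset < len(recipients):
        if batch_limit <= 0:  # assumption: 0 means "no limit"
            end = len(recipients)
        else:
            end = offset + batch_limit
        yield recipients[offset:end]
        offset = end


recipients = list(range(4000))
by_job = [len(chunk) for chunk in split_into_jobs(recipients, 50)]
by_page = [len(chunk) for chunk in page_through_job(recipients, 50)]
print(by_job == by_page, len(by_job))
```

Under this reading, both options yield the same sequence of per-run chunk sizes; they differ only in whether the chunks are stored as separate MailingJob records or computed page by page.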

The options "Cron job limit" and "global server wide lock" are also confusingly similar. Both constrain the number of concurrent cron runs. One is coarse-grained (global server wide lock) and the other is fine-grained (cron job limit).

I'm trying to persuade myself that these are meaningful differences. However, I feel like we could get the same outcomes with a simpler set of inputs. (This could just mean that I've misinterpreted something - hence the email to verify my interpretations.) The simpler inputs would be:

  1. Mailer Job Size (int, #recipients per parallel job)
  2. Mailer Throttle Time (int, time between message deliveries)
  3. Mailer CRON job limit (int, max number of concurrent jobs)

Now, a few examples:

  • Example 1: Single threaded, purposefully slow
    • Desired outcome:
      • There are 4,000 total messages.
      • Each run sends 50 messages.
      • There are 80 runs.
      • Cron-runs are strictly sequential (one at a time).
      • Each message is spaced at 100ms intervals.
    • Current configs (all would be equivalent if my interpretation is correct)
      • batchLimit=0, jobSize=50, cronLimit=1, globalLock=0, throttleTime=100ms
      • batchLimit=0, jobSize=50, cronLimit=0, globalLock=1, throttleTime=100ms
      • batchLimit=50, jobSize=0, cronLimit=1, globalLock=0, throttleTime=100ms
      • batchLimit=50, jobSize=0, cronLimit=0, globalLock=1, throttleTime=100ms
      • batchLimit=50000, jobSize=50, cronLimit=1, globalLock=0, throttleTime=100ms
      • batchLimit=50, jobSize=50000, cronLimit=1, globalLock=0, throttleTime=100ms
    • Simplified config:
      • jobSize=50, cronLimit=1, throttleTime=100ms
  • Example 2: Multi threaded, purposefully fast
    • Desired outcome:
      • There are 4,000 total messages.
      • Each run sends 50 messages.
      • There are 80 runs.
      • Cron-runs are parallel (up to 5 at a time)
      • Each message is spaced at 100ms intervals.
    • Current configs (all would be equivalent if my interpretation is correct)
      • batchLimit=50, jobSize=0, cronLimit=5, globalLock=0, throttleTime=100ms
      • batchLimit=0, jobSize=50, cronLimit=5, globalLock=0, throttleTime=100ms
      • batchLimit=50, jobSize=250, cronLimit=5, globalLock=0, throttleTime=100ms
      • batchLimit=250, jobSize=50, cronLimit=5, globalLock=0, throttleTime=100ms (note: UI shows error msg)
    • Simplified config:
      • jobSize=50, cronLimit=5, throttleTime=100ms
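The "desired outcome" numbers in both examples follow from simple arithmetic on the simplified config. The sketch below is illustrative only: `expected_outcome` is a hypothetical helper, and it assumes that the throttle delay is the only per-message cost and that full runs start back to back (it ignores any wait for the next cron invocation).

```python
import math


def expected_outcome(total_msgs, job_size, cron_limit, throttle_ms):
    """Derive run counts and idealised timings from the simplified options."""
    runs = math.ceil(total_msgs / job_size)
    ms_per_full_run = job_size * throttle_ms
    # Runs proceed in "waves" of up to cron_limit concurrent runs.
    waves = math.ceil(runs / cron_limit)
    return {
        "runs": runs,
        "ms_per_full_run": ms_per_full_run,
        "idealised_total_ms": waves * ms_per_full_run,
    }


# Example 1: jobSize=50, cronLimit=1, throttleTime=100ms
print(expected_outcome(4000, 50, 1, 100))
# Example 2: jobSize=50, cronLimit=5, throttleTime=100ms
print(expected_outcome(4000, 50, 5, 100))
```

Both examples give 80 runs of 5 seconds each. Only the number of runs allowed to overlap differs, so the idealised total drops from 400 seconds to 80 seconds.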

More generally, if my interpretation is correct, one can convert from the 5-options to the simpler 3-options with an algorithm:

  $new->throttleTime = $old->throttleTime;

  if ($old->globalLock) {
    $new->cronLimit = 1;
  }
  else {
    $new->cronLimit = $old->cronLimit;
  }

  if ($old->batchLimit == 0 || $old->jobSize == 0) {
    $new->jobSize = max($old->batchLimit, $old->jobSize);
    // ex: (batchLimit=0,jobSize=50) ==> jobSize=50
    // ex: (batchLimit=25,jobSize=0) ==> jobSize=25
    // ex: (batchLimit=0,jobSize=0) ==> jobSize=0
  }
  else {
    $new->jobSize = min($old->batchLimit, $old->jobSize);
    // ex: (batchLimit=25,jobSize=50) ==> jobSize=25
  }
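As a sanity check, the same conversion can be written as a small runnable Python sketch and applied to the "Current configs" from Examples 1 and 2. The class and field names are hypothetical stand-ins for the five settings, not CiviCRM identifiers.

```python
from dataclasses import dataclass


@dataclass
class OldConfig:
    batch_limit: int
    job_size: int
    cron_limit: int
    global_lock: bool
    throttle_ms: int


@dataclass
class NewConfig:
    job_size: int
    cron_limit: int
    throttle_ms: int


def simplify(old):
    """Map the five current options onto the three proposed ones."""
    cron_limit = 1 if old.global_lock else old.cron_limit
    if old.batch_limit == 0 or old.job_size == 0:
        # At most one of the two limits is set; keep whichever is non-zero.
        job_size = max(old.batch_limit, old.job_size)
    else:
        # Both are set; the smaller one decides how much each run sends.
        job_size = min(old.batch_limit, old.job_size)
    return NewConfig(job_size, cron_limit, old.throttle_ms)


example_1 = [
    OldConfig(0, 50, 1, False, 100),
    OldConfig(0, 50, 0, True, 100),
    OldConfig(50, 0, 1, False, 100),
    OldConfig(50, 0, 0, True, 100),
    OldConfig(50000, 50, 1, False, 100),
    OldConfig(50, 50000, 1, False, 100),
]
print(all(simplify(c) == NewConfig(50, 1, 100) for c in example_1))
```

All six Example 1 configs collapse to `jobSize=50, cronLimit=1, throttleTime=100ms`, which matches the "Simplified config" listed there.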

The old and new configurations would be equivalent when the numbers are perfectly divisible (eg 50 divides cleanly into 4000). If the numbers are not perfectly divisible, then the old configuration may produce extra, small batches for processing the remainders.
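One way such remainders could arise is sketched below. It assumes the recipients are first split into jobs of `job_size`, and that each job is then paged through independently in runs of `batch_limit`. That ordering is part of the interpretation being tested here, not a confirmed description of the code, and `run_sizes` is a hypothetical helper that expects both values to be non-zero.

```python
def run_sizes(total, job_size, batch_limit):
    """Messages sent per cron run when each job is paged independently."""
    sizes = []
    for job_start in range(0, total, job_size):
        job_len = min(job_size, total - job_start)
        for page_start in range(0, job_len, batch_limit):
            sizes.append(min(batch_limit, job_len - page_start))
    return sizes


# Old-style config where batchLimit does not divide jobSize evenly.
old = run_sizes(4000, job_size=120, batch_limit=50)
# Simplified equivalent: jobSize = min(50, 120) = 50.
new = run_sizes(4000, job_size=50, batch_limit=50)
print(len(old), len(new))
print(sorted(set(old)), sorted(set(new)))
```

With these numbers, the old configuration needs 100 runs (including runs of 20 and 40 messages) where the simplified one needs 80 runs of exactly 50. The same messages go out, but the run counts differ.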

However, all this is based on my interpretation of the options -- it is not based on testing or prior knowledge of the code. If my interpretation is wrong, it would be really helpful to have a couple of examples where the proposed "Current configs" (above) would not be equivalent.

@artfulrobot

Not sure I should be adding a comment here, but as it's 4 years on and we don't have the above simplifications, I thought I'd add:

The goal of these settings is to help people get their mail delivered as fast as possible. The settings help them deal with various constraints outside of CiviCRM.

The problem with both the original and the simplified approach is that neither necessarily facilitates this goal. If you set a job size as a number of emails, and each batch has to wait for the next cron run before it can start, you introduce lots of dead sleep time. E.g. take a concurrency (cron job limit) of 2:

Cron  1   |-----------|
Cron  2        |---------------|
Cron  3              x not ready, 2 jobs running
Cron  4                    |--------------|
SLEEP                 zzzzzz

There are people whose servers favour concurrency and people whose servers don't.

There are those with SMTP rate limits and those without.

People without an SMTP rate limit may still benefit from concurrency, so wouldn't it be better to use the concurrency setting and divide the emails by the concurrency and the batch size (if set)?

That way, without a batch limit you'd get

Cron 1   |---------------------------------| 
Cron 2        |----------------------------------|
NC       nnnnnn

Where nnn shows the time when you're non-concurrent, which would be your cron delay. Suboptimal, but at least the first thread is still pumping email.
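The dead-time argument can be checked with a toy simulation. This is a sketch under assumed rules, not CiviMail's scheduler: cron fires at a fixed interval, each firing starts at most one job (and only if fewer than `limit` jobs are still running), and a job's duration is its message count times the throttle time. The 60-second cron interval and the name `simulate` are illustrative assumptions.

```python
import math


def simulate(total_msgs, job_size, limit, cron_interval_ms, per_msg_ms):
    """Return the time (ms) at which the last message has been sent."""
    remaining = total_msgs
    running = []  # end times of jobs still in flight
    now = 0
    finish = 0
    while remaining > 0:
        # Drop jobs that finished before this cron firing.
        running = [end for end in running if end > now]
        if len(running) < limit:
            size = min(job_size, remaining)
            end = now + size * per_msg_ms
            running.append(end)
            remaining -= size
            finish = max(finish, end)
        now += cron_interval_ms
    return finish


total, limit = 4000, 2
small_jobs = simulate(total, 50, limit, 60_000, 100)
even_split = simulate(total, math.ceil(total / limit), limit, 60_000, 100)
print(small_jobs, even_split)
```

With fixed 50-message jobs, each job finishes in 5 seconds and the mailer then idles until the next cron firing. Splitting the 4,000 messages evenly across the two allowed workers keeps both busy, and the only non-concurrent stretch is the first cron interval.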

And with a batch limit + concurrency

Cron 1  |------------------|
Cron 2       |------------------|
Cron 3             x
Cron 4                   x
Cron 5                          |------------------|
Cron 6                                 |------------------|
