
@stevenharman
Last active November 9, 2018 11:12
Sending user signals to Heroku workers/processes...

me:

Is it possible to send a signal to a worker/process? I realize the platform sends SIGTERM and SIGKILL when restarting a dyno, but I need to send a USR1 to one of my workers to tell it to stop picking up new jobs. Normally this is achieved via kill -USR1 <pid>, but on the Heroku platform not only do we not know the pid, we also can't run one-off commands on the same dyno.
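For illustration, here is a minimal Ruby sketch of the kind of handler being described (the names and loop structure are hypothetical stand-ins, not Sidekiq's or Resque's actual internals): the worker traps USR1 and sets a flag that its fetch loop checks before taking another job.

```ruby
# Sketch: trap USR1 so the worker stops picking up new jobs.
# `process_loop` and the job lambdas are illustrative stand-ins.
$quiet = false
Signal.trap("USR1") { $quiet = true }

def process_loop(jobs)
  done = []
  until $quiet || jobs.empty?
    done << jobs.shift.call # work the next job
  end
  done
end
```

Once the flag is set, in-flight work finishes but nothing new is fetched, which is exactly the "stop taking action" behavior a USR1 conventionally requests.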

Caio (heroku support):

We had this feature experimentally at some point but it was never productized. I recommend you find other ways to signal your processes, like setting a database flag.

me:

Thanks for the quick reply. It is unfortunate that this feature has not been further investigated and productized. The suggested workaround is far from ideal, and not very realistic. The whole idea of a USR1 signal is to signal the current process to take some action - in this case, to stop taking action. But a new process, spun up after the app restarts, will have no knowledge of that signal and will get on with doing its thing: working jobs. Using a database flag introduces a whole host of complexity around managing that state across app restarts, and it requires customizing (read: hacking or monkey patching) existing tools to be aware of that flag.
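For concreteness, the suggested workaround amounts to something like the sketch below (the `flags` hash is a stand-in for a real database table; the class and key names are hypothetical). Every worker must poll the shared flag before fetching, and that flag is precisely the extra cross-restart state being objected to.

```ruby
# Sketch of the database-flag workaround: workers poll a shared flag
# before fetching. `flags` stands in for a database table.
class FlaggedWorker
  def initialize(flags)
    @flags = flags
  end

  def paused?
    @flags["stop_workers"] == true
  end

  def work(jobs)
    done = []
    until paused? || jobs.empty?
      done << jobs.shift.call
    end
    done
  end
end
```

Unlike a signal, the flag survives restarts, so someone must remember to clear it - and the queueing library has to be patched to consult it at all.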

Caio (heroku support):

That's very reasonable feedback. I'm routing this to our platform engineers.

@wuputah

wuputah commented Jan 2, 2013

Hi Steven - in general terms, there are a number of ways to accomplish this goal:

  1. Halt generation of new jobs into the queue, and allow current queues to empty.
  2. Halt processing of new jobs, allowing current jobs to finish (but queues are unaffected).
  3. Terminate job processing outright; workers with jobs in progress should handle this case gracefully, terminating and re-queueing as appropriate.

I understand it would be convenient in your use case to send a signal to your library of choice to cause #2 to occur. Obviously that's not currently possible on Heroku. However, since workers must handle #3 every 24 hours during dyno cycling, we don't think this is a particularly viable solution.
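Case #3, which every Heroku worker must already survive during daily dyno cycling, looks roughly like this sketch (the class, grace window, and queue shape are hypothetical; real libraries implement their own versions of this): on SIGTERM the worker stops fetching, gives the in-flight job a bounded window, and re-queues it if it doesn't finish.

```ruby
require "timeout"

# Sketch of case #3: on SIGTERM, stop fetching new jobs, give the
# in-flight job a bounded window, and re-queue it if it runs over.
class GracefulWorker
  GRACE_SECONDS = 5 # illustrative; Heroku's exit window applies in practice

  def initialize(queue)
    @queue = queue
    @stopping = false
    Signal.trap("TERM") { @stopping = true }
  end

  def run
    until @stopping || @queue.empty?
      job = @queue.shift
      begin
        Timeout.timeout(GRACE_SECONDS) { job.call }
      rescue Timeout::Error
        @queue.unshift(job) # re-queue the interrupted job for another dyno
        break
      end
    end
  end
end
```

Since this path must work anyway, adding a second, signal-driven "quiet" path (#2) buys relatively little on a platform that cycles dynos daily.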

In cases of large software changes or migrations, where data in the queue is tied to the software implementation handling that data, #1 may actually be the most viable. For some applications, you would enact this by scaling your web workers to zero and putting your site into maintenance mode.

More flexibility from queueing libraries could also help. For instance, why is a signal the only way to enact this change in Sidekiq's processing? Perhaps there should be another way to enact these sorts of maintenance modes.

@wuputah

wuputah commented Jan 2, 2013

Also, there is some philosophy behind these choices: http://www.12factor.net/disposability

As noted above, there is an even more aggressive philosophy called crash-only design (particularly in database and file system design), which observes that abrupt shutdown (e.g. a SIGKILL with no chance for cleanup) plus the necessary recovery time at startup is often faster than a graceful shutdown plus a graceful startup. This acknowledges that all software, at some point in its life, will be abruptly terminated, whether by SIGKILL or by power loss. Ideally, software should be able to recover from that state.

This, too, will inevitably happen to your workers at some point on Heroku (or anywhere for that matter), as the underlying hardware that happens to be running your workers will (eventually) abruptly fail.

@vladmiller

Hi wuputah,

Sometimes you have to send signals for other reasons - for example, enabling Node's debugger on a running app to track down leaks or issues that cannot be reproduced on a local machine.

@jchatel

jchatel commented Dec 30, 2015

Just because Heroku likes to send kill signals all the time doesn't mean we shouldn't be allowed to shut down gracefully via #2 (and no, I can't drain my queue, as I have job sets scheduled in the future).

http://eng.joingrouper.com/blog/2014/06/27/too-many-signals-resque-on-heroku/

@chrisplusplus

I accidentally created an infinite loop with a Messenger bot on Heroku, so my phone was blowing up with about 5 messages per second. I could not kill the process using the Heroku command line. After several thousand messages, I ended up pushing a die; (PHP) statement into the offending function. It worked, obviously, but I'm not sure that was the proper way to handle it.

@stevenharman (Author)

For anyone coming back to this years later... it seems Heroku has quietly increased the SIGTERM timeout from 10 to 30 seconds. I don't recall seeing any announcement to that effect, but it's mentioned in two different spots in the Heroku Dev Center docs:

  1. https://devcenter.heroku.com/articles/dynos#shutdown
  2. https://devcenter.heroku.com/articles/limits#exit-timeout

Still not ideal, but better, I guess.
