me:
Is it possible to send a signal to a worker/process?
I realize the platform sends SIGTERM and SIGKILL when restarting a dyno, but I need to send a USR1 to one of my workers to tell it to stop picking up new jobs. Normally this is achieved via kill -USR1 <pid>, but on the Heroku platform not only do we not know the pid, we also can't run one-off commands on the same dyno.
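For readers unfamiliar with the pattern, here is a minimal Python sketch of what "stop picking up new jobs on USR1" usually looks like inside a worker. The `Worker` class and the stand-in jobs are hypothetical and are not taken from any particular job library; a real worker would pull jobs from a queue.

```python
import os
import signal


class Worker:
    """Toy worker loop that stops taking new jobs once SIGUSR1 arrives."""

    def __init__(self):
        self.accepting = True

    def handle_usr1(self, signum, frame):
        # Only flip a flag here; the job in progress is left to finish.
        self.accepting = False

    def install(self):
        # Must be called from the main thread (a CPython restriction).
        signal.signal(signal.SIGUSR1, self.handle_usr1)

    def run(self, jobs):
        """Run callables from `jobs` until they run out or USR1 is seen."""
        results = []
        for job in jobs:
            if not self.accepting:
                break
            results.append(job())
        return results


if __name__ == "__main__":
    worker = Worker()
    worker.install()
    print(f"worker pid {os.getpid()}; try: kill -USR1 {os.getpid()}")
    # Hypothetical stand-in jobs; a real worker would pull from a queue.
    print(worker.run([lambda: "job 1", lambda: "job 2"]))
```

The flag lives only in the memory of the signalled process, which is why the sender needs both the pid and access to the same machine.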
Caio (heroku support):
We had this feature experimentally at some point but it was never productized. I recommend you find other ways to signal your processes, like setting a database flag.
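A rough sketch of the database-flag idea, assuming a hypothetical `worker_flags` table and using an in-memory SQLite database as a stand-in for the app's real database. The function names are illustrative only.

```python
import sqlite3


def init_flags(conn):
    # Hypothetical table holding named on/off flags for workers.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS worker_flags "
        "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.commit()


def set_paused(conn, paused):
    conn.execute(
        "INSERT OR REPLACE INTO worker_flags (name, value) VALUES ('paused', ?)",
        (int(paused),),
    )
    conn.commit()


def is_paused(conn):
    row = conn.execute(
        "SELECT value FROM worker_flags WHERE name = 'paused'"
    ).fetchone()
    return bool(row and row[0])


def run(conn, jobs):
    """Check the flag before each job; stop picking up work once it is set."""
    results = []
    for job in jobs:
        if is_paused(conn):
            break
        results.append(job())
    return results
```

Unlike a signal, the flag outlives the process and is seen by every worker that reads it, so something has to clear it again and decide which workers it applies to.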
me:
Thanks for the quick reply.
It is unfortunate that feature has not been further investigated and productized. The suggested workaround is far from ideal, and not very realistic.
The whole idea of a USR1 signal is to signal the current process to take some action - in this case, stop taking action. But a new process, which will be spun up after the app is restarted, will have no knowledge of that signal and will get on with doing its thing - working jobs. Using a database flag introduces a whole host of complexity around managing that state across app restarts and needing to customize (read: hack or monkey patch) existing tools to be aware of that flag.
Caio (heroku support):
That's very reasonable feedback. I'm routing this to our platform engineers.
Also, there is some philosophy behind these choices.
http://www.12factor.net/disposability
As noted above, there is an even more aggressive philosophy called crash-only design (particularly in database system or file system design) that notes that abrupt shutdown (e.g. a SIGKILL without chance for cleanup) plus the necessary recovery time at startup is often faster than a graceful shutdown plus graceful startup. This acknowledges that all software, at some point in its life, will be abruptly terminated, whether that is by SIGKILL or by power loss. Ideally, software should be able to recover from this state.
This, too, will inevitably happen to your workers at some point on Heroku (or anywhere for that matter), as the underlying hardware that happens to be running your workers will (eventually) abruptly fail.
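One way to picture the crash-only idea for workers, as a sketch only: mark a job as running before starting it, mark it done only after it succeeds, and at startup requeue anything a dead worker left marked as running. The `store` dict below is a hypothetical stand-in for durable storage, and the job handler is assumed to be safe to run more than once.

```python
def recover(store):
    """At startup, return jobs a crashed worker left 'running' to 'queued'."""
    for job_id, state in list(store.items()):
        if state == "running":
            store[job_id] = "queued"


def work_one(store, job_id, handler):
    store[job_id] = "running"  # reserve the job before doing any work
    handler(job_id)            # may be cut short at any point by SIGKILL
    store[job_id] = "done"     # acknowledge only after the work succeeded
```

With this shape, an abrupt kill needs no cleanup code at all; the next process to start simply runs `recover` and carries on.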