Shells create process groups when running commands. This is true regardless of whether the commands run synchronously or asynchronously.
Unix processes are not automatically supervisory processes (unlike Erlang). This extends to Unix shells as well.
However, things like pipe groups and terminal TTY shortcuts (CTRL+C) obscure this fact.
Let's clear this up.
Imagine we have a Python program ./script.py which will run forever, but will also listen for signals, print them out and exit.
#!/usr/bin/env python3
import time
import signal

def sighandler(signum, frame):
    print('SIGNAL HANDLER called with SIGNAL:', signum)
    exit()

signal.signal(signal.SIGINT, sighandler)
signal.signal(signal.SIGTERM, sighandler)
signal.signal(signal.SIGQUIT, sighandler)
print('READY')
while True:
    print('LIVE')
    time.sleep(1)
Then we have a shell script ./orchestrate.sh that "orchestrates" the Python program:
#!/usr/bin/env sh
echo $$
# it does not matter if this was asynchronous using `&` and wait
python3 ./script.py
If you run this script, you'll get something like:
> ./orchestrate.sh
16933
READY
LIVE
LIVE
^CSIGNAL HANDLER called with SIGNAL: 2
What has happened is that you have used the terminal shortcut CTRL+C, which is first handled by your terminal. The terminal driver (the kernel's TTY line discipline, not the emulator itself) will send SIGINT to the foreground process group 16933. This will interrupt both the shell and the python program, which results in the entire process hierarchy stopping.
However, if you instead use kill -SIGINT 16933 from a different terminal, this will send SIGINT to just the shell process, and the shell will not propagate the SIGINT to its python child process. The ./orchestrate.sh script exits, but ./script.py continues running.
To ensure that you are killing the entire process hierarchy, you need to instead use kill -SIGINT -16933. The - prefix sends the signal to the process group instead.
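The difference between signalling one PID and signalling a whole process group can be sketched in Python, assuming a POSIX system. The names SLEEPER and demo_kill_vs_killpg are made up for this illustration; os.kill plays the role of kill -TERM <pid> and os.killpg the role of kill -TERM -<pgid>.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for ./script.py: a child that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def demo_kill_vs_killpg():
    # The leader moves itself into a new process group (PGID == its own PID).
    leader = subprocess.Popen(SLEEPER, preexec_fn=os.setpgrp)
    pgid = leader.pid
    # The member joins the leader's group instead of staying in ours.
    member = subprocess.Popen(SLEEPER, preexec_fn=lambda: os.setpgid(0, pgid))

    # Like `kill -TERM <pid>`: only the leader receives the signal.
    os.kill(leader.pid, signal.SIGTERM)
    leader_rc = leader.wait(timeout=5)
    member_survived = member.poll() is None

    # Like `kill -TERM -<pgid>`: every remaining member of the group gets it.
    os.killpg(pgid, signal.SIGTERM)
    member_rc = member.wait(timeout=5)
    return leader_rc, member_survived, member_rc


if __name__ == "__main__":
    print(demo_kill_vs_killpg())
```

A negative return code from Popen.wait() means the child was terminated by that signal number, so the member only reports termination after the group-wide signal.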
Obviously our ./orchestrate.sh is not very robust. To make it robust, we must explicitly turn it into a supervisor.
To do this, we can use traps:
#!/usr/bin/env sh
echo $$
trap 'exit' INT QUIT TERM
trap 'kill -TERM 0' EXIT
python3 ./script.py
The above will trap SIGINT, SIGQUIT and SIGTERM, and raise an EXIT condition.
The EXIT condition will then be handled by kill -TERM 0, which sends SIGTERM to the current process group. Note that SIGTERM is the default kill signal, however it is good to be explicit. Make sure to use -TERM (rather than -SIGTERM) for POSIX compatibility.
The above is just an example; your orchestrator traps should be customised to your situation.
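The same supervisor idea can be sketched in Python, assuming a POSIX system. This is only an illustration, not a translation of the shell script: start_supervised is a hypothetical helper, and unlike kill -TERM 0 it puts the child into its own session and process group (start_new_session=True) and forwards signals to that group only.

```python
import os
import signal
import subprocess
import sys


def start_supervised(cmd):
    """Start cmd in its own process group and forward INT/QUIT/TERM to it."""
    # Assumption: a new session is acceptable for the child; its PGID == its PID.
    child = subprocess.Popen(cmd, start_new_session=True)

    def forward(signum, frame):
        # Forward as SIGTERM to the child's whole group, not to our own group.
        try:
            os.killpg(child.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the child group is already gone

    for sig in (signal.SIGINT, signal.SIGQUIT, signal.SIGTERM):
        signal.signal(sig, forward)
    return child


if __name__ == "__main__":
    # Hypothetical child command standing in for ./script.py.
    child = start_supervised([sys.executable, "-c", "import time; time.sleep(30)"])
    print("supervisor", os.getpid(), "child group", child.pid)
    os.kill(os.getpid(), signal.SIGTERM)  # simulate an external `kill <pid>`
    print("child exit status:", child.wait(timeout=5))
```

Because the child is in a separate group, a CTRL+C in the terminal reaches only the supervisor, which then decides what to forward.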
Remember that it is possible to receive signals multiple times, so the python signal handler can be called multiple times. You must make sure that the handler is idempotent.
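One way to get an idempotent handler is to guard the cleanup with a flag. The following is a minimal sketch; make_idempotent_handler and the calls list are hypothetical names used only for illustration, and the handler is invoked directly here instead of being delivered by the kernel.

```python
import signal


def make_idempotent_handler(cleanup):
    """Wrap cleanup so that repeated signals trigger it only once."""
    state = {"done": False}

    def handler(signum, frame):
        if state["done"]:
            return  # a repeated signal is ignored
        state["done"] = True
        cleanup(signum)

    return handler


calls = []
# `calls.append` is a stand-in for real cleanup work.
handler = make_idempotent_handler(calls.append)

# Simulate a repeated delivery: SIGINT first, then SIGTERM.
handler(signal.SIGINT, None)
handler(signal.SIGTERM, None)
print(len(calls))
```

In a real program the same handler object would be registered with signal.signal for each signal of interest.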
Finally:
# sends SIGINT to just the PID
kill -INT <pid>
# sends SIGINT to the process group using the pid
kill -INT -<pid>
The only good way of creating shell script orchestrators is like this:
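# Signal only the jobs this shell started, not the whole process group.
cleanup () {
    kill -TERM $(jobs -p) 2>/dev/null || true
}
# INT, QUIT and TERM all funnel into EXIT, so cleanup runs on every exit path.
trap 'exit' INT QUIT TERM
trap cleanup EXIT
# comm here stands in for any long-running command.
comm &
{
    comm &
    comm &
    wait
} &
wait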
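For comparison, the same "only signal what I started" idea can be sketched in Python, assuming a POSIX system. The children list plays the role of the shell's job table; spawn and cleanup are hypothetical helper names, not part of any library.

```python
import atexit
import subprocess
import sys

children = []  # plays the role of the shell's job table


def spawn(cmd):
    """Start cmd and remember it so that cleanup can find it later."""
    child = subprocess.Popen(cmd)
    children.append(child)
    return child


def cleanup():
    # Like `kill -TERM $(jobs -p)`: signal only the children we started.
    for child in children:
        if child.poll() is None:
            child.terminate()  # sends SIGTERM
    # Reap every child and report its return code.
    return [child.wait() for child in children]


# Like `trap cleanup EXIT`: run cleanup when the interpreter exits normally.
atexit.register(cleanup)
```

A caller would use spawn([...]) in place of each backgrounded command. Note that atexit does not run when the process is killed by an unhandled signal, so signal handlers that call sys.exit would still be needed, mirroring the trap 'exit' line.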
If you use set -m inside a script, each individual command group will get its own process group ID.
If you don't, then they inherit the shell's process group ID.
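The two cases can be observed from Python, assuming a POSIX system. In this sketch child_pgid is a hypothetical helper, and calling os.setpgrp in the child is used as a rough stand-in for what job control does for each job.

```python
import os
import subprocess
import sys

# Hypothetical child command that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def child_pgid(own_group):
    """Start a child, return (pid, pgid), then stop the child again."""
    # With own_group the child calls setpgrp() before exec, so PGID == PID.
    child = subprocess.Popen(SLEEPER, preexec_fn=os.setpgrp if own_group else None)
    try:
        return child.pid, os.getpgid(child.pid)
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("parent pgid:", os.getpgid(0))
    print("inherited  :", child_pgid(own_group=False))
    print("own group  :", child_pgid(own_group=True))
```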
A session ID (SID) is meant to be associated with a "tty", or with a "session" that is separate from a tty. So daemons get their own SID.
But each terminal (pty, vty or tty) also gets its own SID. An SID groups processes at the "session" level, whether that session is attached to a terminal or not.
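A small Python sketch, assuming a POSIX system, shows what a fresh session looks like from the outside. child_session_ids is a hypothetical helper; start_new_session=True asks subprocess to call setsid() in the child, which is roughly what a daemon does to detach itself.

```python
import os
import subprocess
import sys

# Hypothetical child command that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def child_session_ids():
    """Return (pid, pgid, sid) of a child started in a new session."""
    # setsid() makes the child the leader of a new session and process group.
    child = subprocess.Popen(SLEEPER, start_new_session=True)
    try:
        return child.pid, os.getpgid(child.pid), os.getsid(child.pid)
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("our sid:", os.getsid(0))
    print("child (pid, pgid, sid):", child_session_ids())
```

For a session leader all three IDs are the same number, and they differ from the SID of the process that started it.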
Process groups are intended for allowing one to signal a group of processes together. Thus process trees don't really exist in Unix as something you can signal; there are process groups instead. Process trees still exist, but only through PID to PPID relationships.
Oh, it turns out that kill 0 is a bad idea when you're a script, because you inherit the same process group as your parent. (This means you could in fact kill the parent process as well, so you're not really simulating process tree semantics.) Instead we have to signal only the shell's own jobs, as the cleanup function based on jobs -p above does.
Unfortunately, relying on jobs -p is a Bash-specific behaviour. There's nothing in zsh that I can find that allows you to get a simple list of the PIDs of the current job table.