CMCDragonkai/linux_process_trees_and_process_hierarchy.md

## linux_process_trees_and_process_hierarchy.md

      
Linux/Unix Process Trees and Process Hierarchy

Sleep is an executable that we can use to simulate a blocking process for the purposes of demonstrating how to handle child processes.
Here's our base case.
The execution pattern is: zsh (interactive) -> bash (./parent.sh) -> bash (./child.sh) -> sleep. Because both Bash processes exec without forking, the bash (./parent.sh) process is replaced by bash (./child.sh), which in turn is replaced by sleep. This means there are no child processes to manage except the shell's immediate child, sleep. Upon sending SIGINT, the sleep process is terminated and there is no resource leak: no orphaned or zombie processes.
./parent.sh:
#!/usr/bin/env bash

exec ./child.sh
./child.sh:
#!/usr/bin/env bash

exec sleep 1000
> alias pss='ps -A --format comm,pid,ppid,pgid,sid' # just aliasing the command to use for later
> ./parent.sh &
[1] 4254
> pss | grep 4254 # acquire the sleep process
sleep            4254  4014  4254  4014
> pss | grep 4014 # 4014 is the parent id and session id, which would be the ZSH shell that launched the program
zsh              4014  4008  4014  4014
sleep            4254  4014  4254  4014
ps               4378  4014  4378  4014
grep             4379  4014  4378  4014

There are two important things to consider here. Firstly, the final process hierarchy is:
 zsh (4014) ----------+
  |                  / \
  |                 /   \
  |                /     \
  |               /       \
sleep (4254)    ps (4378)  grep (4379)

Secondly, the PGID of sleep is 4254, the same as its PID. Likewise, the PGID of both ps and grep equals the PID of ps. It turns out that, with respect to shell job control, a process group is simply the set of processes launched together as one command, whether joined by pipes or standing alone. Here sleep stands alone, while ps | grep are bundled together. So process groups don't tell us which shell the processes belong to; for that we need the session ID, which here points to zsh's session. Note that a process's SID doesn't have to equal its own PID; it does only for the session leader, which is zsh in this case.
Note that ZSH's job control table will not show sleep; it will only show ./parent.sh. The ps table, on the other hand, will only show sleep.
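The same IDs can be read without ps. The following is a minimal sketch, assuming a Linux /proc filesystem; ids_of is a hypothetical helper and not part of the scripts in this gist. Run as a script, job control is off, so the background child should stay in the script's own process group, whereas the interactive ZSH in the transcript gives each job its own PGID.

```shell
#!/usr/bin/env bash
# ids_of: hypothetical helper printing "PPID PGID SID" for a PID (Linux /proc only).
ids_of() {
    read -r stat < "/proc/$1/stat" || return
    set -- ${stat##*) }    # drop "pid (comm) "; remaining fields: state ppid pgrp session ...
    echo "$2 $3 $4"
}

sleep 30 &                 # a blocking child, as in the examples above
child=$!
shell_ids=$(ids_of "$$")
child_ids=$(ids_of "$child")
echo "shell $$: PPID PGID SID = $shell_ids"
echo "child $child: PPID PGID SID = $child_ids"
kill -s TERM "$child"      # clean up
```

The SID column is what ties both processes back to the same session, regardless of how the PGIDs are assigned.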
Now let's try a situation where there is fork + exec. This time the parent will fork and exec to launch ./child.sh as a child process, and ./child.sh will be replaced with sleep. We should expect a process hierarchy of zsh parenting bash, which parents sleep.
./parent2.sh:
#!/usr/bin/env bash

./child.sh
> ./parent2.sh &
[1] 4421
> pss | grep 4421
bash             4421  4014  4421  4014
sleep            4422  4421  4421  4014
> kill -TERM %1
[1]  + terminated  ./parent2.sh
> pss | grep 4421
> pss | grep 4422

Here we see that we do indeed get 2 processes, bash and sleep, both of which share the same PGID and SID but have different PPIDs. The process hierarchy looks like this.
 zsh (4014)
  |
 bash (4421)
  |
sleep (4422)

Running kill through ZSH's job control feature ends up killing the entire command, which kills the entire tree. There are no orphans, nor are there any zombies. Is ZSH somehow propagating the SIGTERM signal to all processes in the group? Or is it the bash process running ./parent2.sh that is propagating SIGTERM to its child?
However, what if instead of going through ZSH's job control feature, we directly killed the bash process running ./parent2.sh? What would happen then? Would we get an orphaned process?
> ./parent2.sh &
[1] 4516
> pss | grep 4516
bash             4516  4014  4516  4014
sleep            4517  4516  4516  4014
> kill -TERM 4516
[1]  + terminated  ./parent2.sh
> pss | grep 4516
sleep            4517     1  4516  4014

Now, instead of killing the entire process tree, we're left with an orphaned process. The bash process running ./parent2.sh is gone, but sleep has become an orphan and its PPID is now 1, the init process. PID 1 has inherited the orphan and will reap it when it finishes. What hasn't changed is its PGID and SID. Its PGID is still 4516, and at the moment it is the only process in that group. At the same time, because the SID hasn't changed, we can still tell which shell or "session" an orphan was launched from. I usually consider this a resource leak, but it is still a technique people use to launch daemons from the foreground. Now when you exit the shell, the shell will not send SIGHUP to the daemon, leaving it to run potentially forever.
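The re-parenting can be reproduced non-interactively. This is a rough sketch rather than a definitive test: it assumes a Linux /proc filesystem, and ppid_of and the temporary PID file are hypothetical scaffolding. The new PPID is usually 1, but the sketch only prints it, since some environments hand orphans to a different process.

```shell
#!/usr/bin/env bash
# ppid_of: hypothetical helper that reads the parent PID from Linux's /proc.
ppid_of() {
    read -r stat < "/proc/$1/stat" || return
    set -- ${stat##*) }    # drop "pid (comm) "; remaining fields: state ppid pgrp session ...
    echo "$2"
}

pidfile=$(mktemp)
# The parent backgrounds a sleep, records its PID, then blocks in wait.
sh -c 'sleep 30 & echo "$!" > "$1"; wait' parent "$pidfile" &
parent=$!
sleep 1                              # give the parent time to start its child
child=$(cat "$pidfile"); rm -f "$pidfile"

before=$(ppid_of "$child")
kill -s TERM "$parent"               # signal the parent only, not the child
wait "$parent" 2>/dev/null || true   # reap the parent (its status is 143)
after=$(ppid_of "$child")

echo "parent was $parent; child $child PPID: $before -> $after"
kill -s TERM "$child"                # clean up the orphan ourselves
```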
Without a proper service wrapper like systemd or others, this pattern of fork, exec and disown is the main way to run daemons. Of course, there are easier ways to do it, such as disown, nohup and &|, than running a parent + child process and directly killing the parent.
Note that by default ZSH will send SIGHUP to all child processes, foregrounded or backgrounded. Only processes launched with nohup or disowned will not be closed when ZSH closes. This however doesn't seem to happen in Cygwin ZSH. So that seems like a bug. Also for Bash, you need to enable the huponexit option.
So it turns out that it cannot have been the bash process running ./parent2.sh that propagated the SIGTERM earlier. It must have been ZSH. Most likely it signals the entire process group: kill with a job spec such as %1 targets the job's PGID, and it appears that a PGID encapsulates not only commands joined via pipes, but parent-child process trees too.
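Signalling a whole process group can also be done by hand, by passing kill a negative PID. The sketch below is an illustration under assumptions, not a description of what ZSH does internally: it assumes setsid(1) (util-linux or BusyBox) is installed, and uses it only to give the demo tree its own process group from a non-interactive script. Note that setsid also creates a new session, which the interactive examples above do not.

```shell
#!/usr/bin/env bash
# Start a small tree in its own process group: one shell with two sleep children.
setsid sh -c 'sleep 30 & sleep 30 & wait' &
leader=$!                     # setsid execs in place here, so this PID is also the PGID
sleep 1                       # let the tree start

kill -s TERM -- "-$leader"    # a negative PID addresses every process in that group
status=0
wait "$leader" || status=$?
echo "group leader exit status: $status"   # 143 = 128 + 15 (SIGTERM)
```

The `-s TERM --` spelling is used because it is the POSIX form and keeps the negative number from being parsed as an option.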
What if we kill the child directly instead of killing the parent?
> ./parent2.sh &
[1] 4589
> pss | grep 4589
bash             4589  4014  4589  4014
sleep            4590  4589  4589  4014
> kill -TERM 4590
./parent2.sh: line 3:  4590 Terminated              ./child.sh
[1]  + exit 143   ./parent2.sh
> pss | grep 4589

Because the parent was running the child synchronously, the child's termination status bubbled up to the parent, which then carried on until it finished and exited. In this case the parent had nothing else to do, so it returned the child's exit status. This is why the job status at the end is exit 143 (128 + 15, where 15 is the signal number of SIGTERM) rather than done. However, if we had changed ./parent2.sh to run another task after ./child.sh, job control would report done instead of exit 143, provided that task succeeds. This behaviour makes sense: if our parent simply launches a child process without replacing itself, then we should expect the child's exit status to bubble up to the parent's exit status.
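Both outcomes can be checked in a few lines. In this sketch, child is a hypothetical stand-in for ./child.sh being terminated (it simply sends SIGTERM to itself), and each subshell plays the role of the parent script.

```shell
#!/usr/bin/env bash
# child: hypothetical stand-in for ./child.sh being killed by SIGTERM.
child() { sh -c 'kill -s TERM "$$"'; }

last=0
( child ) || last=$?                   # parent whose final command is the child
more=0
( child; : "other work" ) || more=$?   # parent that does something else afterwards

echo "child was the last command: $last"   # 143
echo "child followed by other work: $more" # 0
```

A script's exit status is that of the last command it ran, which is why only the first parent reports 143.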
Let's investigate some more asynchronous behaviour.
./parent3.sh:
#!/usr/bin/env bash

./child.sh &

> ./parent3.sh &
[1] 4647
[1]  + done       ./parent3.sh
> jobs
> pss | grep 4647
sleep            4648     1  4647  4014

This situation is equivalent to running disown or, as in the case above, directly killing the parent. Here, however, the parent exited successfully; since the parent finished, the child automatically becomes an orphan adopted by PID 1. This is really important to understand: orphaned processes are not inherited by their parent's parent, but simply by PID 1. It's kind of unintuitive.
To prevent the above from happening, we can make the parent process wait on backgrounded child processes.
./parent4.sh:
#!/usr/bin/env bash

./child.sh &

wait

> ./parent4.sh &
[1] 4680
> pss | grep 4680
bash             4680  4014  4680  4014
sleep            4681  4680  4680  4014
> kill -TERM 4681
./parent4.sh: line 5:  4681 Terminated              ./child.sh
[1]  + done       ./parent4.sh

Notice the difference: wait, called with no arguments, finishes successfully with exit 0 even though the child was terminated. After all, wait did succeed in its directive.
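If the parent should report the child's fate instead, wait can be given the child's PID. A minimal sketch of the two forms side by side, using a short-lived sleep as the child:

```shell
#!/usr/bin/env bash
sleep 30 &
kill -s TERM "$!"
wait                         # no operands: waits for every child
bare=$?

sleep 30 &
pid=$!
kill -s TERM "$pid"
specific=0
wait "$pid" || specific=$?   # one operand: returns that child's own status

echo "bare wait: $bare"      # 0
echo "wait PID:  $specific"  # 143
```

Ending ./parent4.sh with `wait "$!"` rather than a bare wait would therefore make the job report exit 143 again.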
Now what about adding in a grandparent?
./grandparent.sh:
#!/usr/bin/env bash

./parent2.sh

> ./grandparent.sh &
[1] 4693
> jobs
[1]  + running    ./grandparent.sh
> pss | grep 4693
bash             4693  4014  4693  4014
bash             4694  4693  4693  4014
sleep            4695  4694  4693  4014
> kill -TERM 4693
[1]  + terminated  ./grandparent.sh
> pss | grep 4693
bash             4694     1  4693  4014
sleep            4695  4694  4693  4014
> kill -TERM 4694
> pss | grep 4693
sleep            4695     1  4693  4014
> kill -TERM 4695
> pss | grep 4693

As we can see, extending it to grandparents results in the same behaviour as a single parent and child. The entire process tree always has the same PGID, that of the initial top parent process. Now what do you think would happen if we killed the parent, but left the grandparent and child? I suspect this would cause the grandparent to terminate, but leave the child as an orphan.
> ./grandparent.sh &
[1] 4709
> pss | grep 4709
bash             4709  4014  4709  4014
bash             4710  4709  4709  4014
sleep            4711  4710  4709  4014
> kill -TERM 4710
./grandparent.sh: line 3:  4710 Terminated              ./parent2.sh
[1]  + exit 143   ./grandparent.sh
> pss | grep 4709
sleep            4711     1  4709  4014

We can see that terminating the middle process makes the grandparent receive its exit status, while the child is left as an orphan. This is true even if the grandparent survives to do other things. The grandparent does not necessarily inherit the grandchild.
This gives us the surprising fact that killing any process in an arbitrary Unix process tree does not directly imply that any other process will be killed. When a process dies, its parent simply receives the status code, either from a synchronous call or from a later wait call, and its children are simply inherited by PID 1. By default, Unix processes do not form strict process trees, and they don't maintain tight relationships with each other.
So how do we get nicely behaved process trees, where killing children allows parents to continue, but killing parents automatically kills all transitive children? Processes that behave this way are called supervisor processes. To create one, we need to add some extra code to handle all sorts of signals. We need to handle termination signals and propagate them to all children. We need to handle child termination signals and reap the children to avoid zombie processes. We need to consider whether PGID or SID come into play. And most importantly, we need to consider what happens if our supervisor launches process trees that don't themselves behave like supervisors. How does a supervisor deal with non-supervisory parents? This is all quite complicated, and now we can see why Unix/POSIX didn't make all processes supervisors by default. It's not trivial.
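To make the first two requirements concrete, here is a deliberately small sketch of a supervising shell function. It is only an illustration under assumptions: supervise is a hypothetical name, it handles SIGTERM only, its children are plain sleep processes, and it does nothing about children that spawn non-supervising trees of their own.

```shell
#!/usr/bin/env bash
# supervise: hypothetical helper. Starts two blocking children, forwards SIGTERM
# to them, reaps them, and then exits as if it had been terminated itself.
supervise() {
    pids=""
    trap 'kill -s TERM $pids 2>/dev/null; wait; exit 143' TERM
    sleep 30 & pids="$pids $!"
    sleep 30 & pids="$pids $!"
    echo "$pids"             # report the child PIDs so the caller can inspect them
    wait                     # returns early when the trapped signal arrives
}

log=$(mktemp)
supervise > "$log" &         # the function runs in its own background subshell
sup=$!
sleep 1                      # give the children time to start
kill -s TERM "$sup"          # signal only the supervisor, not its process group
status=0
wait "$sup" || status=$?
children=$(cat "$log"); rm -f "$log"

echo "supervisor exit status: $status"
for pid in $children; do
    if kill -0 "$pid" 2>/dev/null; then echo "$pid survived"; else echo "$pid is gone"; fi
done
```

Because the handler calls wait before exiting, the children are reaped by the supervisor itself instead of being left for PID 1. Sending SIGKILL to the supervisor would bypass the trap entirely.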
Note that SIGKILL and SIGSTOP cannot be caught or handled. This means that killing a parent with SIGKILL will always leave orphans, supervisor or not, because the parent never gets a chance to propagate the signal; likewise, SIGSTOP stops only the targeted process and is not forwarded to its children. It is a worthwhile exercise to figure out how to kill an entire process tree without relying on shell job control; this can come up if, for some reason, the parent supervisor was killed and left an orphaned process tree.
For now, I shall leave it here. And later we can explore some Unix C based supervisors and compare them to Erlang supervisors.

http://stackoverflow.com/questions/392022/best-way-to-kill-all-child-processes
http://riccomini.name/posts/linux/2012-09-25-kill-subprocesses-linux-bash/
https://en.wikipedia.org/wiki/Orphan_process
http://www.linusakesson.net/programming/tty/


Updates regarding the above.
When processes are joined with pipes and become a shell process group

Running process1 | process2, results in this dataflow architecture (assuming /dev/pts/0 is your attached terminal device for the shell):
/dev/pts/0 -> STDIN - process1 - STDOUT -> pipe0 -> STDIN - process2 - STDOUT -> /dev/pts/0
                         |                                     |
                       STDERR                                STDERR
                         |                                     |
                         v                                     v
                     /dev/pts/0                            /dev/pts/0

A better ps command

Use this instead:
alias pss='ps -A --format comm,pid,ppid,pgid,sid,stat,wchan,tty'

The stat column shows process status, and can illuminate more about what's going on:
PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
       D    uninterruptible sleep (usually IO)
       R    running or runnable (on run queue)
       S    interruptible sleep (waiting for an event to complete)
       T    stopped, either by a job control signal or because it is being traced.
       W    paging (not valid since the 2.6.xx kernel)
       X    dead (should never be seen)
       Z    defunct ("zombie") process, terminated but not reaped by its parent.

       For BSD formats and when the stat keyword is used, additional characters may be displayed:
       <    high-priority (not nice to other users)
       N    low-priority (nice to other users)
       L    has pages locked into memory (for real-time and custom IO)
       s    is a session leader
       l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
       +    is in the foreground process group.

Pay attention to the + and s codes.
There are Supervisor Processes that can catch grandorphans

It isn't always true that an orphaned process will be inherited by PID 1. From
Linux 3.4 onwards, processes can call prctl with the PR_SET_CHILD_SUBREAPER
option which means they will acquire parenthood of any grandorphans. This is
now implemented in both Upstart user instances and systemd user instances. This
gets us one step closer to Erlang style supervisor trees.
See:

http://unix.stackexchange.com/a/177361/56970
http://unix.stackexchange.com/a/194208/56970