SkyperTHC/gist:cb4ebb633890ac36ad86e80c6c7a9bb2

## gistfile2.md

      
Before we get to the problem we need a common understanding of how PTYs work.
There are two sides: a master and a slave. The shell's STDIN/STDOUT/STDERR are connected to
the slave side. The master (a single FD) is normally held by sshd (when we log into a remote system).
The PTY takes care of special characters. Of course there is a PTY on the client side as well, but the ssh client puts it
into raw mode so it does nothing. We can ignore it. Let's focus on the PTY on the server side.
The user presses Ctrl-C on the client side. The ssh client reads \003 (the Ctrl-C character) and forwards it to the sshd server. The
sshd server writes it to the MASTER end of the PTY.
The Linux kernel detects the \003 (a special character) and won't forward it to the shell. Instead it sends a SIGINT to the foreground process group of the PTY (here, the shell).
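To see this without sshd, here is a minimal Python sketch (not part of the original session; the function name is made up). It assumes Linux or another POSIX system with default terminal settings on a fresh PTY (ISIG on, Ctrl-C as the interrupt character). It forks a child onto a new PTY, writes \003 to the master and reports which signal killed the child:

```python
import os
import pty
import signal
import time


def ctrl_c_becomes_sigint():
    """Write \\003 to the PTY master; return the signal that killed the child."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: session leader, the PTY slave is its controlling terminal.
        try:
            signal.signal(signal.SIGINT, signal.SIG_DFL)  # undo Python's own handler
            os.write(1, b"ready\n")  # tell the parent we are set up
            while True:
                time.sleep(1)
        finally:
            os._exit(1)  # only reached if something above failed

    # Parent: we hold the master FD, like sshd or script would.
    seen = b""
    while b"ready" not in seen:
        seen += os.read(master_fd, 1024)
    os.write(master_fd, b"\x03")  # the byte the ssh client reads for Ctrl-C
    _, status = os.waitpid(pid, 0)
    os.close(master_fd)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    print("child killed by signal", ctrl_c_becomes_sigint())
```

The child never reads the \003 byte. The kernel turns it into a SIGINT before it reaches the slave side.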
Good. Let's use 'script' instead of sshd for this example. It's a bit easier than dealing with sshd.
Let's start a shell inside a new PTY using script:
root@osboxes:~# tty
/dev/pts/0
root@osboxes:~# script -q /dev/null
root@osboxes:~# tty
/dev/pts/7   # <--- Here we are inside 'script' on a new PTY
root@osboxes:~# echo $$
904617     # <--- PID of the shell
root@osboxes:~# ps -p $$ -o ppid=
904616     # <--- PID of 'script', the master of the PTY
If we kill script with kill -9 904616 then its master FD is closed, the kernel hangs up the PTY and the shell gets a SIGHUP. The shell dies. That's how PTYs work.
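The same hangup can be reproduced in a few lines of Python. This is only a sketch under assumptions: the function name is made up, and instead of killing a process like script it simply closes the last master FD, which is what the kernel does for us when the master process dies.

```python
import os
import pty
import signal
import time


def hangup_on_master_close():
    """Close the PTY master; return the signal that killed the child."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: session leader on the slave side, like the shell under script.
        try:
            signal.signal(signal.SIGHUP, signal.SIG_DFL)  # default action: die
            os.write(1, b"ready\n")  # tell the parent we are set up
            while True:
                time.sleep(1)
        finally:
            os._exit(1)  # only reached if something above failed

    # Parent: wait until the child is ready, then drop the master.
    seen = b""
    while b"ready" not in seen:
        seen += os.read(master_fd, 1024)
    os.close(master_fd)  # master gone -> kernel hangs up the slave
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    print("child killed by signal", hangup_on_master_close())
```

Nobody calls kill() here. The SIGHUP comes from the kernel, purely because the master side disappeared.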
The problem: That's not true for docker. The behaviour is wrong (it's clear where the 'bug' is, but not clear why
the bug exists or how to get the same behaviour as above).
Let's use docker to start a shell:
root@osboxes:~# docker run --rm -it alpine ash
/ # echo $$
1   # <-- Our shell is PID=1 inside docker. 
/ # grep ^SigIgn /proc/$$/status
SigIgn: 0000000000284004   # <--- SIGHUP (the lowest bit) is not ignored, so it is set up correctly.
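The SigIgn value is a hex bitmask where bit N-1 stands for signal number N. A small Python helper (the name decode_sigmask is made up for this sketch) makes the mask readable:

```python
import signal


def decode_sigmask(hex_mask):
    """Return [(signum, name), ...] for every bit set in a /proc status mask."""
    mask = int(hex_mask, 16)
    result = []
    for signum in range(1, 65):
        if mask & (1 << (signum - 1)):  # bit N-1 means signal N
            try:
                name = signal.Signals(signum).name
            except ValueError:
                name = "SIG%d" % signum  # no symbolic name on this platform
            result.append((signum, name))
    return result


if __name__ == "__main__":
    ignored = decode_sigmask("0000000000284004")
    print(ignored)
    print("SIGHUP ignored:", signal.SIGHUP in [n for n, _ in ignored])
```

For the mask above this gives signals 3, 15, 20 and 22, which on x86 Linux are SIGQUIT, SIGTERM, SIGTSTP and SIGTTOU. SIGHUP (signal 1) is not in the list, so the shell would act on it if it ever arrived.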
Check on the host:
root@osboxes:~# ps alxww | grep " ash"
0     0  909220  906992  20   0 1125652 46740 futex_ Sl+ pts/8      0:00 docker run --rm -it alpine ash
4     0  909270  909243  20   0   1704  1060 poll_s Ss+  pts/0      0:00 ash
root@osboxes:~# ps -p 909243 l
F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0     0  909243       1  20   0 712392  9464 futex_ Sl   ?          0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id a9268a7c683788e9623b8bfe67842ddc06418e81c3cba9c83ad510d3ce4c752e -address /run/containerd/containerd.sock
The docker client asks the Docker daemon (dockerd) to start the container. The daemon goes through containerd and containerd-shim-runc-v2 (the parent of ash above), which invokes runc to start our ash shell (docker is a glorified wrapper around runc).
Let's kill the docker client (docker run --rm -it alpine ash). The docker client is the 'controlling master' of that PTY for the ash shell (well, not really, but effectively that's the behaviour we expect. The docker client asks the daemon via /run/docker.sock to start the process rather than calling execve() itself, but the idea is the same).
root@osboxes:~# kill -9 909220
Remember how the shell received a SIGHUP when we killed the master of the PTY (script)? Here 'ash' keeps running, seemingly
not connected to any PTY (the master is gone).
root@osboxes:~# ps -p 909270 l
F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0  909270  909243  20   0   1704  1060 poll_s Ss+  pts/0      0:00 ash
Half of the Internet is crying that there are 'stale' shells still running that docker started. Nobody knows
the answer and everyone (wrongly) suggests 'use --init' and 'don't use -t', but neither solves the problem :>
Of course, if we killed the runc process then the ash shell would get the SIGHUP, but that's not the point or the problem (in the real world it's the docker client, not runc, that gets killed).
If we send a SIGTERM instead of a SIGKILL to the docker client then the docker client (correctly) forwards it to the ash shell (surprisingly, not to the runc process..). This is called 'signal proxying' and is enabled by default for 'docker run' commands (--sig-proxy).
However, if we start another ash shell using docker exec and send a SIGHUP to this new 'docker exec' client, then the signal is not forwarded to the ash shell, and the ash shell keeps running long after docker exec has terminated (on SIGHUP).
So, we end up with stale shells....
So, how do I configure this shit so that ash gets a SIGHUP when the docker exec process gets a SIGHUP or SIGKILL? Just like when the shell was started from script or sshd?
Proxying SIGKILL requires kernel-level modifications (to do it right). SIGHUP and all other signals could be proxied by wrapping 'docker exec' inside our own program and handling the signal proxying there....
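A rough Python sketch of such a wrapper follows. It is an illustration only, not a tested fix for docker: the names run_with_signal_proxy, FORWARDED and deliver are made up, and by default it just passes the signal to the wrapped client process.

```python
import signal
import subprocess

# Signals to proxy. SIGKILL and SIGSTOP cannot be caught, so they cannot be listed.
FORWARDED = (signal.SIGHUP, signal.SIGINT, signal.SIGQUIT, signal.SIGTERM)


def run_with_signal_proxy(argv, deliver=None):
    """Run argv and pass every signal in FORWARDED on via deliver(child, signum).

    Without a deliver callback the signal is sent to the child process itself.
    """
    child = None
    pending = []  # signals that arrive before the child exists

    def handler(signum, _frame):
        if child is None:
            pending.append(signum)
        elif deliver is not None:
            deliver(child, signum)
        else:
            child.send_signal(signum)

    previous = {s: signal.signal(s, handler) for s in FORWARDED}
    try:
        child = subprocess.Popen(argv)
        for signum in pending:
            handler(signum, None)
        return child.wait()  # a negative value means "killed by that signal"
    finally:
        for s, old in previous.items():
            # signal.signal() returns None for handlers not installed from Python
            signal.signal(s, old if old is not None else signal.SIG_DFL)
```

Called as run_with_signal_proxy(["docker", "exec", "-it", "<container>", "ash"]), the default only signals the docker exec client, which does not pass SIGHUP on. A deliver callback would have to do that part, for example by running kill inside the container through a second docker exec. How to learn the right PID in there is left open here, and SIGKILL of the wrapper itself still cannot be handled this way.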