I ran into this issue at work in a test runner written in Go. It runs a test suite that is a Node module, which itself spawns several subprocesses. One of those subprocesses inherited its parent's stdout file descriptor and outlived the parent when the parent was killed. If you use Go's exec.Cmd to run a process and set its Stdout or Stderr to something that is not an *os.File, a goroutine is spawned to drive that I/O when you call Cmd.Start, and Cmd.Wait waits for all such goroutines to finish. In this situation the grandchild process stays alive and keeps the write end of the stdout pipe open, so Cmd.Wait blocks until the grandchild exits.
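The blocking behavior described above can be reproduced in a few lines. This is only a sketch, not the subprocess.go test case: it assumes a Unix-like system with sh and sleep on the PATH, and the helper name killAndWait and the shell script are made up for illustration. The shell stands in for the child process and the backgrounded sleep for the grandchild.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"time"
)

// killAndWait (hypothetical helper) starts script under sh with Stdout set
// to a bytes.Buffer. Because that is not an *os.File, os/exec creates a pipe
// and a goroutine that copies from it. The helper kills only the direct
// child shortly afterwards and reports how long it took from Start until
// Wait returned.
func killAndWait(script string) (time.Duration, error) {
	var out bytes.Buffer
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdout = &out

	start := time.Now()
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	// Give sh time to fork the background sleep (the "grandchild").
	time.Sleep(200 * time.Millisecond)

	// Kill only the shell; the background sleep still holds the pipe open.
	if err := cmd.Process.Kill(); err != nil {
		return 0, err
	}
	_ = cmd.Wait() // an error is expected here because the child was killed
	return time.Since(start), nil
}

func main() {
	// sh backgrounds "sleep 2" and then blocks in the wait builtin.
	d, err := killAndWait("sleep 2 & wait")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Wait returned after %v\n", d.Round(100*time.Millisecond))
}
```

Although the shell is killed about 200ms in, Wait should not return until the backgrounded sleep exits roughly two seconds after start, because that process still holds the write end of the pipe.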
subprocess.go is a minimal test case that demonstrates the issue. It spawns itself as a child process with piped stdout, then waits one second before killing the child and waiting on it. The child process spawns itself as a grandchild with inherited stdout and sleeps for 5 seconds before exiting. The grandchild process simply sleeps for 5 seconds before exiting. When run, you can see that the first process winds up waiting until the grandchild process exits (each process prefixes its output with the time elapsed since it started, in brackets):
    $ go run subprocess.go
    [281ns] Spawning child process
    [443.594µs] Sleeping for one second
    [1.455µs] Child process started
    [76.366µs] Spawning grandchild
    [489.851µs] Sleeping for 5 seconds
    [1.631µs] Grandchild process started
    [48.321µs] Sleeping for 5 seconds
    [1.000591184s] Killing child process 4086632
    [1.000716182s] Waiting for child process
    [5.000200347s] Done
    [5.005455282s] Child process was killed
    [5.005474676s] Done!
subprocess_setpgid.go is the same program with my suggested fix: put the child process in a new process group, then kill the whole group so that the grandchild process is killed too. Running this version shows that the first process receives the child process's exit status almost immediately after calling Cmd.Wait, and the grandchild process does not run to completion:
    $ go run subprocess_setpgid.go
    [215ns] Spawning child process
    [432.164µs] Sleeping for one second
    [1.133µs] Child process started
    [31.065µs] Spawning grandchild
    [312.907µs] Sleeping for 5 seconds
    [659ns] Grandchild process started
    [15.239µs] Sleeping for 5 seconds
    [1.00059334s] Killing child process group 4086776
    [1.000691397s] Waiting for child process
    [1.001147315s] Child process was killed
    [1.001169014s] Done!
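The process-group approach can be sketched on its own as follows. This is not the contents of subprocess_setpgid.go, just an illustration under the same assumptions as before: a Unix-like system with sh and sleep available, and a made-up helper name (killGroupAndWait). It relies on SysProcAttr.Setpgid to make the child the leader of a new process group, and on the convention that passing a negative pid to kill signals the whole group.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// killGroupAndWait (hypothetical helper) starts script under sh in a new
// process group, kills the entire group shortly afterwards, and reports how
// long it took from Start until Wait returned.
func killGroupAndWait(script string) (time.Duration, error) {
	var out bytes.Buffer
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdout = &out
	// Setpgid with the default Pgid of 0 puts the child in a new process
	// group whose ID equals the child's pid.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	start := time.Now()
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	// Give sh time to fork the background sleep (the "grandchild").
	time.Sleep(200 * time.Millisecond)

	// A negative pid targets every process in that process group, so the
	// grandchild is killed along with the child and the pipe gets closed.
	if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
		return 0, err
	}
	_ = cmd.Wait() // an error is expected here because the child was killed
	return time.Since(start), nil
}

func main() {
	d, err := killGroupAndWait("sleep 2 & wait")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Wait returned after %v\n", d.Round(100*time.Millisecond))
}
```

Here Wait should return shortly after the kill rather than after the full two seconds. The sketch only covers descendants that stay in the child's process group; a grandchild that moves itself into a different group or session would not be reached by this signal.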