@mjambon
Last active March 28, 2024 11:09
bash: Run parallel commands and fail if any of them fails
#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu

pids=()
for x in 1 2 3; do
  ls /not-a-file &
  pids+=($!)
done

for pid in "${pids[@]}"; do
  wait "$pid"
done

#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
# The expected output is something like this:
#
# $ ./parallel-explained
# ls: cannot access '/not-a-file': No such file or directory
# ls: cannot access '/not-a-file'ls: cannot access '/not-a-file': No such file or directory
# : No such file or directory
#
# Our 'parallel-explained' script exited with code '2', because it's the exit
# code of one of the failed 'ls' jobs:
#
# $ echo $?
# 2
#
# 'set -e' tells the shell to exit if any foreground command fails,
# i.e. exits with a nonzero status.
set -eu
# Initialize array of PIDs for the background jobs that we're about to launch.
pids=()
for x in 1 2 3; do
  # Run a command in the background. We expect this command to fail.
  ls /not-a-file &
  # Add the PID of this background job to the array.
  pids+=($!)
done
# Wait for each specific process to terminate.
# Instead of this loop, a single call to 'wait' would wait for all the jobs
# to terminate, but it would not give us their exit status.
#
for pid in "${pids[@]}"; do
  #
  # Waiting on a specific PID makes the wait command return with the exit
  # status of that process. Because of the 'set -e' setting, any exit status
  # other than zero causes the current shell to terminate with that exit
  # status as well.
  #
  wait "$pid"
done
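
The difference between a bare 'wait' and 'wait "$pid"' described in the comments above can be checked with a small experiment. This is only a sketch and not part of the original script; the variable names `bare_status` and `pid_status` are made up for illustration, and 'set -e' is left out on purpose so that both statuses can be printed.

```shell
#!/usr/bin/env bash
# Sketch: compare the exit status of a bare 'wait' with 'wait "$pid"'.
# 'set -e' is deliberately not used here so the script can report both values.

# A bare 'wait' waits for all background jobs and returns 0,
# even though the job it waited for failed.
false &
wait
bare_status=$?

# 'wait "$pid"' returns the exit status of that specific job.
(exit 3) &
pid=$!
pid_status=0
wait "$pid" || pid_status=$?

echo "bare wait: $bare_status, wait \$pid: $pid_status"
```

With 'set -e' enabled, the nonzero status from 'wait "$pid"' is what makes the script stop, which is the behavior the gist relies on.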
@Brikman

Brikman commented Aug 14, 2021

We have two blocking background processes. The first one works well. The second one fails.

But wait will block indefinitely on the first process (which is still running fine) and never learn that the second one has failed.

set -eu

pids=()

tail -f /var/log/syslog &>/dev/null &
pids+=($!)

tail -f /nonexistent.log &>/dev/null &
pids+=($!)

for pid in "${pids[@]}"; do
  wait "$pid"
done

@mjambon
Author

mjambon commented Aug 15, 2021

[I deleted an earlier reply which was wrong]

But wait will block indefinitely on the first process (which is still running fine) and never learn that the second one has failed.

Yes, indeed the solution here has this problem. I tried a few alternatives and none of them is obvious. Here's one solution that exits as soon as a child finishes with an error status:

#! /usr/bin/env bash
set -eu

# Declare a numeric variable for counting the children
declare -i n=0

(sleep 3; echo ok3) &
n+=1

(sleep 2; echo fail2; exit 1) &
n+=1

(sleep 1; echo ok1) &
n+=1

while [[ "$n" -gt 0 ]]; do
  echo waiting
  # Wait for any child to finish, returning its exit status,
  # and exiting the script if the status is nonzero (due to 'set -e'),
  # leaving some child processes running.
  wait -n
  n=n-1
done

If we run it, we see that the first job, which sleeps for 3 seconds, keeps running after the parent script terminates. I get this output:

$ ./parallel3
waiting
ok1
waiting
fail2
$ ok3

To fix this, we'd have to kill the remaining children before exiting.

@mjambon
Author

mjambon commented Aug 15, 2021

Here's an improved version, which tries to terminate the remaining children before exiting:

#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu

pids=()

(sleep 3; echo ok3) &
pids+=($!)

(sleep 2; echo fail2; exit 1) &
pids+=($!)

(sleep 1; echo ok1) &
pids+=($!)

# 'wait -n' waits for any one child, so we call it once per job.
# The loop variable itself is not used.
for _ in "${pids[@]}"; do
  if wait -n; then
    :
  else
    status=$?
    echo "One of the subprocesses exited with nonzero status $status. Aborting."
    for pid in "${pids[@]}"; do
      # Send a termination signal to all the children, and ignore errors
      # due to children that no longer exist.
      kill "$pid" 2> /dev/null || :
    done
    exit "$status"
  fi
done

It's a little complicated and maybe incorrect in some respects.
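
For comparison, here is a sketch of the same idea wrapped in a function, so that the caller decides what to do with the status. It is not a definitive fix: `run_parallel` is a hypothetical helper name, each job is assumed to be a command string run with `bash -c`, and `wait -n` needs bash 4.3 or later. The one addition is a final `wait` after the `kill` loop, so the terminated children are reaped before returning.

```shell
#!/usr/bin/env bash
# Sketch: fail fast on the first failing job, then terminate and reap the rest.

# run_parallel (hypothetical helper): run each argument as a background
# command. Return 0 if all succeed, else the first nonzero status seen.
run_parallel() {
  local pids=() cmd pid status
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done

  local remaining=${#pids[@]}
  while (( remaining > 0 )); do
    if wait -n; then
      remaining=$(( remaining - 1 ))
    else
      status=$?
      for pid in "${pids[@]}"; do
        # Ask each job to terminate; ignore jobs that no longer exist.
        kill "$pid" 2> /dev/null || :
      done
      # Reap the terminated children before returning.
      wait
      return "$status"
    fi
  done
  return 0
}

# Same three jobs as in the script above.
status=0
run_parallel 'sleep 3; echo ok3' 'sleep 2; echo fail2; exit 1' 'sleep 1; echo ok1' || status=$?
echo "exit status: $status"
```

Note that `kill` only signals the direct children. A grandchild such as the `sleep 3` started by the first job is not signaled and may outlive the script for a short time.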

@Manouchehri

Here's my solution:

#!/usr/bin/env bash

set -eu

ARG1=${1:-$(nproc --ignore=1)}

pids=()

for x in $(seq 1 ${ARG1}); do
  python3 unit_tests.py &
  pids+=($!)
done

for pid in "${pids[@]}"; do
  if wait -n; then
    :
  else
    exit_code=$?
    echo "Process exited with $exit_code, killing other tests now."
    for pid in "${pids[@]}"; do
      kill -9 "$pid" 2> /dev/null || :
    done
    exit "$exit_code"
  fi
done
