@mjambon
Last active March 28, 2024 11:09
bash: Run parallel commands and fail if any of them fails
#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu
pids=()
for x in 1 2 3; do
  ls /not-a-file &
  pids+=($!)
done
for pid in "${pids[@]}"; do
  wait "$pid"
done
#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
# The expected output is something like this:
#
# $ ./parallel-explained
# ls: cannot access '/not-a-file': No such file or directory
# ls: cannot access '/not-a-file'ls: cannot access '/not-a-file': No such file or directory
# : No such file or directory
#
# Our 'parallel-explained' script exited with code '2', because it's the exit
# code of one of the failed 'ls' jobs:
#
# $ echo $?
# 2
#
# 'set -e' tells the shell to exit if any foreground command fails,
# i.e. exits with a non-zero status.
set -eu
# Initialize array of PIDs for the background jobs that we're about to launch.
pids=()
for x in 1 2 3; do
  # Run a command in the background. We expect this command to fail.
  ls /not-a-file &
  # Add the PID of this background job to the array.
  pids+=($!)
done
# Wait for each specific process to terminate.
# Instead of this loop, a single call to 'wait' would wait for all the jobs
# to terminate, but it would not give us their exit status.
#
for pid in "${pids[@]}"; do
  #
  # Waiting on a specific PID makes the wait command return with the exit
  # status of that process. Because of the 'set -e' setting, any exit status
  # other than zero causes the current shell to terminate with that exit
  # status as well.
  #
  wait "$pid"
done
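
The script's comment about a single call to 'wait' is worth seeing in action: with no arguments, 'wait' returns 0 even when a background job failed, so 'set -e' never triggers. A small sketch (not part of the gist) contrasting the two behaviours:

#! /usr/bin/env bash
#
# Sketch: why the script waits on each PID individually.
set -eu

false &       # a background job that fails
true &        # a background job that succeeds
wait          # no arguments: waits for all jobs but always returns 0,
              # so 'set -e' never notices the failure
echo "bare wait passed with status $?"

false &       # launch a failing job again
pid=$!
wait "$pid"   # waiting on the specific PID returns the job's status (1),
              # so 'set -e' aborts here
echo "not reached"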
@ASRagab

ASRagab commented Mar 4, 2020

Pairing the tl;dr version with a longer line-by-line clear explanation is clutch, thank you!

@dudicoco

dudicoco commented Apr 5, 2021

This did not work for me with the function I was using; the wait command just hung.

I've added the following to fix it, and I suggest adding it to the gist:

exit_code=0

for pid in "${pids[@]}"; do
  wait "$pid" || exit_code=1
done

if [ "$exit_code" == "1" ]; then
  exit 1
fi
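
For anyone wanting to try it end to end, here is a self-contained sketch (my assembly, not part of the original comment) that folds this loop into the gist's script with its dummy failing 'ls' jobs:

#! /usr/bin/env bash
set -eu

pids=()
for x in 1 2 3; do
  ls /not-a-file &
  pids+=($!)
done

exit_code=0
for pid in "${pids[@]}"; do
  # The '|| exit_code=1' keeps 'set -e' from aborting on the first failure,
  # so every job is waited on before the script decides to fail.
  wait "$pid" || exit_code=1
done

if [ "$exit_code" == "1" ]; then
  exit 1
fi

Unlike the plain per-PID 'wait' under 'set -e', this variant waits for all jobs to finish and only then exits nonzero.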

@mjambon
Author

mjambon commented Apr 5, 2021

@dudicoco I don't understand what the problem was and why it would be fixed by having wait "$pid" || exit_code=1 instead of just wait "$pid". The exit status of wait is the same as the process being waited on. If the process being waited on is stuck in an infinite loop or something, then wait gets stuck too. Could you provide a full script that exhibits the problem you were describing?

The version of bash could be useful too, in case there's an oddity.
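
As a minimal illustration of that point (my sketch, run without 'set -e' so the script survives to print the status):

#! /usr/bin/env bash

(exit 7) &                # a child that exits with status 7
pid=$!
wait "$pid"               # wait returns the child's exit status
echo "wait returned $?"   # prints: wait returned 7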

@mjambon
Author

mjambon commented Apr 5, 2021

Everyone, note that I just changed the hashbang line in the script to #! /usr/bin/env bash so it works for macOS users who installed a more recent version of bash with Homebrew but still have the old bash 3.x at /bin/bash.
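
To check which bash the new hashbang will pick up on a given machine (a quick sketch, not part of the gist):

command -v bash                        # the bash that '/usr/bin/env bash' resolves to
bash -c 'echo "$BASH_VERSION"'         # its version
/bin/bash -c 'echo "$BASH_VERSION"'    # the system bash (3.x on stock macOS)

Homebrew's bash typically lands in /usr/local/bin or /opt/homebrew/bin, which precede /bin on a usual PATH, so 'env' finds it first.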

@Brikman

Brikman commented Aug 14, 2021

We have two blocking background processes. The first one works well. The second one fails.

But wait will wait indefinitely for the first process (which is still running fine), knowing nothing about the second one, which has already failed.

set -eu

pids=()

tail -f /var/log/syslog &>/dev/null &
pids+=($!)

tail -f /nonexistent.log &>/dev/null &
pids+=($!)

for pid in "${pids[@]}"; do
  wait "$pid"
done

@mjambon
Author

mjambon commented Aug 15, 2021

[I deleted an earlier reply which was wrong]

But wait will wait indefinitely for the first process (which is still running fine), knowing nothing about the second one, which has already failed.

Yes, indeed the solution here has this problem. I tried a few alternatives and they're not obvious. Here's one solution for exiting as early as possible as soon as a child finishes with an error status:

#! /usr/bin/env bash
set -eu

# Declare a numeric variable for counting the children
declare -i n=0

(sleep 3; echo ok3) &
n+=1

(sleep 2; echo fail2; exit 1) &
n+=1

(sleep 1; echo ok1) &
n+=1

while [[ "$n" -gt 0 ]]; do
  echo waiting
  # Wait for any child to finish, returning its exit status,
  # and exiting the script if the status is nonzero (due to 'set -e'),
  # leaving some child processes running.
  wait -n
  n=n-1
done

If we run it, we see that the first job that sleeps 3 seconds keeps running after the parent script terminates. I get this output:

$ ./parallel3
waiting
ok1
waiting
fail2
$ ok3

To fix this, we'd have to kill the remaining children before exiting.

@mjambon
Author

mjambon commented Aug 15, 2021

Here's an improved version, which tries to terminate the remaining children before exiting:

#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu

pids=()

(sleep 3; echo ok3) &
pids+=($!)

(sleep 2; echo fail2; exit 1) &
pids+=($!)

(sleep 1; echo ok1) &
pids+=($!)

for pid in "${pids[@]}"; do
  if wait -n; then
    :
  else
    status=$?
    echo "One of the subprocesses exited with nonzero status $status. Aborting."
    for pid in "${pids[@]}"; do
      # Send a termination signal to all the children, and ignore errors
      # due to children that no longer exist.
      kill "$pid" 2> /dev/null || :
    done
    exit "$status"
  fi
done

It's a little complicated and maybe incorrect in some respects.

@Manouchehri

Here's my solution:

#!/usr/bin/env bash

set -eu

ARG1=${1:-$(nproc --ignore=1)}

pids=()

for x in $(seq 1 ${ARG1}); do
  python3 unit_tests.py &
  pids+=($!)
done

for pid in "${pids[@]}"; do
  if wait -n; then
    :
  else
    exit_code=$?
    echo "Process exited with $exit_code, killing other tests now."
    for pid in "${pids[@]}"; do
      kill -9 "$pid" 2> /dev/null || :
    done
    exit "$exit_code"
  fi
done
