@programminghoch10
Last active January 23, 2024 09:48
Simple bash script parallelization using semaphores
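#!/bin/bash
# semlock.sh: counting semaphore built on atomic mkdir, polling once per second.
SEMPATH="/run/lock"
SEMNAME=""

# semtake <name> [count]: block until one of <count> slots is free, then claim it.
semtake() {
    local name="$1"
    [ -z "$name" ] && echo "Missing semaphore name!" >&2 && return 1
    local j="$2"
    [ -z "$2" ] && j=$(nproc)
    [ -n "$SEMNAME" ] && echo "Already have $SEMNAME" >&2 && return 1
    local i
    while true; do
        for i in $(seq 1 "$j"); do
            SEMNAME=".semlock-$name-$j-$i"
            # mkdir succeeds for exactly one process per slot directory
            mkdir "$SEMPATH/$SEMNAME" 2>/dev/null && break 2
        done
        sleep 1
    done
    # release the slot automatically when this shell exits
    trap semgive EXIT
}

# semgive: release the slot held by this shell, if any.
semgive() {
    [ -z "$SEMNAME" ] && return
    rmdir "$SEMPATH/$SEMNAME" &>/dev/null || true
    SEMNAME=""
}
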
#!/bin/bash
# semnotify.sh: counting semaphore built on atomic mkdir; waiters block in inotifywait.
# The file must be sourced. Check that first, because the checks below use return.
! (return 0 2>/dev/null) && echo "$0 can only be sourced, not executed" >&2 && exit 1
[ -z "$(command -v inotifywait)" ] && echo "inotify-tools need to be installed for $0 to work!" >&2 && return 1
SEMPATH="/run/lock"
[ ! -d "$SEMPATH" ] && echo "$SEMPATH is not a valid directory" >&2 && return 1
#SEMNAME=""
#SEMNAMEID=""

# semtake_pool <semname> <count>: try once to claim a free slot.
# On success SEMNAMEID holds the slot number; on failure it is unset.
semtake_pool() {
    local SEMNAME="$1"
    local j="$2"
    local i
    for i in $(seq 1 "$j"); do
        SEMNAMEID="$i"
        mkdir "$SEMPATH/$SEMNAME-$SEMNAMEID" 2>/dev/null || continue
        return 0
    done
    unset SEMNAMEID
    return 1
}

# semtake <name> [count]: claim a slot, queueing behind earlier waiters if none is free.
semtake() {
    local name="$1"
    [ -z "$name" ] && echo "Missing semaphore name!" >&2 && return 1
    local j="$2"
    [ -z "$2" ] && j=$(nproc)
    [ -n "$SEMNAMEID" ] && echo "Already have $SEMNAME" >&2 && return 1
    SEMNAME=".semlock-$name"
    until semtake_pool "$SEMNAME" "$j"; do
        # start numbering at the highest existing waiter number
        local i
        i="$(find "$SEMPATH" -maxdepth 1 -type d -name "$SEMNAME-wait-*" 2>/dev/null | sed 's/^.*-\([[:digit:]]*\)$/\1/' | sort -n | tail -1)"
        [ -z "$i" ] && i=0
        local SEMWAITNAME
        # claim the first free waiter number at or above it
        while true; do
            SEMWAITNAME="$SEMNAME"-wait-$i
            i=$((i+1))
            mkdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || continue
            break
        done
        # sleep until semgive removes our waiter directory, then retry
        inotifywait --quiet --quiet --event delete_self "$SEMPATH"/"$SEMWAITNAME"
        rmdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || true
    done
    # release the slot automatically when this shell exits
    trap semgive EXIT
}

# semgive: release the held slot and wake the lowest-numbered waiter.
semgive() {
    [ -z "$SEMNAME" ] && return
    [ -z "$SEMNAMEID" ] && return
    rmdir "$SEMPATH"/"$SEMNAME"-"$SEMNAMEID" &>/dev/null || true
    unset SEMNAMEID
    local i
    i="$(find "$SEMPATH" -maxdepth 1 -type d -name "$SEMNAME-wait-*" 2>/dev/null | sed 's/^.*-\([[:digit:]]*\)$/\1/' | sort -n | head -1)"
    [ -z "$i" ] && return
    local SEMWAITNAME
    local waiter
    # the glob only bounds the number of attempts; candidates are counted up from $i
    for waiter in "$SEMPATH"/"$SEMNAME"-wait-*; do
        SEMWAITNAME="$SEMNAME"-wait-$i
        i=$((i+1))
        rmdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || continue
        break
    done
    unset SEMNAME
}


programminghoch10 commented Nov 1, 2022

semlock.sh

This bash script provides two functions, semtake and semgive, that make it easy to parallelize existing bash scripts.

Motivation

Many semaphore implementations for bash (such as parallel --sem) force the user to pass the task as arguments,
because the implementation has to wrap the task: it takes the semaphore before the task runs and gives it back afterwards.
The major drawback is that an existing script has to be rewritten completely to fit the semaphore interface.
I had many scripts with a for loop iterating over multiple files, where the iterations could run in parallel but each one had to execute several commands per file.
So I created my own semaphore implementation, which can be wrapped around an entire code block within a bash script.

Interface Specification

The script defines two functions:

  • semtake <name> [count] takes a semaphore called name and allows up to count processes to hold it at the same time. With a count of 1, only one process can hold the semaphore at a time. The default count is the number of processing units reported by nproc.
  • semgive returns the previously taken semaphore.
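
How the count limit can be enforced is easiest to see in isolation. The following is a minimal sketch, not the script itself: claim_slot is a hypothetical helper, and a private mktemp directory stands in for /run/lock. Each slot is a directory, and since mkdir either creates it or fails, only one caller can win a given slot.

```bash
#!/bin/bash
# Hypothetical demo: claim one of <count> slots by creating a directory.
lockdir="$(mktemp -d)"   # assumption: temp dir used instead of /run/lock

claim_slot() {
    local name="$1" count="$2" i
    for i in $(seq 1 "$count"); do
        # mkdir fails if the slot directory already exists
        if mkdir "$lockdir/.semlock-$name-$count-$i" 2>/dev/null; then
            echo "$i"
            return 0
        fi
    done
    return 1   # every slot is taken
}

first="$(claim_slot demo 2)"
second="$(claim_slot demo 2)"
if third="$(claim_slot demo 2)"; then third_status=0; else third_status=1; fi
echo "slots taken: $first $second, third attempt status: $third_status"
rm -r "$lockdir"
```

With a count of 2, the third attempt fails immediately here; semtake instead keeps retrying until a slot is released.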

A shell can hold only one semaphore at a time: semtake fails if the shell already holds one, and semgive does nothing unless semtake succeeded earlier in the same shell.

semtake sets up an EXIT trap that gives the semaphore back when the shell exits, so you don't have to call semgive explicitly.
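
The automatic release relies on bash running an EXIT trap when a subshell ends. A minimal sketch of just that mechanism, with a hypothetical lock path under a temp directory rather than the script's real lock names:

```bash
#!/bin/bash
# Hypothetical demo: a lock directory removed by an EXIT trap set inside a subshell.
lock="$(mktemp -d)/slot"   # assumption: temp location, not /run/lock

(
    mkdir "$lock"
    trap 'rmdir "$lock"' EXIT
    # ... work would happen here, with no explicit release ...
)

# The subshell has exited, so its trap has already run.
if [ -d "$lock" ]; then released=no; else released=yes; fi
echo "released on exit: $released"
rmdir "$(dirname "$lock")"
```

Calling semgive explicitly is still useful when a subshell should free its slot before doing further work that needs no limit.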

Migration

Let's assume you have a shell script with a for loop similar to this:

for file in *; do
    touch "$file"
    # multiple commands editing or processing "$file"
done

Suppose the iterations could run in parallel: your computer does not have the resources to process every file simultaneously, but it does have multiple threads that could be used.
With semlock.sh, only minimal refactoring is required for parallelization.

source semlock.sh
for file in *; do
    (
    semtake fileprocess 2
    touch "$file"
    # multiple commands editing or processing "$file"
    ) &
done
wait

  1. Include the semlock.sh functions with source semlock.sh
  2. Surround the code block to be parallelized with ( and ) &
  3. Place a semtake right after (
    • Here fileprocess is this semaphore's name, and the semaphore limits execution to 2 subshells at a time.
    • All subshells are spawned immediately, but each one waits at semtake until a semaphore is available.
  4. Add wait after the loop to wait for all subshells to finish.
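
To try the pattern end to end without installing anything, the sketch below inlines a condensed stand-in for semlock.sh. The assumptions are labeled in the comments: a temp directory replaces /run/lock, the poll interval is shortened, and the held.log bookkeeping exists only to make the limit visible.

```bash
#!/bin/bash
# Condensed stand-in for semlock.sh so the demo runs on its own.
SEMPATH="$(mktemp -d)"   # assumption: private temp dir instead of /run/lock
SEMNAME=""
semtake() {
    local name="$1" j="${2:-$(nproc)}" i
    while true; do
        for i in $(seq 1 "$j"); do
            SEMNAME=".semlock-$name-$j-$i"
            mkdir "$SEMPATH/$SEMNAME" 2>/dev/null && break 2
        done
        sleep 0.2            # shorter poll than the original, for the demo only
    done
    trap semgive EXIT
}
semgive() {
    [ -z "$SEMNAME" ] && return
    rmdir "$SEMPATH/$SEMNAME" 2>/dev/null || true
    SEMNAME=""
}

workdir="$(mktemp -d)"
for item in a b c d; do
    (
    semtake demo 2
    # Record how many slot directories exist while this subshell holds one.
    ls -A "$SEMPATH" | wc -l | tr -d ' ' >> "$workdir/held.log"
    sleep 0.3
    touch "$workdir/$item.done"
    ) &
done
wait

done_count="$(ls "$workdir" | grep -c '\.done$')"
max_held="$(sort -n "$workdir/held.log" | tail -1)"
echo "finished: $done_count, max slots held at once: $max_held"
rm -r "$SEMPATH" "$workdir"
```

All four subshells start at once, yet the recorded slot count never exceeds 2, because only two slot directories can exist for this semaphore.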


programminghoch10 commented Nov 1, 2022

semnotify.sh

Another implementation of semlock.sh.
It features the exact same usage as semlock.sh, so the instructions and documentation from semlock.sh apply.

This variant uses inotify-tools to notify the next waiting process that the semaphore is available.
This achieves two additional things:

  1. No busy waiting, as the processes wait passively for filesystem changes.
  2. Ordered execution, because the waiting line is now numbered and semaphores are handed out "first come, first served".
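
The numbered waiting line can be illustrated on its own. This sketch reuses the find, sed and sort pipeline from semnotify.sh on a temp directory; the directory, the demo name and the numbers helper are assumptions made for the example. It shows why the sort has to be numeric: waiter 10 must come after waiter 3.

```bash
#!/bin/bash
# Hypothetical demo of the waiter queue: directories named <semname>-wait-<n>.
queue="$(mktemp -d)"     # assumption: temp dir instead of /run/lock
name=".semlock-demo"
mkdir "$queue/$name-wait-3" "$queue/$name-wait-10" "$queue/$name-wait-7"

# Extract the trailing number of every waiter directory, sorted numerically.
numbers() {
    find "$queue" -maxdepth 1 -type d -name "$name-wait-*" \
        | sed 's/^.*-\([[:digit:]]*\)$/\1/' | sort -n
}

next="$(numbers | head -1)"   # lowest number: the waiter that has queued longest
last="$(numbers | tail -1)"   # highest number: new waiters count up from here
echo "next waiter to wake: $next, newest waiter: $last"
rm -r "$queue"
```

In semnotify.sh, semgive removes the lowest-numbered waiter directory, and the delete_self event wakes the inotifywait call that is watching it.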

This can be used as a drop-in replacement to semlock.sh.
If you have inotify-tools installed, simply download semnotify.sh, rename it to semlock.sh and replace the other implementation.
