@julianthome
Last active October 5, 2016 08:40
#!/bin/bash
# export variables so that they are visible in the functions
# started by GNU Parallel
export FINALRES="result"
export WPSIZE=5
export JOBLOG="joblog"

function getfiles() {
    find ./* -name "*.txt"
}

function worker() {
    # nothing to do if no arguments were passed in
    [ "$#" -eq 0 ] && exit 0
    for fil in "$@"; do
        echo "process $fil"
    done
    exit 0
}

# make the function worker known to GNU Parallel
export -f worker

if [ ! -e "${JOBLOG}" ]; then
    getfiles | parallel --joblog "${JOBLOG}" -n "${WPSIZE}" -j +0 worker >> "${FINALRES}"
else
    # a joblog already exists: resume and skip jobs that completed earlier
    getfiles | parallel --resume --joblog "${JOBLOG}" -n "${WPSIZE}" -j +0 worker >> "${FINALRES}"
fi

echo "all jobs are finished"
exit 0
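
A minimal usage sketch of the resume path, assuming the gist is saved as run.sh (the file name is hypothetical):

./run.sh              # first run: writes "joblog", appends worker output to "result"
# interrupt it, e.g. with Ctrl-C, then start it again
./run.sh              # "joblog" now exists, so parallel is invoked with --resume
                      # and skips argument batches that already completed
column -t joblog      # inspect the joblog; the Exitval column records each batch's exit status
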
@ole-tange

Do you need WPSIZE? By removing that and by running a single job at a time, you get the added benefit that GNU Parallel will log if any of the jobs failed.

function worker() {
    echo "process $1"
}
export -f worker
getfiles | parallel --joblog "${JOBLOG}" -j +0 worker >> "${FINALRES}"

Also there is no need to touch $FINALRES: The >> will create the file if it is not there.
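
A quick way to confirm that append behaviour (the file name is just an example):

rm -f result
echo "first line" >> result    # ">>" creates "result" because it does not exist yet
echo "second line" >> result   # subsequent redirects append to it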

@julianthome
Author

Hello Ole,

thanks for your feedback. I am using the WPSIZE variable to control the number of arguments that are passed to each worker. My intention was to prevent too much locking on the output file (which I assume GNU Parallel does internally) when the workers finish quickly. In my case that happened quite often, so I extended the lifetime of each worker a bit. The touch $FINALRES part is indeed redundant.
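
A toy sketch of that batching effect; echo_args below is only an illustrative stand-in for worker, it just reports how many arguments each invocation receives:

echo_args() { echo "got $# args: $*"; }
export -f echo_args

seq 1 12 | parallel echo_args        # one argument per worker invocation
seq 1 12 | parallel -n 5 echo_args   # up to 5 arguments per worker invocation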

Thanks again and kind regards
Julian
