@rbf
Last active March 28, 2024 08:04
Script to recursively generate PDF files from markdown files in the current directory and its subfolders using "pandoc".

Description

Script to generate PDF files from Markdown files using pandoc, which is assumed to be installed. By default it processes all Markdown files in the current directory and its subfolders and places the resulting PDF files in a ./target directory, preserving the original subfolder structure.

Note: The script is also meant as an example of bash scripting that I can use as a reference for several code snippets.

Execute script

To install an executable copy of the script into your /usr/local/bin directory, run the following command:

bash <(curl -sSL https://gist.github.com/rbf/6064734/raw/generate-pdf) --install

If you just want to run it directly, without modifying it or saving a local copy, run the following command:

bash <(curl -sSL https://gist.github.com/rbf/6064734/raw/generate-pdf)

This command also accepts all of the script's options and parameters.
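
For illustration, the sketch below shows why options still reach the script in this mode: bash reads the script from a file descriptor created by the process substitution, and everything after it is passed on as arguments. `fake_download` is a hypothetical stand-in for the `curl` call so that the example runs offline; it is not part of the script.

```bash
#!/bin/bash
# Hypothetical stand-in for "curl -sSL <url>": prints a tiny script to stdout.
fake_download() {
  printf '%s\n' 'echo "args: $*"'
}

# Same shape as: bash <(curl -sSL <url>) --dry-run api
bash <(fake_download) --dry-run api   # prints "args: --dry-run api"
```

In this mode `$0` is typically a path such as /dev/fd/63 rather than a regular file, which is why the script's `--install` option downloads a fresh copy instead of copying `$0`.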

Finally, you can download the script and make it executable if you want to modify it. The following shell line does both:

curl -LO https://gist.github.com/rbf/6064734/raw/generate-pdf && chmod +x ./generate-pdf

Usage

Running the command without any options or parameters already does something sensible. You can try it directly:

  $ generate-pdf

To learn about the available options and parameters, read the help text or the script itself:

  $ generate-pdf --help

The main usage pattern is the following:

  $ generate-pdf [option] [<path-pattern> [<target-path>]]
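
One way to read this pattern: options come first, followed by up to two positional parameters that have defaults. The sketch below is a simplified illustration, not the script's actual parser. `parse_args` and the `DRY_RUN`, `PATTERN` and `TARGET` variables are hypothetical names, and only `--dry-run` is handled for brevity. The `target` default matches the ./target directory described above.

```bash
#!/bin/bash
# Hypothetical, simplified parser for:
#   generate-pdf [option] [<path-pattern> [<target-path>]]
parse_args() {
  DRY_RUN="false"
  # Consume leading options; stop at the first non-option argument.
  while [ "$#" -gt 0 ]; do
    case "${1}" in
      -n|--dry-run) DRY_RUN="true"; shift ;;
      -*) echo "[WARN] Invalid option: ${1}" >&2; return 1 ;;
      *) break ;;
    esac
  done
  PATTERN="${1:-}"        # an empty pattern matches every Markdown file
  TARGET="${2:-target}"   # default output directory
}

parse_args --dry-run api pdfs
echo "dry-run=${DRY_RUN} pattern=${PATTERN} target=${TARGET}"
```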

Examples

  • $ generate-pdf
    All .md and .markdown files in the current directory and its subdirectories are converted to PDF in the ./target directory, preserving their original subfolder structure.

  • $ generate-pdf api
    All .md and .markdown files in the current directory and its subdirectories whose file name or path contains "api" (case sensitive) are converted to PDF in the ./target directory, preserving their original subfolder structure.

  • $ generate-pdf --dry-run api
    Same as above, but instead of generating the PDF files, it only lists which files would be processed with these options.

  • $ generate-pdf api pdfs
    All .md and .markdown files in the current directory and its subdirectories whose file name or path contains "api" (case sensitive) are converted to PDF in the ./pdfs directory, preserving their original subfolder structure.
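
The mapping from a source file to its target PDF described in these examples can be sketched as a small function. This is a simplified illustration rather than the script's own code: `target_path_for` and its mode argument are hypothetical names, with the `flat` and `path-name` modes mirroring the `--flat` and `--path-name` options listed in the help text.

```bash
#!/bin/bash
# Hypothetical helper: map a Markdown source path to its PDF target path.
# Usage: target_path_for <source-file> [<target-dir> [tree|flat|path-name]]
target_path_for() {
  local src="${1#./}"           # drop a leading "./" as printed by find
  local target="${2:-target}"   # default target directory
  local mode="${3:-tree}"
  local dir name
  dir="$(dirname "${src}")"
  name="$(basename "${src}")"
  name="${name%.*}.pdf"         # replace .md / .markdown with .pdf

  if [ "${dir}" = "." ] || [ "${mode}" = "flat" ]; then
    # File at the top level, or flat output: no subfolder in the target
    echo "./${target}/${name}"
  elif [ "${mode}" = "path-name" ]; then
    # Encode the source directory in the file name to avoid name clashes
    echo "./${target}/$(echo "${dir}" | tr '/' '-')-${name}"
  else
    # Default: replicate the source subfolder structure
    echo "./${target}/${dir}/${name}"
  fi
}

target_path_for ./doc/api/intro.md                  # ./target/doc/api/intro.pdf
target_path_for ./doc/api/intro.md pdfs flat        # ./pdfs/intro.pdf
target_path_for ./doc/api/intro.md pdfs path-name   # ./pdfs/doc-api-intro.pdf
```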

#!/bin/bash
# The MIT License (MIT)
#
# Copyright (c) 2013 https://gist.github.com/rbf
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
### File info:
GP_VERSION="v1.3"
GP_SOURCE="https://gist.github.com/rbf/6064734/raw/generate-pdf"
GP_INSTALL_PATH="/usr/local/bin"
###
echoerr() { echo "$@" 1>&2; }
checkflags(){
GP_CHECKING_FLAGS="true"
for flag in "${@}"
do
if parseflag "${flag}"
then
[ "${flag:0:1}" == "-" ] && GP_FOUND_FLAGS+=("${flag}")
else
GP_INVALID_FLAG_FOUND="true"
echo "[WARN] Invalid option: ${flag}"
fi
[ $GP_PARAM_FOUND ] && [ "${flag:0:1}" == "-" ] && GP_OPTION_FOUND_AFTER_PARAM="true"
done
if [ "${GP_INVALID_FLAG_FOUND}" != "" ] || [ $GP_OPTION_FOUND_AFTER_PARAM ]
then
[ $GP_OPTION_FOUND_AFTER_PARAM ] && echo "[WARN] Options are expected at the beginning of the parameter list."
# Strip the install directory (including its trailing slash) to get the bare command name
GP_COMMAND_NAME="${0#${GP_INSTALL_PATH}/}"
echo "[INFO] Type '${GP_COMMAND_NAME} --help' for more info."
exit 1
fi
unset GP_CHECKING_FLAGS
}
parseflag(){
if [ "${1}" == "--help" ] || [ "${1}" == "-h" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
printhelp
exit 0
elif [ "${1}" == "--version" ] || [ "${1}" == "-v" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
printversion
exit 0
elif [ "${1}" == "--install" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
installscript
exit 0
elif [ "${1}" == "--update" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
updatescript
exit 0
elif [ "${1}" == "--uninstall" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
uninstallscript
exit 0
elif [ "${1}" == "--dry-run" ] || [ "${1}" == "-n" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
GP_DRYRUN="true"
shift
elif [ "${1}" == "--flat" ] || [ "${1}" == "-f" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
GP_FLAT="true"
shift
elif [ "${1}" == "--path-name" ] || [ "${1}" == "-F" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
GP_FLAT="true"
GP_PATH_IN_TARGET_FILENAME="true"
shift
elif [ "${1}" == "--bulk" ] || [ "${1}" == "-b" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
GP_BULK="true"
shift
elif [ "${1}" == "--bulk-force" ] || [ "${1}" == "-B" ]
then
[ $GP_CHECKING_FLAGS ] && return 0
GP_BULK="force"
shift
elif [ "${1:0:1}" == "-" ]
then
[ $GP_CHECKING_FLAGS ] && return 1
fi
# A parameter is a value passed to the script without a leading "-".
# At most two are expected, and they must come at the end of the argument list.
[ $GP_CHECKING_FLAGS ] && GP_PARAM_FOUND="true"
}
declinepathpartsforfile(){
GP_CURRENT_DIR="$(dirname "${1}")"
GP_CURRENT_FILENAME="$(basename "${1}")"
GP_TARGET_CURRENT_DIR="${GP_TARGET_DIR}"
! [ $GP_FLAT ] && GP_TARGET_CURRENT_DIR="${GP_TARGET_CURRENT_DIR}/${GP_CURRENT_DIR}"
# Strip the extension (.md or .markdown, in any letter case) before appending .pdf
GP_TARGET_CURRENT_FILENAME="${GP_CURRENT_FILENAME%.*}.pdf"
[ $GP_PATH_IN_TARGET_FILENAME ] && GP_TARGET_CURRENT_FILENAME="$(echo "${GP_CURRENT_DIR#.}" | tr '/' '-')-${GP_CURRENT_FILENAME%.*}.pdf"
GP_TARGET_CURRENT_FILEPATH="${GP_TARGET_CURRENT_DIR}/${GP_TARGET_CURRENT_FILENAME#-}"
# Remove unnecessary (but not wrong) "/./" from path for enhanced prompt in stdout
GP_TARGET_CURRENT_FILEPATH="${GP_TARGET_CURRENT_FILEPATH/\/.\///}"
GP_TARGET_CURRENT_FILEPATH="./${GP_TARGET_CURRENT_FILEPATH#./}"
}
dodryrun(){
for file in "${GP_MD_FILES[@]}"
do
declinepathpartsforfile "${file}"
echo "[INFO] ${file} --> ${GP_TARGET_CURRENT_FILEPATH}"
done
}
checkpandoc(){
# "command -v" is the portable way to test whether a program is on the PATH
if ! command -v pandoc > /dev/null 2>&1
then
echoerr "[ERROR] pandoc not found!"
echoerr "[INFO] For installation instructions visit: http://johnmacfarlane.net/pandoc/installing.html"
exit 1
fi
}
printversion(){
echo "$(basename "${0}") ${GP_VERSION}"
checkpandoc
pandoc --version | head -1
}
printhelp(){
less << EOF
Version
=======
$(printversion)
Usage
=====
$ generate-pdf [option] [<path-pattern> [<target-path>]]
Options
=======
-h, --help Print this help and exit. Other parameters and options are ignored.
-n, --dry-run Only list what files would be generated with the given parameters
without actually generating any PDF files.
-f, --flat Gather all generated PDF files into the target directory, without
replicating the original subfolder structure.
-F, --path-name Include the original path of the Markdown files in the generated PDF
file names, to avoid name clashes when gathering all PDF files in the target
directory. Implies '--flat'.
-b, --bulk Allow processing more than 10 files after a confirmation prompt. This is
a safeguard against triggering this command by mistake on a directory with
lots of subdirectories, like '/' or '~'.
-B, --bulk-force Like '--bulk', but skip the confirmation prompt.
-v, --version Print the version of the script and pandoc and exit. Other parameters
and options are ignored.
--install Make a copy of the script to '/usr/local/bin' and exit. Other parameters
and options are ignored.
--uninstall Delete the script from '/usr/local/bin' and exit. Other parameters
and options are ignored.
--update Update the script installed in '/usr/local/bin' and exit. Other parameters
and options are ignored.
Examples
========
$ generate-pdf All .md and .markdown files in the current directory and its
subdirectories are converted to PDF in the ./target directory,
preserving their original subfolder structure
$ generate-pdf api All .md and .markdown files in the current directory and its
subdirectories containing "api" (case sensitive) in the file name or
the path are converted to PDF in the ./target directory, preserving
their original subfolder structure
$ generate-pdf ./doc/*api All .md and .markdown files in the "doc/" subdirectory and its
subdirectories containing "api" (case sensitive) in the file name or
the path are converted to PDF in the ./target directory, preserving
their original subfolder structure
$ generate-pdf api pdfs All .md and .markdown files in the current directory and its
subdirectories containing "api" (case sensitive) in the file name or
the path are converted to PDF in the ./pdfs directory, preserving
their original subfolder structure
$ generate-pdf api . All .md and .markdown files in the current directory and its
subdirectories containing "api" (case sensitive) in the file name or
the path are converted to PDF in the current directory, preserving
their original subfolder structure, i.e. next to their .md counterparts
$ generate-pdf . . All .md and .markdown files in the current directory and its
subdirectories are converted to PDF in the current directory,
preserving their original subfolder structure, i.e. next to their .md
counterparts
EOF
}
GP_CURL_BASE="curl -sSL ${GP_SOURCE}"
GP_CURL_INSTALL="bash <(${GP_CURL_BASE}) --install"
GP_CURL_VERSION="bash <(${GP_CURL_BASE}) --version"
GP_INSTALLED_SCRIPT="${GP_INSTALL_PATH}/generate-pdf"
updatescript(){
if [ ! -e "${GP_INSTALLED_SCRIPT}" ]
then
echoerr "[ERROR] ${GP_INSTALLED_SCRIPT} not found"
echoerr "[INFO] Run the following command to install: ${GP_CURL_INSTALL}"
exit 1
fi
if [ "${GP_VERSION}" == "$(eval "${GP_CURL_VERSION}" | head -1 | sed "s/.*\(v.*\)/\1/")" ]
then
echo "Already up-to-date."
exit 0
fi
eval "${GP_CURL_INSTALL}"
}
installscript(){
if [ "${0}" == "${GP_INSTALLED_SCRIPT}" ]
then
echo "Already using ${0}"
exit 0
fi
checkpandoc
if [ -e "${GP_INSTALLED_SCRIPT}" ]
then
echo -n "Currently installed: ${GP_INSTALL_PATH}/"
"${GP_INSTALLED_SCRIPT}" --version | head -1
echo "You are installing version ${GP_VERSION}."
read -p "Do you want to continue? [y/n] " GP_ANSWER
if [ "${GP_ANSWER:0:1}" == "n" ]
then
exit 0
fi
fi
if [ "$(dirname "${0}")" == "/dev/fd" ]
then
# We are installing from the piped version, and we have to redownload a new version of the file
# since the current one (often in /dev/fd/63) has already been consumed up to here
${GP_CURL_BASE} -o "${GP_INSTALLED_SCRIPT}"
else
cp -v "${0}" "${GP_INSTALLED_SCRIPT}"
fi
chmod +x "${GP_INSTALLED_SCRIPT}"
echo "[INFO] Successfully installed to ${GP_INSTALLED_SCRIPT}"
}
uninstallscript(){
if [ ! -e "${GP_INSTALLED_SCRIPT}" ]
then
echo "[WARN] ${GP_INSTALLED_SCRIPT} not found"
exit 0
fi
echo "[INFO] This action will remove ${GP_INSTALLED_SCRIPT}"
read -p "Do you want to continue? [y/n] " GP_ANSWER
if [ "${GP_ANSWER:0:1}" == "n" ]
then
exit 0
fi
if rm -v "${GP_INSTALLED_SCRIPT}"
then
echo "[INFO] ${GP_INSTALLED_SCRIPT} removed successfully"
else
echoerr "[ERROR] Removing ${GP_INSTALLED_SCRIPT} failed"
exit 1
fi
}
checkpandoc
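# Expand combined one-letter options, e.g. "-nf" becomes "-n -f".
# Note: the arguments are re-split on whitespace below, so paths containing spaces are not supported.
GP_PARAMS="${*}"
while true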
do
GP_PARAMS_PARSED=$(echo "params: ${GP_PARAMS}" | sed "s/\( -[a-zA-Z]\)\([a-zA-Z]\)/\1 -\2/g")
if [ "${GP_PARAMS}" == "${GP_PARAMS_PARSED#params: }" ]
then
break
else
GP_PARAMS="${GP_PARAMS_PARSED#params: }"
fi
done
set -- ${GP_PARAMS}
checkflags "${@}"
for flag in "${@}"
do
parseflag "${flag}"
done
for flag in "${GP_FOUND_FLAGS[@]}"
do
shift
done
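# Read find's output line by line. Process substitution (instead of a here-string)
# leaves the array empty when no file matches, rather than adding one empty entry.
while IFS= read -r line
do
GP_MD_FILES+=("${line}")
done < <(find . -type f -path "*${1}*" \( -iname "*.md" -or -iname "*.markdown" \) -print)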
# GP_MD_FILES=($(find . -type f -path "*${1}*" \( -iname "*.md" -or -iname "*.markdown" \) -exec echo {} | tr ' ' '\ ' \;))
GP_MD_FILES_COUNT=${#GP_MD_FILES[@]}
GP_MD_FILES_MAX=10
GP_BASE_DIR="$(pwd)"
GP_TARGET_DIR="${2:-target}"
if [ "${GP_DRYRUN}" == "true" ]
then
echo "[INFO] Would generate PDFs from ${GP_MD_FILES_COUNT} Markdown file(s)"
dodryrun
exit
fi
if [ ${GP_MD_FILES_COUNT} -gt ${GP_MD_FILES_MAX} ]
then
GP_COMMAND_NAME="${0#${GP_INSTALL_PATH}/}"
# Rebuild the positional arguments with a leading space and no trailing spaces
GP_ARGUMENTS="$(echo " ${1} ${2}" | sed 's/ *$//g')"
echo "[WARN] ${GP_MD_FILES_COUNT} Markdown files found! That sounds like a lot..."
echo "[INFO] Use '${GP_COMMAND_NAME} --dry-run${GP_ARGUMENTS}' to see a list of what would be generated."
if [ "${GP_BULK}" == "true" ]
then
echo "[INFO] Use '${GP_COMMAND_NAME} --bulk-force${GP_ARGUMENTS}' to avoid this question and directly proceed with the PDF generation."
read -p " > Do you want to continue? [y/n] " GP_ANSWER
if [ "${GP_ANSWER:0:1}" != "y" ]
then
exit 0
fi
elif [ "${GP_BULK}" == "" ]
then
echo "[INFO] Use '${GP_COMMAND_NAME} --bulk${GP_ARGUMENTS}' to proceed with the PDF generation."
exit 0
fi
fi
for file in "${GP_MD_FILES[@]}"
do
echo -n "[INFO] Processing file: $file "
declinepathpartsforfile "${file}"
mkdir -p "${GP_TARGET_CURRENT_DIR}"
# Since pandoc looks for images relative to where it is called (https://github.com/jgm/pandoc/issues/917)
# we have to cd into the directory of the given file before pandoc'ing it,
# and we have to pass the full path in the -o parameter.
cd "${GP_CURRENT_DIR}"
GP_PANDOC_OUTPUT=$(pandoc --toc --number-sections "${GP_CURRENT_FILENAME}" -o "${GP_BASE_DIR}/${GP_TARGET_CURRENT_FILEPATH#./}" 2>&1) # to capture pandoc stderr output
if [ "$?" == 0 ]
then
echo "--> ${GP_TARGET_CURRENT_FILEPATH}"
else
echo -e "\r[ERROR] Processing file: $file - Detail below"
GP_ERRORS+=("$(echo -e "[ERROR] ${file} - Failed generation of PDF\n[ERROR] Expected target file: ${GP_TARGET_CURRENT_FILEPATH}\n[ERROR] ${GP_PANDOC_OUTPUT}\n\n\n")")
fi
cd "${GP_BASE_DIR}"
done
if [ ${#GP_ERRORS[@]} -eq 0 ]
then
exit 0
fi
echoerr
echoerr "[ERROR] ===================================================================="
echoerr "[ERROR] pandoc failed to generate PDF for ${#GP_ERRORS[@]} file(s)"
echoerr "[ERROR] ===================================================================="
for i in "${GP_ERRORS[@]}"
do
echoerr
echoerr "${i}"
echoerr
done
exit 1
  • Add check to avoid parsing the whole computer for markdown files! For example:
    • Add --test mode to list only the files that it would process
    • Warn if more than say 10 files would be processed
    • Limit the recursion depth to a default of, say, 3 directory levels, and maybe add a flag --levels=4 to modify it...
  • Allow a plain structure in the target folder, instead of replicating the original subfolder structure of the markdown files.
  • Make small script to update version and tag commit
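
The depth limit proposed in the list above could be sketched with the `-maxdepth` primary that both GNU and BSD `find` provide. This is an illustration only: `find_markdown` is a hypothetical helper, and the default of 3 levels is taken from the suggestion above rather than from the script.

```bash
#!/bin/bash
# Hypothetical helper: list Markdown files below a directory, descending
# at most <levels> directory levels (default 3).
# Usage: find_markdown [<dir> [<levels>]]
find_markdown() {
  local dir="${1:-.}"
  local levels="${2:-3}"
  find "${dir}" -maxdepth "${levels}" -type f \
    \( -iname "*.md" -o -iname "*.markdown" \) -print
}

# List Markdown files that sit directly in the current directory
find_markdown . 1
```

A file directly inside the starting directory counts as level 1, so `find_markdown . 3` would reach files such as ./a/b/file.md but not ./a/b/c/file.md.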
@inwardmovement

That sounds awesome... Do you know how to do something similar on Windows?

howardjones commented Dec 17, 2020

I like the idea but...

/home/howie/bin/process-md: line 188: syntax error near unexpected token `newline'
/home/howie/bin/process-md: line 188: `  done'
