Strip PDF Metadata
# --------------------------------------------------------------------
# Recursively find pdfs from the directory given as the first argument,
# otherwise search the current directory.
# Use exiftool and qpdf (both must be installed and locatable on $PATH)
# to strip all top-level metadata from PDFs.
#
# Note - This only removes file-level metadata, not any metadata
# in embedded images, etc.
#
# Code is provided as-is, I take no responsibility for its use,
# and I make no guarantee that this code works
# or makes your PDFs "safe," whatever that means to you.
#
# You may need to enable execution of this script before using,
# eg. chmod +x clean_pdf.sh
#
# example:
# clean current directory:
# >>> ./clean_pdf.sh
#
# clean specific directory:
# >>> ./clean_pdf.sh some/other/directory
# --------------------------------------------------------------------
# Color Codes so that warnings/errors stick out
GREEN="\e[32m"
RED="\e[31m"
CLEAR="\e[0m"
# loop through all PDFs in first argument ($1),
# or use '.' (this directory) if not given
DIR="${1:-.}"
echo "Cleaning PDFs in directory $DIR"
# use find to locate files, then pipe to `while read` to get each
# whole line instead of space-delimited words
# Note -- this will find pdfs recursively!!
# (outputs from earlier runs, *_clean.pdf, are skipped)
find "$DIR" -type f -name "*.pdf" ! -name "*_clean.pdf" | while IFS= read -r i
do
  # output file as original filename with suffix _clean.pdf
  TMP="${i%.*}_clean.pdf"
  # remove the output file if it already exists
  if [ -f "$TMP" ]; then
    rm "$TMP"
  fi
  exiftool -q -q -all:all= "$i" -o "$TMP"
  qpdf --linearize --replace-input "$TMP"
  printf "${GREEN}Processed ${RED}%s ${CLEAR}as ${GREEN}%s${CLEAR}\n" "$i" "$TMP"
done
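
The header comment says exiftool and qpdf must both be on $PATH, but the script never checks. A minimal sketch of such a check follows; the helper name `require_tools` is hypothetical and not part of the gist.

```shell
# Hypothetical helper: report every required tool that is not on $PATH.
# Returns 0 if all tools were found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_tools exiftool qpdf || exit 1` near the top of the script would stop early instead of failing once per PDF.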

muddynat commented Jan 26, 2022

How would one change this to replace the existing file, rather than creating a new one with the _clean.pdf suffix?


RooneyMcNibNug commented Jan 26, 2022

@muddynat you could probably just do something like the following one-liner for this:

for f in ./*.pdf; do exiftool -q -q -all:all= "$i" && qpdf --linearize --replace-input; done

Author

sneakers-the-rat commented Jan 26, 2022

that^^ would work, just need to add "$i" to the qpdf part, i believe. most of this script is just to add comments and tell the person running it what's going on. I have never gotten the hang of writing arguments for shell scripts, but it would be nice to have a --suffix flag (that you could just give "").
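
One way the `--suffix` idea could look, as a rough sketch only: the flag name comes from the comment above, while `parse_args` and `output_path` are hypothetical helpers that are not in the gist. The default suffix is assumed to stay `_clean`, as in the original script.

```shell
# Sketch: parse an optional --suffix flag plus an optional directory.
# parse_args and output_path are hypothetical names, not part of the gist.
SUFFIX="_clean"   # default matches the original script's output name
DIR="."           # default matches the original script's search directory

parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --suffix)
        [ "$#" -ge 2 ] || { echo "--suffix needs a value" >&2; return 1; }
        SUFFIX="$2"; shift 2 ;;
      --suffix=*)
        SUFFIX="${1#--suffix=}"; shift ;;
      *)
        DIR="$1"; shift ;;
    esac
  done
}

# Build the output path: strip the extension, append the suffix, re-add .pdf
output_path() {
  printf '%s%s.pdf' "${1%.*}" "$2"
}
```

The script would call `parse_args "$@"` before the loop and set `TMP=$(output_path "$i" "$SUFFIX")` inside it. With an empty suffix the output path equals the input path, so the `rm "$TMP"` step in the original loop would delete the input; that case needs the in-place form of the commands instead.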


muddynat commented Jan 27, 2022

@RooneyMcNibNug & @sneakers-the-rat thanks! I don't know much about bash scripting - where would this "$i" go in the qpdf part?

Author

sneakers-the-rat commented Jan 29, 2022

@muddynat that's variable substitution: the shell replaces "$i" with the value you're currently iterating over in the for or while loop. taking a second look at the code in the above comment, it also needs its loop variable renamed (it declares f but uses "$i") and should use the while pattern, so it would be like this:

find "$DIR" -type f -name "*.pdf" | while IFS= read -r i
do
  exiftool -q -q -all:all= "$i"
  qpdf --linearize --replace-input "$i"
done
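
A caveat for the in-place form: when exiftool edits a file in place it keeps a copy of the input next to it as `<name>_original`, unless `-overwrite_original` is given, so the untouched metadata would still be sitting in the directory. The sketch below adds that flag. It also swaps the `while read` pipe for `find -exec`, so filenames containing newlines are passed through intact. The function name `clean_in_place` is hypothetical.

```shell
# Sketch of an in-place variant; clean_in_place is a hypothetical name.
# Uses find -exec instead of a `while read` pipe so any filename is safe.
clean_in_place() {
  find "${1:-.}" -type f -name '*.pdf' -exec sh -c '
    for f in "$@"; do
      # -overwrite_original stops exiftool leaving a "<name>_original" backup
      exiftool -q -q -overwrite_original -all:all= "$f" &&
        qpdf --linearize --replace-input "$f"
    done
  ' sh {} +
}
```

`clean_in_place some/other/directory` would then rewrite every PDF under that directory with no backup copy, so it is worth trying on a copy of the files first.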
