Skip to content

Instantly share code, notes, and snippets.

@convexset
Last active December 2, 2023 19:23
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save convexset/28abc9c7be261954507ac705cbf64099 to your computer and use it in GitHub Desktop.
Save convexset/28abc9c7be261954507ac705cbf64099 to your computer and use it in GitHub Desktop.
Remove multiple text watermarks from a PDF file. Requires xxd and qpdf to work correctly.
#!/bin/bash
# Remove multiple text watermarks from a PDF file. Requires xxd and qpdf to work correctly.
#
# Usage:
#
# remove-pdf-watermark.sh "Your Input File.pdf" "Your Output File.pdf" [WATERMARK1] [WATERMARK2] [WATERMARK3] [...]
#
# For Example:
#
# remove-pdf-watermark.sh "Your Input File.pdf" "Your Output File.pdf" "Watermark 1" "Watermark 2"
#
# This is a more general (lesser dependencies, more functionality, but slower) version of
# https://gist.github.com/elfsternberg/a96883018d783cbbad7b454ecd0a7ffe
INPUT_FILENAME=$1
OUTPUT_FILENAME=$2
echo "Processing: $INPUT_FILENAME"
UNCOMPRESSED=`mktemp -t 'uncompressed'`
UNCOMPRESSED_HEX=`mktemp -t 'uncompressed-hex'`
UNMARKED_PRE=`mktemp -t 'unmarked-pre'`
qpdf --stream-data=uncompress --decode-level=all "$INPUT_FILENAME" $UNCOMPRESSED
echo " - Decompressing to: $UNCOMPRESSED"
xxd -ps -c 0 $UNCOMPRESSED > $UNCOMPRESSED_HEX
echo " - Hex dumping to: $UNCOMPRESSED_HEX"
rm $UNCOMPRESSED
for ARG in "$@"; do
COUNT=$((COUNT+1))
if [[ $COUNT -gt 2 ]]
then
WATERMARK=$ARG
echo " - Processing Watermark: \"$WATERMARK\""
WATERMARKLEN=${#WATERMARK}
WATERMARK_HEX=$(echo -n $WATERMARK | xxd -p -c 0)
BLANKS=$(printf %${WATERMARKLEN}s)
BLANKS_HEX=$(echo -n "$BLANKS" | xxd -p -c 0)
NUM_OCCURENCES_=$(grep -o $WATERMARK_HEX $UNCOMPRESSED_HEX | wc -l)
NUM_OCCURENCES=$((0+$NUM_OCCURENCES_))
echo " * Number of occurences: $NUM_OCCURENCES"
if [[ $NUM_OCCURENCES -gt 0 ]]
then
echo " * Replacing with $WATERMARKLEN blanks..."
sed -i -e "s/$WATERMARK_HEX/$BLANKS_HEX/g" $UNCOMPRESSED_HEX
echo " * Replacement done."
NUM_OCCURENCES_=$(grep -o $WATERMARK_HEX $UNCOMPRESSED_HEX | wc -l)
NUM_OCCURENCES=$((0+$NUM_OCCURENCES_))
echo " * Final number of occurences: $NUM_OCCURENCES"
fi
fi
done
xxd -r -p -c 0 $UNCOMPRESSED_HEX $UNMARKED_PRE
rm $UNCOMPRESSED_HEX
echo " - Reverting from hex dump to: $UNMARKED_PRE"
qpdf --stream-data=compress $UNMARKED_PRE "$OUTPUT_FILENAME"
rm $UNMARKED_PRE
echo " - Output written to: $OUTPUT_FILENAME"
echo
echo
# NO WARRANTY
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@jberkenbilt
Copy link

It will be interesting to see whether/how qpdf json v2 might help you with this kind of task once qpdf 11 is out. qpdf json v2 is in main now, though the interface isn't frozen until I release. If you want to take a look, you can check out current main. Documentation is at https://qpdf.readthedocs.io/en/latest/json.html

@convexset
Copy link
Author

I think that sounds really really cool. Having structured data to work with allows so much more. Also, thanks for qpdf. It has saved me so much time.

@dhewg
Copy link

dhewg commented Jan 18, 2023

Here's one json example which works for some pdfs

qpdf --json --json-stream-data=file in.pdf out.json
for i in `grep -Hi confidential out.json-* | cut -d':' -f1`; do
  cp /dev/null $i;
done
qpdf --json-input out.json out.pdf

@jberkenbilt
Copy link

@dhewg I think this is the first example I've seen of user-contributed qpdf json. :-)

@dhewg
Copy link

dhewg commented Jan 18, 2023

And such a small yet powerful one! ;) Some watermarks are just too much in-your-face...
Thanks for the json format, this makes it easy to manipulate with standard tools!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment