@peci1
Last active January 4, 2021 18:07
Anonymize a PDF document with comments so that neither the document metadata nor the comment author fields disclose who the author is.

```shell
#!/bin/bash
# Usage: pass the PDF as the first argument; the result is written to "$1.anonymous.pdf".
qpdf --qdf --object-streams=disable "$1" "$1.tmp" # uncompress /FlateDecode streams into editable QDF form
perl -pe 's/(?<=\/T \()(.*?)(?=\))/ "x" x length($1) /e' "$1.tmp" > "$1.tmp2" # overwrite /T entries (comment author names) with x's of the same length
qpdf --compress-streams=y "$1.tmp2" "$1.tmp3" # recompress streams
qpdf --empty --pages "$1.tmp3" 1-z -- "$1.anonymous.pdf" # remove document metadata (this also removes whole comments)
rm -f "$1".tmp* # cleanup
```

ryancoe commented Apr 9, 2020

Super helpful, thanks!

peci1 (gist author) commented May 12, 2020

You may also want to search the document for other instances of your name, which is not necessarily stored as plain ASCII. Here's an example of finding a hex-encoded version of your name in the uncompressed intermediate file `$1.tmp` (run it before the cleanup step deletes that file). You can then use sed to replace each letter of your name with a placeholder character, e.g. x.

```shell
grep "$(echo -n "Martin Pecka" | xxd -ps)" "$1.tmp"
```

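Or search for the 00-prefixed hex string, which matches names stored as UTF-16BE text strings (each ASCII character preceded by a 00 byte):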

```shell
grep "$(echo -n "Martin Pecka" | xxd -ps | sed -r 's/.{2}/00&/g')" "$1.tmp"
```
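
The sed replacement mentioned in the comment could look roughly like the function below. It is a minimal sketch under several assumptions: `scrub_name` is a hypothetical name, it uses `od` instead of `xxd` only so it does not depend on xxd being installed, and it relies on GNU sed's `-i`. It rewrites the plain, hex, and 00-prefixed hex forms with x characters (hex `78`) of the same byte length, mirroring the same-length idea of the perl step in the gist.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): overwrite a name in an
# uncompressed QDF file as plain text, as a hex string, and as a
# 00-prefixed (UTF-16BE) hex string, each with "x" of the same byte length.
# Assumes GNU sed (-i without a suffix) and a name without regex
# metacharacters or "/". The 00-prefixed form only matches ASCII names.
scrub_name() {
  local file=$1 name=$2
  local hex hex16 xs hexxs hex16xs nbytes
  # Lowercase hex bytes, like `xxd -ps` for short input; od avoids needing xxd.
  hex=$(printf '%s' "$name" | od -An -v -tx1 | tr -d ' \n')
  hex16=$(printf '%s' "$hex" | sed -E 's/.{2}/00&/g')
  nbytes=$(( ${#hex} / 2 ))
  # Equal-length replacements keep the QDF byte offsets unchanged.
  xs=$(printf 'x%.0s' $(seq 1 "$nbytes"))
  hexxs=$(printf '78%.0s' $(seq 1 "$nbytes"))      # 0x78 is "x"
  hex16xs=$(printf '0078%.0s' $(seq 1 "$nbytes"))
  # LC_ALL=C makes sed treat the file as raw bytes rather than locale text.
  LC_ALL=C sed -i -e "s/$name/$xs/g" -e "s/$hex16/$hex16xs/g" -e "s/$hex/$hexxs/g" "$file"
}

# Usage: scrub_name "$1.tmp" "Martin Pecka"
```

Running it on `$1.tmp` right after the first qpdf step means the remaining steps pick up the scrubbed file; the grep commands above should then find nothing.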
