Last active
January 4, 2021 18:07
Anonymize a PDF document with comments so that both document metadata and comment author data do not disclose who the author is.
qpdf --qdf --object-streams=disable "$1" "$1.tmp" # uncompress /FlateDecode sections
perl -pe 's/(?<=\/T \()(.*?)(?=\))/ "x" x length($1) /e' "$1.tmp" > "$1.tmp2" # overwrite the author name in the /T entries of comments with x's
qpdf --compress-streams=y "$1.tmp2" "$1.tmp3" # recompress streams
qpdf --empty --pages "$1.tmp3" 1-z -- "$1.anonymous.pdf" # remove document metadata (this also removes whole comments)
rm "$1".tmp* # cleanup
You may also want to search the document for instances of your name, which are not necessarily stored as plain ASCII. Here's an example of finding a hex-encoded version of your name in the document. You can then use sed to replace every letter of your name with a placeholder character, e.g. x.
grep "$(echo -n "Martin Pecka" | xxd -ps)" "$1.tmp"
Or a 00-prefixed hex string (every byte preceded by 00, as happens when the name is stored as UTF-16BE):
grep "$(echo -n "Martin Pecka" | xxd -ps | sed -r 's/.{2}/00&/g')" "$1.tmp"
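The sed replacement mentioned above could look roughly like the following sketch. The helper names `to_hex` and `mask_hex_name` are hypothetical. `od` is swapped in for `xxd` only because it is available on any POSIX system. The sketch assumes the hex digits in the file are lowercase and that the name is not split across lines.

```shell
# Hypothetical helper: print a string as lowercase hex, e.g. "ab" -> 6162.
to_hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: read text on stdin and overwrite hex-encoded
# occurrences of the name given as $1 with hex-encoded x's (78 is "x").
# Handles both the plain and the 00-prefixed form.
mask_hex_name() {
  name_hex=$(to_hex "$1")
  mask_hex=$(printf '%s' "$name_hex" | sed 's/../78/g')
  name_hex16=$(printf '%s' "$name_hex" | sed 's/../00&/g')
  mask_hex16=$(printf '%s' "$name_hex" | sed 's/../0078/g')
  sed -e "s/$name_hex16/$mask_hex16/g" -e "s/$name_hex/$mask_hex/g"
}

# Possible use on the uncompressed file from the first qpdf step:
# mask_hex_name "Martin Pecka" < "$1.tmp" > "$1.tmp2"
```

As with the perl step, each replacement has the same length as the text it overwrites, so the file size stays the same.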
Super helpful, thanks!