@Te-k
Created November 26, 2020 10:31
How to remove metadata from PDFs

Many tools do not fully remove metadata; they only remove the reference to it in the metadata table. The data therefore remain available in the PDF file itself.

While a lot of people rely on Exiftool to remove metadata, it behaves the same way with PDFs. If you remove metadata with exiftool -all= some.pdf, the data can be restored at any time with exiftool -pdf-update:all= some.pdf.
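One rough way to see this for yourself is to search the raw bytes of the "cleaned" file for values you know were in the metadata. The sketch below is only a heuristic, not a definitive check: the function name find_leftovers and the script name check_pdf.py are made up for illustration, and it only looks for strings stored as plain single-byte text or UTF-16BE. An empty result does not prove the file is clean, since values inside compressed streams will not be found this way.

```python
import sys
from pathlib import Path


def find_leftovers(pdf_bytes: bytes, needles: list[str]) -> list[str]:
    """Return the needles that still appear in the raw bytes of a PDF."""
    found = []
    for needle in needles:
        # PDF text strings are commonly stored as single-byte text or as UTF-16BE,
        # so try both encodings of each value.
        encodings = (needle.encode("utf-8"), needle.encode("utf-16-be"))
        if any(enc in pdf_bytes for enc in encodings):
            found.append(needle)
    return found


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python check_pdf.py some.pdf "Alice Example" "Secret Project"
    data = Path(sys.argv[1]).read_bytes()
    for hit in find_leftovers(data, sys.argv[2:]):
        print(f"still present: {hit}")
```

Running it on a file before and after exiftool -all= should show whether a known value, such as the author name, is still sitting in the file.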

There are several options to remove PDF metadata safely:

Option 1 : Exiftool with qpdf

  • Remove metadata with exiftool : exiftool -all= some.pdf
  • Then remove unused objects with qpdf : qpdf --linearize some.pdf - > some.cleaned.pdf
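The two steps above can be wrapped in a small script. This is a minimal sketch, assuming exiftool and qpdf are installed and on the PATH. The helper names build_commands and clean_pdf are made up for illustration. It differs from the commands above in three ways: it works on a temporary copy so the input file is left untouched, it adds -overwrite_original so exiftool does not keep its usual _original backup next to the working copy, and it gives qpdf an output file name instead of redirecting standard output.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(work: Path, dst: Path) -> list[list[str]]:
    """Return the two Option 1 command lines as argument lists."""
    return [
        # Step 1: exiftool edits `work` in place. -overwrite_original (an addition
        # to the original command) skips the `<name>_original` backup file.
        ["exiftool", "-all=", "-overwrite_original", str(work)],
        # Step 2: qpdf writes a fresh file to `dst`, leaving out unused objects.
        ["qpdf", "--linearize", str(work), str(dst)],
    ]


def clean_pdf(src: Path, dst: Path) -> None:
    """Clean a temporary copy of `src` and write the result to `dst`."""
    for tool in ("exiftool", "qpdf"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work.pdf"
        shutil.copyfile(src, work)
        for cmd in build_commands(work, dst):
            # check=True raises CalledProcessError on any non-zero exit status.
            subprocess.run(cmd, check=True)
```

Calling clean_pdf(Path("some.pdf"), Path("some.cleaned.pdf")) would then produce the same kind of output file as the two manual commands.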

Option 2 : MAT2

Use MAT2, a Python library that comes with a command-line tool.

Option 3 : DangerZone

Use DangerZone, which has a GUI for Windows, macOS and Linux (but is quite heavy).

(DangerZone is based on the former pdf-redact-tools, which can also be an option)

@Moon1moon

Hi, do you know how good this tool is at removing metadata?
https://github.com/szTheory/exifcleaner
