@sahal
Created March 2, 2024 17:14
Split PDF using Okular, pdfimages, ImageMagick, pdfchain to resize, grayscale, and combine pdfs

Okular

Multi-platform, fast and packed with features, Okular allows you to read PDF documents, comics and EPub books, browse images, visualize Markdown documents, and much more.

Okular

Redact

Use Okular (or similar) to add annotations that cover up the parts of the PDF that you want to redact (e.g. passwords, Social Security numbers, etc.).

Rasterize

Use Okular (or similar) to print as PDF, making sure to select "Force rasterization" (or similar) before exporting as PDF.

This produces a PDF without vector content; instead, all form fields and text are "rasterized" (turned into images).

pdfimages

Pdfimages saves images from a Portable Document Format (PDF) file as Portable Pixmap (PPM), Portable Graymap (PGM), Portable Bitmap (PBM), or JPEG files.

pdfimages

Extract images from the rasterized PDF

$ mkdir -p images
$ pdfimages -tiff file.pdf images/page

Edit Images

Do some more redaction, remove information, etc. using your favorite image editor. I used GIMP.

ImageMagick

ImageMagick® is a free, open-source software suite, used for editing and manipulating digital images. It can be used to create, edit, compose, or convert bitmap images, and supports a wide range of file formats, including JPEG, PNG, GIF, TIFF, and PDF.

ImageMagick

Resize images

$ convert input.tif -intensity Rec709luminance -colorspace gray \
    -resize 50% -quality 80 output.jpg
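
The command above handles one file at a time. To run it over every extracted TIFF, something like the following sketch might work. The function name `tiffs_to_jpgs` and the directory names are my own placeholders, not part of ImageMagick, and it assumes a shell with `local` (bash, dash).

```shell
# Convert every .tif in SRC_DIR to a half-size grayscale .jpg in DST_DIR.
# Hypothetical helper: the name and arguments are assumptions for illustration.
tiffs_to_jpgs() {
    local src_dir=$1 dst_dir=$2 tif base
    mkdir -p "$dst_dir"
    for tif in "$src_dir"/*.tif; do
        [ -e "$tif" ] || continue      # glob matched nothing, skip
        base=${tif##*/}                # strip the directory part
        base=${base%.tif}              # strip the extension
        # Same options as the single-file command above
        convert "$tif" -intensity Rec709luminance -colorspace gray \
            -resize 50% -quality 80 "$dst_dir/$base.jpg"
    done
}

# Usage: tiffs_to_jpgs images jpgs
```

Writing the JPGs to a separate directory keeps the original TIFFs around in case the resize or quality settings need tweaking.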

Combine JPGs into a single PDF

$ convert $(find . -type f -name "*.jpg"|sort) output.pdf
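
The `$(find ...)` form above breaks if any filename contains spaces. If all the JPGs sit in one directory, a shell glob is a possible alternative: it expands in sorted order and keeps each filename intact. This is only a sketch; `jpgs_to_pdf` is a made-up name, and unlike `find` it does not descend into subdirectories.

```shell
# Combine all .jpg files in SRC_DIR (sorted by name) into OUT_PDF.
# Hypothetical helper: the name and arguments are assumptions for illustration.
jpgs_to_pdf() {
    local src_dir=$1 out_pdf=$2
    # Replace the positional parameters with the sorted glob matches
    set -- "$src_dir"/*.jpg
    if [ ! -e "$1" ]; then
        echo "no JPGs found in $src_dir" >&2
        return 1
    fi
    convert "$@" "$out_pdf"
}

# Usage: jpgs_to_pdf jpgs output.pdf
```

Page order follows filename order, so zero-padded names (like the ones pdfimages produces) matter here.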

pdfchain

PDF Chain is a graphical user interface for the PDF Toolkit (PDFtk). The GUI supports all common features of the command line tool in a comfortable way.

PDF Chain

Alternatively, if you don't use Linux, use PDFsam.

Burst

Use this tab to split the PDF into multiple PDFs (one per page).
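
Since PDF Chain is a front end for PDFtk, the same split can probably be done from the command line with `pdftk ... burst`. A minimal sketch, assuming `pdftk` is installed; `burst_pdf` and the `page_%03d.pdf` naming pattern are my own choices.

```shell
# Split IN_PDF into one PDF per page under OUT_DIR.
# Hypothetical helper: the name and arguments are assumptions for illustration.
burst_pdf() {
    local in_pdf=$1 out_dir=$2
    mkdir -p "$out_dir"
    # %03d is filled in by pdftk with the page number (page_001.pdf, ...)
    pdftk "$in_pdf" burst output "$out_dir/page_%03d.pdf"
}

# Usage: burst_pdf file.pdf pages
```

Note that pdftk also writes a `doc_data.txt` metadata file when bursting, which you may want to delete afterwards.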

Catenate

Use this tab to concatenate (combine) multiple PDFs into a single PDF. This is useful for adding title pages to your rasterized PDF.
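
The command-line equivalent is `pdftk ... cat output ...`. A rough sketch, again assuming `pdftk` is installed; `cat_pdfs` is a made-up wrapper, and the filenames in the usage line are placeholders.

```shell
# Concatenate two or more PDFs, in the order given, into OUT_PDF.
# Hypothetical helper: the name and arguments are assumptions for illustration.
cat_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: cat_pdfs OUT.pdf IN1.pdf IN2.pdf [...]" >&2
        return 2
    fi
    local out_pdf=$1
    shift                              # remaining args are the inputs
    pdftk "$@" cat output "$out_pdf"
}

# Usage: cat_pdfs combined.pdf title.pdf output.pdf
```

Putting the output name first keeps the argument handling simple, since any number of input files can follow it.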
