@valerionew
Last active January 18, 2024 14:40
Batch compress onenote handwritten PDF into raster PDF

So here's the story: a friend of mine wanted to share on the web some notes he wrote in OneNote. The PDF was 155 pages and weighed around 270MB, which exceeded the 100MB file size limit of the service we were using.

After some research on the internet, I discovered that this is pretty common for OneNote files, in particular if you have pressure sensitivity enabled. I could not find a ready-made solution, though, so after a day of research I finally got a method working that reduces the 278MB file to 24MB, with no significant loss.

The key to this process is rasterization: we transform the nicely, indefinitely zoomable vector strokes into a picture of each page, and put that picture in the PDF page instead of the vector notes. You lose the flexibility of a vector-based PDF, but the result compresses much better.

This procedure is done with ImageMagick, a cross-platform set of tools to manipulate images. The downside is that it is a command-line tool; the upside is that you just have to copy and paste things from this guide. ImageMagick depends on GhostScript to perform this task, so you will have to install both.

Install

Windows

Go to the ImageMagick download page, download the latest version (at the time of writing it's 7.0.11-13) and install it.

Then go to the GhostScript download page - you probably want to download the AGPL version if you're not integrating GhostScript into an application.

Linux

Just install ImageMagick and GhostScript from your favourite package manager. I use apt, so I ran:

sudo apt install imagemagick ghostscript
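To confirm that both programs actually ended up on your PATH, a small check like the one below can help. This is only a sketch: check_tools is a helper name I made up for this guide, and it assumes the usual executable names (convert or magick for ImageMagick, gs for GhostScript on Linux).

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 only if every command was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

With ImageMagick 6 you would call it as check_tools convert gs, with ImageMagick 7 as check_tools magick gs.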

Mac

This should also work on Mac, but I have not tested it.

Configuration

Out of the box, ImageMagick refuses to work on PDF files and has strict resource limits. If you work on large files like I do, you probably want to ease those restrictions.

First of all, locate your policy.xml file. On Windows it is normally at C:\Program Files\ImageMagick-[version]\policy.xml; on Linux I have it at /etc/ImageMagick-[version]/policy.xml. Open this file with your favourite editor (e.g. Notepad on Windows), scroll to the bottom and, ABOVE the line </policymap>, add these lines:

  <policy domain="resource" name="memory" value="4GiB"/> <!-- increase RAM availability -->
  <policy domain="resource" name="map" value="4GiB"/>
  <policy domain="resource" name="disk" value="8GiB"/> <!-- increase disk availability -->
  <policy domain="coder" rights="read|write" pattern="PDF" /> <!-- enable PDFs -->
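As far as I know, some Linux packages ship a policy.xml that already contains a PDF coder line with rights="none". The sketch below lists every coder line that mentions PDF so you can spot such a conflict after editing. list_pdf_policies is a hypothetical helper name, and you have to pass the path of your own policy.xml.

```shell
# Print every coder policy line that mentions PDF, prefixed with its line number.
# $1 = path to your policy.xml (required).
list_pdf_policies() {
  policy_file=${1:?usage: list_pdf_policies /path/to/policy.xml}
  grep -n 'domain="coder"' "$policy_file" | grep 'pattern="[^"]*PDF'
}
```

If an older line with rights="none" shows up next to the one you added, it may be simpler to edit that line than to rely on the order of the entries. Running convert -list policy (or magick -list policy on ImageMagick 7) should show what ImageMagick actually loaded.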

Usage

Everything is set up. Now it's just one command. Open your command line (CMD.exe or your terminal).

If you use ImageMagick 6:

convert -density 150 -compress zip input.pdf output.pdf

If you use ImageMagick 7:

magick convert -density 150 -compress zip input.pdf output.pdf

As you can see, the difference between ImageMagick 6 and 7 is that with the newer version you have to prepend magick before the command.
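If you want one script that works with either version, you could pick the command at run time. This is a minimal sketch: pick_convert is a name I invented, and it simply assumes that a magick executable on your PATH means ImageMagick 7.

```shell
# Print the command prefix to use: "magick convert" when the ImageMagick 7
# launcher is on PATH, otherwise plain "convert" (ImageMagick 6).
pick_convert() {
  if command -v magick >/dev/null 2>&1; then
    echo "magick convert"
  elif command -v convert >/dev/null 2>&1; then
    echo "convert"
  else
    echo "ImageMagick not found on PATH" >&2
    return 1
  fi
}
```

You would then run $(pick_convert) -density 150 -compress zip input.pdf output.pdf, leaving the substitution unquoted so that "magick convert" is split into two words.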

In this example we are using a density of 150 DPI, but you can increase or decrease this number according to your needs. I find that 150 DPI is a good tradeoff between size and resolution.
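To get a feeling for what the density does, you can estimate the size of the rasterized page: PDF page sizes are measured in points (1/72 inch), so pixels are roughly points times DPI divided by 72. The sketch below uses an A4 page (595x842 points) purely as an example; your OneNote export may use a different page size, and page_pixels is a made-up helper name.

```shell
# Approximate raster size of a PDF page.
# $1 = page width in points, $2 = page height in points (1 point = 1/72 inch)
# $3 = density in DPI. Rounds to the nearest pixel using integer arithmetic.
page_pixels() {
  w=$(( ($1 * $3 + 36) / 72 ))
  h=$(( ($2 * $3 + 36) / 72 ))
  echo "${w}x${h}"
}

page_pixels 595 842 150   # A4 at 150 DPI, prints 1240x1754
page_pixels 595 842 300   # A4 at 300 DPI, prints 2479x3508
```

Doubling the density gives about four times as many pixels per page, which is why the output file grows quickly when you raise it.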

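To convert every PDF in the current folder in one go, you can use a shell loop (bash, ImageMagick 6 syntax; with ImageMagick 7 prepend magick as above):

for f in ./*.pdf; do
  # skip files produced by a previous run of this loop
  case "$f" in *-raster.pdf) continue ;; esac
  convert -density 150 -compress zip "$f" "${f%.pdf}-raster.pdf"
done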
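After a batch run it is handy to see how much each file shrank. The following is a rough sketch that relies on the -raster.pdf naming used by the loop; size_report is a hypothetical helper, not part of ImageMagick.

```shell
# Print original size, rasterized size, and the new size as a percentage.
# $1 = original PDF, $2 = rasterized PDF
size_report() {
  orig_bytes=$(( $(wc -c < "$1") ))
  new_bytes=$(( $(wc -c < "$2") ))
  if [ "$orig_bytes" -eq 0 ]; then
    echo "$1: empty file, skipping" >&2
    return 1
  fi
  echo "$1: $orig_bytes -> $new_bytes bytes ($(( new_bytes * 100 / orig_bytes ))% of original)"
}

# Report on every pair produced by the batch loop.
for r in ./*-raster.pdf; do
  [ -e "$r" ] || continue                  # no rasterized files here
  [ -e "${r%-raster.pdf}.pdf" ] || continue # original is gone
  size_report "${r%-raster.pdf}.pdf" "$r"
done
```

For reference, the 278MB to 24MB reduction mentioned at the top corresponds to roughly 9% of the original size.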
@Trolobezka

Works well, thank you. For JPEG compression put -compress jpeg -quality 50 instead of -compress zip.