So here's the story: a friend of mine wanted to share on the web some notes he wrote in OneNote. The exported PDF was 155 pages and weighed 270MB, which exceeded the 100MB maximum file size of the service we were using.
After doing some research on the internet, I discovered this is pretty common for OneNote files, in particular if you have pressure sensitivity enabled. My searches did not turn up any ready-made solution, so after a day of research I finally got a method working that reduced the 278MB file to 24MB, with no significant loss.
The key to this process is rasterization: we transform the nicely, indefinitely zoomable vector strokes into a picture of each page, and put that picture in the PDF page instead of the vector notes. You lose the flexibility of a vector-based PDF, but the result compresses much better.
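To get a feel for what rasterization produces, here is a small sketch that estimates the pixel size of one rasterized page. It relies on the fact that PDF page sizes are measured in points (1 point = 1/72 inch). The function name page_pixels is made up for this example, and the A4 page size is only an assumption: OneNote exports may use different page dimensions.

```shell
# Estimate the raster size (in pixels) of one page at a given DPI.
# Page dimensions are given in PDF points (1 point = 1/72 inch).
page_pixels() {
    # $1 = page width in points, $2 = page height in points, $3 = DPI
    awk -v w="$1" -v h="$2" -v dpi="$3" \
        'BEGIN { printf "%dx%d\n", w * dpi / 72 + 0.5, h * dpi / 72 + 0.5 }'
}

# A4 is 595 x 842 points (assumed page size, yours may differ).
page_pixels 595 842 150   # prints 1240x1754
```

Every page becomes one image of roughly that size, so doubling the DPI roughly quadruples the number of pixels to store.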
This procedure is done with ImageMagick, a cross-platform set of tools to manipulate images. The downside is that it is a command line tool; the upside is that you just have to copy and paste things from this guide. ImageMagick depends on GhostScript to read PDF files, so you will have to install both.
Go to the ImageMagick download page, download the latest version (at the time of writing it's 7.0.11-13) and install it.
Then go to the GhostScript download page - you probably want to download the AGPL version if you're not integrating GhostScript into an application.
On Linux, just install ImageMagick and GhostScript from your favourite package manager. I use apt, so I ran:
sudo apt install imagemagick ghostscript
This should also work on macOS, but I have not tested it.
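Before going further it can help to confirm that the tools are actually reachable from your terminal. The sketch below is for Linux or macOS shells; check_tool is a helper name invented for this example. On Windows the GhostScript console executable is usually named differently (for example gswin64c), so there you would check that name instead.

```shell
# Report whether a command is reachable on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# ImageMagick 7 installs `magick`, ImageMagick 6 installs `convert`,
# and GhostScript installs `gs` on Linux/macOS.
for tool in magick convert gs; do
    check_tool "$tool"
done
```

Seeing only one of magick or convert reported as found is normal: which one you have depends on the ImageMagick version your system ships.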
ImageMagick out of the box refuses to work on PDF files and has strict resource limits. If you work on large files like me, you probably want to ease those restrictions.
First of all, locate your policy.xml file. On Windows you normally have it under C:\Program Files\ImageMagick-[version]\policy.xml, on Linux I have it located at /etc/ImageMagick-[version]/policy.xml. Open this file with your favourite editor (e.g. Notepad on Windows), scroll to the bottom and, ABOVE the line </policymap>, add these lines:
<policy domain="resource" name="memory" value="4GiB"/> <!-- increase RAM availability -->
<policy domain="resource" name="map" value="4GiB"/>
<policy domain="resource" name="disk" value="8GiB"/> <!-- increase disk availability -->
<policy domain="coder" rights="read|write" pattern="PDF" /> <!-- enable PDFs -->
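To check that ImageMagick picked up the edited policy, you can ask it to print what it loaded. This is a minimal sketch for a Linux or macOS shell; show_im_policy is a name made up for this example, and it simply tries magick first and falls back to convert.

```shell
# Print the resource limits and coder rules that ImageMagick actually loaded.
# Uses `magick` (ImageMagick 7) when present, otherwise `convert` (ImageMagick 6).
show_im_policy() {
    if command -v magick >/dev/null 2>&1; then
        im=magick
    elif command -v convert >/dev/null 2>&1; then
        im=convert
    else
        echo "ImageMagick not found on PATH" >&2
        return 1
    fi
    "$im" -list resource   # should reflect the memory, map and disk values from policy.xml
    "$im" -list policy     # should include a PDF coder rule with read and write rights
}

# Usage: show_im_policy
```

If the old limits still show up, you are probably editing a different policy.xml than the one your installation reads.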
Everything is set up. Now it's just one command. Open your command line (CMD.exe or your terminal) and run one of the following.
If you use ImageMagick 6:
convert -density 150 -compress zip input.pdf output.pdf
If you use ImageMagick 7:
magick convert -density 150 -compress zip input.pdf output.pdf
As you can see, the difference between ImageMagick 6 and 7 is that with the newer version you have to prepend magick before the command.
In this example we are using a density of 150 DPI, but you can increase or decrease this number according to your needs. I find that 150 DPI is a good tradeoff between size and resolution.
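If you are not sure which density suits your notes, one option is to produce a few variants and compare their sizes. The sketch below assumes ImageMagick 7 on Linux or macOS; the function name compare_densities and the output naming scheme are invented for this example, and on ImageMagick 6 you would drop the leading magick.

```shell
# Rasterize one PDF at several densities so the results can be compared.
# $1 = input PDF, remaining arguments = densities in DPI.
compare_densities() {
    src="$1"
    shift
    for dpi in "$@"; do
        out="${src%.pdf}-${dpi}dpi.pdf"   # e.g. input-150dpi.pdf (hypothetical naming)
        # Same command as in the guide, with the density made variable.
        magick convert -density "$dpi" -compress zip "$src" "$out" || return 1
        printf '%s DPI: %s bytes\n' "$dpi" "$(wc -c < "$out")"
    done
}

# Usage: compare_densities input.pdf 100 150 200
```

Open the resulting files side by side and keep the smallest one that is still comfortable to read.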
Works well, thank you. For JPEG compression put -compress jpeg -quality 50 instead of -compress zip.