Created
May 24, 2021 17:37
Extract the JPGs embedded in a PDF, recompress them, and assemble them back into a PDF.
# Launch a fresh Ubuntu container with the current directory mounted at /mount
docker run --rm -it -v "$(pwd)":/mount ubuntu
# Extract the PDF's embedded images as JPGs
cd /mount
apt-get update
apt-get install -y poppler-utils
pdfimages -j file.pdf fileprefix
mkdir extract
mv *.jpg extract
# Recompress JPGs and convert to greyscale
apt-get install -y imagemagick
mkdir shrink && cd shrink
# Quote every path so filenames containing spaces are handled correctly
for file in ../extract/*.jpg; do
  convert "$file" -strip -quality 30 -colorspace Gray "./$(basename "$file")"
done
# Back to PDF
# Note that your Docker VM needs a lot of RAM to do this if you have a lot of images.
# Alternatively, if you're on a Mac, you can `brew install imagemagick` and
# run this last step natively.
convert *.jpg ../final.pdf
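
The recompression step can also be wrapped in a small function, which makes it easier to retry with a different quality setting. This is only a sketch, not part of the original script: the name `recompress_jpgs` and its three arguments are hypothetical, and it assumes ImageMagick's `convert` is on the PATH and that the JPGs sit directly inside the source directory.

```shell
# Hypothetical helper: recompress every JPG in SRC into DEST as greyscale.
# Usage: recompress_jpgs SRC DEST [QUALITY]   (QUALITY defaults to 30)
recompress_jpgs() {
    src=$1
    dest=$2
    quality=${3:-30}

    mkdir -p "$dest" || return 1
    for path in "$src"/*.jpg; do
        # If SRC holds no JPGs the glob stays unexpanded; skip it.
        [ -e "$path" ] || continue
        # Quoting keeps filenames with spaces in one piece.
        convert "$path" -strip -quality "$quality" -colorspace Gray \
            "$dest/$(basename "$path")" || return 1
    done
}
```

Run from /mount, `recompress_jpgs extract shrink 30` should match the loop above, and a second call such as `recompress_jpgs extract shrink-q50 50` writes a higher-quality set for comparison.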