@agharbeia
Created October 13, 2017 12:46
A shell script that extracts images from the PDF files in the passed directory using pdfimages, and processes them further.
#!/bin/bash
## By Ahmad Gharbeia, January 2016. Licensed under GPL version 3.0
## Extracts images from the PDF files in the passed directory (default: the current directory) using pdfimages.
## Additionally, converts the extracted PNM files (PBM and PPM) to PNG with pnmtopng and optimises the result with pngcrush.
## JPEG images, which result from extracting DCT-encoded images from the source PDF, are left unchanged.
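# Directory to scan for PDF files; defaults to the current directory.
dir=${1:-.}
# Make unmatched globs expand to nothing, so the loops below are simply skipped.
shopt -s nullglob
tempdir=$(mktemp -d --tmpdir "$(basename "$0").XXXXXXXXXX")
for pdf in "$dir"/*.pdf
do
    # Extract only when pdfimages lists exactly one line containing "image".
    if [ "$(pdfimages -list "$pdf" | grep -c image)" -eq 1 ]; then
        echo "$pdf: "
        # -j writes DCT-encoded images as JPEG files; the rest become PBM or PPM.
        pdfimages -j "$pdf" "${pdf%.pdf}"
    fi
    for pnm in "${pdf%.pdf}"-???.p{b,p}m
    do
        echo "$pnm" "-> ${pnm:0: -4}".png
        temppng=$(mktemp --tmpdir="$tempdir" XXXXXXXXXX)
        ## Code is needed for conditionally rotating extracted images.
        # pnmrotate -noantialias 90 "$pnm" | pnmtopng > "$temppng"
        pnmtopng "$pnm" > "$temppng"
        rm "$pnm"
        pngcrush -fix -l 1 -q -z 9 "$temppng" "${pnm:0: -4}".png
        rm "$temppng"
    done
done
rm -r "$tempdir"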
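
The script notes that code is still needed for conditionally rotating extracted images, but it does not say what the condition should be. The following is only a sketch under one assumed rule: rotate an image when it is wider than it is tall. The helper names `pnm_is_landscape` and `pnm_to_upright_png` are hypothetical and not part of the original script. The header parsing assumes the layout pdfimages typically writes, with the magic number on the first line and the width and height on the next non-comment line.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (status 0) when the PNM file is wider than tall.
# Returns 1 for portrait or square images and 2 when the header cannot be parsed.
# ASSUMPTION: the magic number is alone on line 1 and "width height" follow on
# the next line that is not a "#" comment.
pnm_is_landscape() {
    local magic line width height
    {
        read -r magic
        # Skip comment lines between the magic number and the dimensions.
        while read -r line; do
            [[ $line == \#* ]] || break
        done
    } < "$1"
    read -r width height <<< "$line"
    [[ $width =~ ^[0-9]+$ && $height =~ ^[0-9]+$ ]] || return 2
    # 10# forces base 10 so a leading zero is not read as octal.
    (( 10#$width > 10#$height ))
}

# Hypothetical helper: convert PNM file $1 to PNG file $2, rotating by 90
# degrees first when the image is landscape (the assumed rule, not the author's).
pnm_to_upright_png() {
    if pnm_is_landscape "$1"; then
        pnmrotate -noantialias 90 "$1" | pnmtopng > "$2"
    else
        pnmtopng "$1" > "$2"
    fi
}
```

Inside the inner loop, `pnm_to_upright_png "$pnm" "$temppng"` could then stand in for the plain `pnmtopng` call. Whether landscape pages should be rotated at all, and in which direction, depends on the scanned material, so the test in `pnm_is_landscape` is the part to adapt.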