@endolith
Last active January 30, 2022 01:40
How to extract images from PDF files recursively in folders in fish shell

Dependencies

sudo apt install poppler-utils imagemagick

To extract all the images from every PDF in the current folder and its subfolders:

for file in **.pdf
    pdfimages -all "$file" "$file"
end
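Before extracting, pdfimages -list shows what each PDF holds (one row per image, with its page, size and encoding) without writing any files. Note also that -all keeps each image's native format. Scanned pages stored as JPEG therefore come out as .jpg, and the *.png loops below would skip them. The -png option writes every image as PNG instead. The sketch below shows both, assuming poppler's pdfimages as installed above:

```fish
# Inspect: list the embedded images in each PDF (recursively) without extracting them.
for file in **.pdf
    echo "== $file"
    pdfimages -list "$file"
end

# Extract with every image converted to PNG, so the *.png steps see all of them.
# pdfimages names output <root>-NNN.<ext>, so with the PDF path as root the images
# land next to each PDF, e.g. "name.pdf-000.png".
for file in **.pdf
    pdfimages -png "$file" "$file"
end
```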

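Then, since they're full-page images, crop 100 pixels off the bottom of each one to remove the ID number printed there. pdfimages writes the images next to each PDF, so run this in the folder that holds them: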

for file in *.png
    convert "$file" -crop +0-100 +repage "cropped $file"
end

and then auto-crop the white space:

for file in cropped*.png
    convert "$file" -trim +repage "$file"
end

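though that still doesn't trim quite close enough. Trim again with a 10% fuzz factor, which also treats colors close to the background (such as off-white scan noise) as background: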

for file in cropped*.png
    convert "$file" -fuzz 10% -trim +repage "$file"
end
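The crop and the two trims can also be folded into a single convert call per image. This is a minimal sketch. It assumes the extracted pages are PNGs in the current folder and that the same 100-pixel strip and 10% fuzz suit your pages. The helper name crop_page is hypothetical. The result should be close to the three separate passes, but it is not guaranteed to be pixel-identical:

```fish
# Hypothetical helper: crop the ID strip off the bottom, then trim near-white margins.
# Usage: crop_page SOURCE DEST
function crop_page --argument-names src dst
    # -crop +0-100 drops the bottom 100 px; +repage resets the virtual canvas.
    # -fuzz is a setting, so it must come before the -trim it should affect.
    convert "$src" -crop +0-100 +repage -fuzz 10% -trim +repage "$dst"
end

for file in *.png
    # Skip outputs from an earlier run so they don't get cropped twice.
    string match -q 'cropped *' -- "$file"; and continue
    crop_page "$file" "cropped $file"
end
```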
endolith commented Jan 30, 2022

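On Windows, just use the poppler-utils inside Windows Subsystem for Linux (install them in your WSL distribution with the same apt command). From cmd.exe, in the folder containing the PDFs (inside a .bat file, write %%f instead of %f):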

for %f in (*.pdf) do wsl pdfimages -all "%f" "%f"
