@derwiki
Created March 31, 2017 21:30
OCRing PDFs using Ghostscript and Google Cloud Vision
$ for x in 100 200 225 250 300 ; do echo $x; gs -sDEVICE=jpeg -dBATCH -dNOPAUSE -r$x -sOutputFile=warren.jpg -dFirstPage=1 -dLastPage=1 warren.pdf 1>/dev/null ; jpeginfo warren.jpg; done
100
warren.jpg 856 x 1400 24bit JFIF N 80332
200
warren.jpg 1712 x 2800 24bit JFIF N 240411
225
warren.jpg 1926 x 3150 24bit JFIF N 284315
250
warren.jpg 2140 x 3500 24bit JFIF N 337588
300
warren.jpg 2568 x 4200 24bit JFIF N 454876
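
The transcript above covers only the Ghostscript half of the title. As a rough sketch of the Cloud Vision half, the helper below wraps one rendered JPEG in a request body for the Vision REST endpoint `images:annotate`. The function name `build_vision_request`, the variable `VISION_API_KEY`, and the choice of `DOCUMENT_TEXT_DETECTION` over `TEXT_DETECTION` are assumptions for illustration, not something the gist specifies.

```shell
#!/bin/sh
# Hypothetical helper (not from the gist): print a Cloud Vision
# images:annotate request body for the image file given as $1.
build_vision_request() {
  printf '{"requests":[{"image":{"content":"'
  # Stream the base64 payload and strip line wrapping so the JSON
  # string stays on a single line (works with GNU and BSD base64).
  base64 < "$1" | tr -d '\n'
  printf '"},"features":[{"type":"DOCUMENT_TEXT_DETECTION"}]}]}\n'
}

# Usage, assuming an API key in the hypothetical variable VISION_API_KEY:
#   build_vision_request warren.jpg > request.json
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data-binary @request.json \
#     "https://vision.googleapis.com/v1/images:annotate?key=$VISION_API_KEY"
```

Streaming the base64 output instead of passing it as a `printf` argument keeps the helper usable at the larger file sizes in the transcript. Which resolution gives the best OCR result is not shown here and would need to be checked against the actual responses.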