Convert a PDF to a text file using Tesseract and ImageMagick in Cygwin
Required Cygwin packages:
* tesseract-ocr
* ghostscript
* imagemagick
/usr/bin/convert.exe -density 400 input.pdf -depth 8 output.tiff
tesseract -l eng --psm 1 output.tiff output_textfile
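The two steps can be wrapped in a small shell function. This is only a sketch: the names `pdf_to_text`, `run`, and `DRY_RUN` are made up for illustration, and it assumes a Tesseract release that spells the page-segmentation option `--psm` (older 3.x releases use the single-dash `-psm`). With `DRY_RUN` set, the commands are printed rather than executed, which is handy for checking paths first.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the two commands from this note.
# $1 = input PDF, $2 = optional output base name (defaults to the PDF name
# without its .pdf suffix).
pdf_to_text() {
    pdf=$1
    base=${2:-${pdf%.pdf}}
    tiff="$base.tiff"

    # Full path to ImageMagick's convert, presumably because Windows ships
    # its own unrelated convert.exe that may come first in PATH.
    run /usr/bin/convert.exe -density 400 "$pdf" -depth 8 "$tiff" &&
    # Assumes a Tesseract version that accepts --psm; tesseract appends
    # .txt to the output base name itself.
    run tesseract -l eng --psm 1 "$tiff" "$base"
}
```

After sourcing the file, `pdf_to_text input.pdf` should leave `input.tiff` and `input.txt` next to the PDF, and `DRY_RUN=1 pdf_to_text input.pdf` only shows what would be run.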