@anvk
Forked from henrik/ocr.markdown
Created February 17, 2017 20:04
OCR on OS X with tesseract

Install ImageMagick for image conversion:

brew install imagemagick

Install tesseract for OCR:

brew install tesseract --all-languages

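Or install without --all-languages and add languages manually as needed. Newer Homebrew releases no longer accept the --all-languages option; the extra language data ships in the separate tesseract-lang formula (brew install tesseract-lang).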
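To check which languages are already installed before adding more, `tesseract --list-langs` prints a header line followed by one language code per line. A minimal sketch, where `has_lang` is a hypothetical helper name (not part of tesseract) that reads that listing on stdin:

```shell
#!/bin/sh
# has_lang CODE: succeed if CODE appears as a whole line on stdin.
# Hypothetical helper; feed it the output of `tesseract --list-langs`.
has_lang() {
    grep -qxF -- "$1"
}

# Usage (assumes tesseract is on PATH; 2>&1 because some versions
# print the listing on stderr):
#   tesseract --list-langs 2>&1 | has_lang swe || echo "swe is missing"
```

Matching whole lines keeps a short code such as `eng` from matching inside the header line or inside a longer code.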

Make sure the input image is a grayscale .tif and fairly large. An image of ~500x150 pixels was too small, while ~2000x500 worked very well.

convert input.png -resize 400% -type Grayscale input.tif
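The 400% above is what takes a ~500 pixel wide image to ~2000 pixels. For other inputs, the percentage can be derived from the current width. A rough sketch, where `scale_percent` is a hypothetical helper and the 2000 pixel default is simply the width that worked above, not a tesseract requirement:

```shell
#!/bin/sh
# scale_percent WIDTH [TARGET]: print the integer resize percentage that
# brings WIDTH pixels up to about TARGET pixels (default 2000, an assumption
# taken from the size that worked above). Never scales down: anything
# below 100 is reported as 100.
scale_percent() {
    width=$1
    target=${2:-2000}
    pct=$(( (target * 100 + width - 1) / width ))   # ceiling division
    if [ "$pct" -lt 100 ]; then
        pct=100
    fi
    echo "$pct"
}

# Usage (assumes ImageMagick's identify and convert are on PATH):
#   w=$(identify -format '%w' input.png)
#   convert input.png -resize "$(scale_percent "$w")%" -type Grayscale input.tif
```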

Run the OCR. The default language is English; language codes are three letters (see man tesseract).

tesseract -l eng input.tif output

This creates output.txt.
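The convert and OCR steps can be wrapped into one function for running over several files. This is only a sketch: `ocr_image` is a hypothetical name, the 400% factor is carried over from above, and it assumes the input is not already a .tif (otherwise the conversion would overwrite it):

```shell
#!/bin/sh
# ocr_image FILE [LANG]: upscale FILE into a grayscale .tif next to it,
# then OCR that .tif into FILE-without-extension.txt. LANG defaults to eng.
ocr_image() {
    src=$1
    lang=${2:-eng}
    base=${src%.*}                      # photo.png -> photo
    convert "$src" -resize 400% -type Grayscale "$base.tif" &&
        tesseract -l "$lang" "$base.tif" "$base"   # writes $base.txt
}

# Usage: OCR every PNG in the current directory.
#   for f in *.png; do ocr_image "$f" eng; done
```

The `&&` skips the OCR step when the conversion fails, so a bad input does not leave a stale or empty text file behind.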
