Created August 6, 2016 09:14
OCR a PDF in OS X
# Problem: you download a PDF, but it's an image-only document, so its text cannot be searched.
# Open a terminal.
# brew install tesseract (unless you already have it installed)
# Open the document in Preview and export it as a TIFF (multipage is supported; 150 dpi seems OK).
# cd to the file's directory in the terminal to save yourself typing the full path.
# tesseract filename.tiff outputfilename pdf
# tesseract then crunches through your file and writes outputfilename.pdf (it appends the .pdf extension itself), a PDF whose text can be searched.
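The tesseract step above can be wrapped in a small bash function so the output name is derived from the input file. This is a minimal sketch, assuming tesseract is already on your PATH (e.g. from `brew install tesseract`) and that you have already exported the TIFF from Preview. `ocr_to_pdf` and the `-ocr` suffix are hypothetical choices for illustration, not part of tesseract.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ocr_to_pdf is not a tesseract command.
# Assumes tesseract is installed and on PATH (e.g. via `brew install tesseract`).
#
# Usage: ocr_to_pdf file.tiff [output-base]
# tesseract appends ".pdf" to the output base itself, so pass the base without an extension.
ocr_to_pdf() {
  local input="${1:-}"
  local out="${2:-}"

  if [[ -z "$input" ]]; then
    echo "usage: ocr_to_pdf file.tiff [output-base]" >&2
    return 2
  fi
  if [[ ! -f "$input" ]]; then
    echo "ocr_to_pdf: no such file: $input" >&2
    return 1
  fi

  if [[ -z "$out" ]]; then
    # Default: strip the extension from the file name (not the directory)
    # and add "-ocr", so scan.tiff -> scan-ocr(.pdf) and an original
    # scan.pdf in the same folder is not overwritten.
    local name="${input##*/}"
    if [[ "$name" == *.* ]]; then
      out="${input%.*}-ocr"
    else
      out="${input}-ocr"
    fi
  fi

  # The trailing "pdf" argument selects tesseract's PDF output.
  tesseract "$input" "$out" pdf
}
```

For example, `ocr_to_pdf report.tiff` runs `tesseract report.tiff report-ocr pdf`, which writes `report-ocr.pdf`. The `-ocr` suffix is a deliberate design choice: if the TIFF was exported from `report.pdf`, reusing the bare base name would make tesseract overwrite the original PDF.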