@jmtaysom
Created December 15, 2016 03:16
ocr example
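import glob
import os

# Import Image explicitly; a bare "import PIL" does not load it by itself.
from PIL import Image
import pytesseract

# OCR every JPEG in the media folder and write the text to a matching .txt file.
for image in glob.glob(r'/Users/me/condo/word/media/*.jpg'):
    txt = pytesseract.image_to_string(Image.open(image))
    # Reduce a file name such as 'image12.jpg' to its id, '12'.
    image_id = os.path.basename(image).split('.')[0].replace('image', '')
    with open('/Users/me/condo/text/{}.txt'.format(image_id), 'w') as f:
        f.write(txt)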
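
The file-name handling in the loop above can also be pulled out into small helpers, which makes it testable without running OCR. This is only a sketch: `image_id_from_path` and `output_path_for` are hypothetical names that are not part of the gist, and the sketch assumes the inputs are named like `image12.jpg`. It also differs slightly from the original: it strips only a leading `image` prefix and drops only the last extension.

```python
from pathlib import Path


def image_id_from_path(image_path):
    """Return the id part of a name like 'image12.jpg' (hypothetical helper)."""
    # Path.stem drops the directory and the final extension: 'image12.jpg' -> 'image12'.
    stem = Path(image_path).stem
    # Strip only a leading 'image' prefix; str.replace would remove every occurrence.
    return stem[len('image'):] if stem.startswith('image') else stem


def output_path_for(image_path, text_dir):
    """Build the .txt path that the OCR text for image_path would be written to."""
    return Path(text_dir) / '{}.txt'.format(image_id_from_path(image_path))
```

Inside the loop, the output file could then be opened with `open(output_path_for(image, '/Users/me/condo/text'), 'w')`.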