@drjwbaker drjwbaker/ocr.py
Last active Dec 5, 2016

Tesseract OCR Engine
import pytesseract
import requests
from PIL import Image
from PIL import ImageFilter
from io import BytesIO


def process_image(url):
    """Download the image at `url` and return the text Tesseract finds in it."""
    image = _get_image(url)
    # Preprocessing candidates, disabled for now (see the edit note below):
    # image = image.resize( [int(2 * s) for s in image.size] )
    # image.filter(ImageFilter.SHARPEN)
    # image.filter(ImageFilter.EDGE_ENHANCE)
    # image.filter(ImageFilter.FIND_EDGES)
    return pytesseract.image_to_string(image)


def _get_image(url):
    # requests returns the response body as bytes, so wrap it in BytesIO for PIL
    # (the Python 2 StringIO module does not exist in Python 3).
    return Image.open(BytesIO(requests.get(url).content))


# edit 5/12/16: image filters and resizing need to be tested to see what works best.
# adapted from https://realpython.com/blog/python/setting-up-a-simple-ocr-server/
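
One possible way to run the test mentioned in the edit note is to OCR the same image under each preprocessing variant and compare the outputs side by side. The sketch below is only an illustration: `upscale`, `VARIANTS`, and `compare_preprocessing` are hypothetical names that are not part of the gist, and the `ocr` parameter is an assumption added so that the comparison can be tried without Tesseract installed.

```python
from PIL import Image, ImageFilter


def upscale(image, factor=2):
    # Hypothetical helper: enlarge the image by an integer factor.
    return image.resize(tuple(int(factor * s) for s in image.size))


# Hypothetical table of preprocessing variants, one per idea in the gist.
VARIANTS = {
    "original": lambda im: im,
    "upscaled_2x": upscale,
    "sharpen": lambda im: im.filter(ImageFilter.SHARPEN),
    "edge_enhance": lambda im: im.filter(ImageFilter.EDGE_ENHANCE),
}


def compare_preprocessing(image, ocr=None):
    """Return a dict mapping each variant name to the OCR output for it."""
    if ocr is None:
        # Imported lazily so the function can be exercised with a stand-in
        # OCR callable on machines without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    return {name: ocr(prepare(image)) for name, prepare in VARIANTS.items()}
```

Calling `compare_preprocessing(_get_image(url))` on a few representative scans and reading the results by eye is probably enough to see which variant, if any, helps for a given kind of source image.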

snim2 commented Dec 5, 2016

I think you need:

def process_image(url):
    image = _get_image(url)
    image = image.filter(ImageFilter.SHARPEN)
    image = image.filter(ImageFilter.EDGE_ENHANCE)
    image = image.filter(ImageFilter.FIND_EDGES)
    return pytesseract.image_to_string(image)
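
The reason the assignment matters is that Pillow's `Image.filter` returns a new image and leaves the one it is called on untouched. A small sketch on a synthetic image shows this (`make_test_image` is a hypothetical helper for illustration, not something from the gist):

```python
from PIL import Image, ImageFilter


def make_test_image():
    # Synthetic 8x8 grayscale image: left half black, right half white.
    image = Image.new("L", (8, 8), 0)
    image.paste(255, (4, 0, 8, 8))
    return image


original = make_test_image()
before = original.tobytes()

# Calling filter() without keeping the result has no effect on `original`.
original.filter(ImageFilter.FIND_EDGES)
unchanged = original.tobytes() == before

# Assigning the return value captures the filtered copy.
filtered = original.filter(ImageFilter.FIND_EDGES)
changed = filtered.tobytes() != before
```

Here `unchanged` and `changed` both come out true, which is why each filter line needs the `image = ...` on the left.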

drjwbaker commented Dec 5, 2016

Yay that works! Thanks Sarah!

snim2 commented Dec 5, 2016

np
