@kspeeckaert
Last active March 14, 2016 14:29
Calling Project Oxford from Python to perform OCR on an image in the clipboard

Purpose

Copy an image to the clipboard, then run an Alfred workflow that triggers OCR on the copied image and copies the text returned by the web service to the clipboard.
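The script further down only prints the recognized text. The step that puts it on the clipboard is presumably handled by the Alfred workflow. If you would rather do it from Python, one option is to pipe the text into the OS X `pbcopy` command. This is a minimal sketch: the helper name `copy_to_clipboard` and its `cmd` parameter (there so another command can be substituted) are hypothetical, and encoding as UTF-8 assumes `pbcopy` runs under a UTF-8 locale.

```python
import subprocess


def copy_to_clipboard(text, cmd=('pbcopy',)):
    """Write `text` to the clipboard by piping it into `cmd`.

    Hypothetical helper: defaults to the OS X pbcopy command and assumes a
    UTF-8 locale so non-ASCII characters survive the round trip.
    """
    # check=True raises CalledProcessError if the command exits non-zero
    subprocess.run(list(cmd), input=text.encode('utf-8'), check=True)
```

In the script, `copy_to_clipboard(text)` could replace (or follow) `print(text)`.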

Notes

Binary clipboard data

Retrieving anything but text from the (OS X) clipboard in Python seems troublesome. The CLI command pbpaste and the Python libraries xerox and pyperclip only support text. Judging from posts on Stack Overflow, it should be possible using PyObjC or Tkinter, but I couldn't get the former to install, and the latter seems like overkill just to access the clipboard.

Instead, I used the CLI utility pngpaste, which aims to do what pbpaste does, but for binary data. Passing - as the parameter instead of a filename makes pngpaste write the image data to stdout, where Python can read it.
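The pngpaste call can be wrapped in a small helper that also checks the output really is PNG data before handing it to PIL. This is a sketch: `read_png_from_command` is a hypothetical name, and its default assumes `./pngpaste` sits in the working directory, as in the script. The check relies only on the fixed 8-byte signature every PNG file starts with; `check=True` turns a non-zero exit status (which I would expect when the clipboard holds no image, though that is an assumption about pngpaste) into a `CalledProcessError`.

```python
import subprocess

# Every PNG file begins with these 8 bytes
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def read_png_from_command(cmd=('./pngpaste', '-')):
    """Run `cmd` and return its stdout, which must be PNG data.

    Raises subprocess.CalledProcessError if the command exits non-zero and
    ValueError if the output does not start with the PNG signature.
    """
    result = subprocess.run(list(cmd), check=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    data = result.stdout
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError('command output is not PNG data')
    return data
```

With this in place, `img_data = read_png_from_command()` replaces the inline `subprocess.run` call.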

Image dimensions

The API's requirements state that the image must be at least 40x40 pixels; smaller images make the server return an HTTP 500 error. Therefore, the script uses PIL to check the image dimensions and, if either side is smaller than 40 px, pads the image up to the minimum size.
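The script pads by cropping with an oversized box, which in Pillow fills the new area with zeros (black). An alternative, sketched here, pastes the image onto a white canvas instead. That may suit dark-on-light text better, but it is an assumption about typical clipboard images, not something the API requires. `pad_to_min_size` and `MIN_SIDE` are hypothetical names, and the image is converted to RGB so the fill colour is valid for any input mode (this drops any alpha channel).

```python
from PIL import Image

MIN_SIDE = 40  # minimum width and height required by the OCR endpoint


def pad_to_min_size(img, min_side=MIN_SIDE, fill=(255, 255, 255)):
    """Return `img` unchanged if both sides are >= min_side; otherwise
    paste it into the top-left corner of a larger canvas filled with `fill`."""
    width, height = img.size
    if width >= min_side and height >= min_side:
        return img
    # Normalise the mode so an RGB fill colour is always valid
    rgb = img.convert('RGB')
    canvas = Image.new('RGB', (max(width, min_side), max(height, min_side)), fill)
    canvas.paste(rgb, (0, 0))
    return canvas
```

Only the undersized dimension grows: a 10x50 image becomes 40x50, and an image that already meets the minimum is returned as-is.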

import subprocess
from io import BytesIO

import requests
from PIL import Image

api_url = 'https://api.projectoxford.ai/vision/v1/ocr'
# Fill in your own subscription key
header = {'Ocp-Apim-Subscription-Key': '',
          'Content-Type': 'application/octet-stream'}
# 'unk' asks the service to detect the language itself
params = {'language': 'unk'}

try:
    # Retrieve the binary image data from the clipboard;
    # '-' makes pngpaste write the PNG to stdout
    p = subprocess.run(['./pngpaste', '-'],
                       check=True,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
    img_data = p.stdout
    img = Image.open(BytesIO(img_data))
    # Ensure the image is at least 40x40: cropping with a box larger than
    # the image pads the area outside it with zeros (black)
    if min(img.size) < 40:
        img = img.crop((0, 0, max(img.size[0], 40), max(img.size[1], 40)))
    # Re-encode the (possibly padded) image as PNG for the upload
    bin_img = BytesIO()
    img.save(bin_img, format='PNG')
    img.close()
    img_data = bin_img.getvalue()
    bin_img.close()
    # Send the raw image bytes to the OCR endpoint
    r = requests.post(api_url,
                      params=params,
                      headers=header,
                      data=img_data)
    r.raise_for_status()
    data = r.json()
    # The response nests words in lines in regions: join each line's
    # words with spaces and put every line on its own row
    lines = []
    for region in data['regions']:
        for line in region['lines']:
            lines.append(' '.join(word['text'] for word in line['words']))
    text = '\n'.join(lines)
    print(text)
except subprocess.CalledProcessError as e:
    print('Could not get image from clipboard: {}'.format(e))
except requests.HTTPError as e:
    print('HTTP error occurred: {}'.format(e))
except Exception as e:
    print('Error occurred: {}'.format(e))
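The text extraction at the end of the script can be pulled into a function and tested without calling the service. A sketch, assuming only the nested structure the script already relies on (regions, then lines, then words, each word carrying a `text` field); any other fields in the response are ignored, and the name `extract_text` is hypothetical.

```python
def extract_text(ocr_result):
    """Flatten an OCR response into plain text.

    Assumes {'regions': [{'lines': [{'words': [{'text': ...}, ...]}, ...]}, ...]}.
    Words on a line are joined with single spaces; each line becomes one row.
    Missing 'regions', 'lines' or 'words' keys are treated as empty.
    """
    lines = []
    for region in ocr_result.get('regions', []):
        for line in region.get('lines', []):
            lines.append(' '.join(word['text'] for word in line.get('words', [])))
    return '\n'.join(lines)
```

For example, `extract_text({'regions': [{'lines': [{'words': [{'text': 'Hello'}, {'text': 'world'}]}]}]})` returns `'Hello world'`; in the script, `text = extract_text(r.json())` would replace the nested loops.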