
@simonw
Created April 2, 2023 17:14

OCR with Vision API

2023-04-01 17:11:39

user: I want to use the Google Cloud Vision API to run OCR from a Python script.

First, help me understand what access token credentials I need for this project and how to create them.


assistant: To use the Google Cloud Vision API for OCR in a Python script, you'll need a project in the Google Cloud Platform (GCP) console with the Vision API enabled, plus credentials for your script. The usual choice is a service account JSON key file, which the client library uses to obtain access tokens on your behalf. Here's a step-by-step guide to creating them:

  1. Go to the GCP Console: https://console.cloud.google.com/

  2. Sign in with your Google account or create a new one if you don't have one.

  3. Create a new project or select an existing one:
     a. Click the project drop-down menu at the top.
     b. Click "New Project," or select an existing project from the list.
     c. If creating a new project, give it a name and click "Create."

  4. Enable the Cloud Vision API:
     a. Click the navigation menu (three horizontal lines) in the top-left corner.
     b. Go to "APIs & Services" > "Library."
     c. Search for "Cloud Vision API" and click the result.
     d. Click "Enable" to enable the API for your project.
     The Vision API also requires billing to be enabled on the project, so link a billing account if you haven't already.

  5. Create a service account and generate a JSON key file:
     a. In the navigation menu, go to "IAM & Admin" > "Service Accounts."
     b. Click "Create Service Account" at the top.
     c. Enter a name and description for the service account, then click "Create and continue."
     d. Optionally grant the service account a role. The Vision-related role names shown in the console may vary, so pick the narrowest role that works; broad roles such as "Editor" or "Owner" also work but grant far more access than OCR needs. Click "Continue."
     e. Click "Done" to finish creating the service account.
     f. In the "Service Accounts" list, find the account you just created, click the three vertical dots under "Actions," and choose "Manage keys."
     g. Click "Add Key" > "Create new key," select "JSON," and click "Create."
     h. The JSON key file is generated and downloaded to your computer. Keep it secure: anyone holding this file can act as the service account.
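Before wiring the key into a script, you may want to sanity-check the downloaded file. Here is a minimal sketch, assuming the usual service-account key layout (a JSON object whose "type" is "service_account" and which includes "project_id", "client_email" and "private_key"). check_key_file is a hypothetical helper, not part of any Google library.

```python
import json
from pathlib import Path

# Fields assumed to be present in a service-account key file
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_key_file(path):
    """Hypothetical helper: validate a service-account key file.

    Returns the service account's client_email, or raises ValueError if the
    file is not a JSON object describing a service account.
    """
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    return data["client_email"]
```

For example, check_key_file('path/to/your/json_key_file.json') returns the service account's email address if the file looks like a valid key.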

Now that you have the JSON key file, you can use it to authenticate your Python script to the Google Cloud Vision API. The simplest way is to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file, either in your shell before running the script or from Python with the os module. If you set it in Python, do so before creating the client:

import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your/json_key_file.json'
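Setting the variable to a path that doesn't exist only surfaces later, when the client tries to load credentials. A minimal sketch that fails early instead; use_key_file is a hypothetical helper, and it assumes, as described above, that the variable is set before the client is created.

```python
import os
from pathlib import Path


def use_key_file(key_path):
    """Hypothetical helper: point GOOGLE_APPLICATION_CREDENTIALS at key_path.

    Call this before creating vision.ImageAnnotatorClient(), since the client
    looks up its credentials when it is constructed.
    """
    path = Path(key_path).expanduser().resolve()
    if not path.is_file():
        # Fail now rather than when the client first tries to authenticate
        raise FileNotFoundError(f"credentials file not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Resolving to an absolute path means the variable stays valid even if the script later changes its working directory.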

With the credentials in place, you can write the OCR script itself. First, install the Python client library:

pip install --upgrade google-cloud-vision

Then you can write a script to interact with the Vision API, for example:

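from google.cloud import vision

def ocr_image(image_path):
    # Uses Application Default Credentials, e.g. the key file named by GOOGLE_APPLICATION_CREDENTIALS
    client = vision.ImageAnnotatorClient()

    # Send the raw image bytes to the API
    with open(image_path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.text_detection(image=image)

    # Per-image errors are reported in the response rather than raised
    if response.error.message:
        raise RuntimeError(response.error.message)

    # The first annotation contains the full detected text
    texts = response.text_annotations
    return texts[0].description if texts else ''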

This ocr_image function takes the path to an image file and returns all of the detected text as a single string, or an empty string if no text was found.
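The function returns only texts[0] because text_annotations puts the full detected text in its first entry; the remaining entries are individual words, each with a bounding_poly made of corner vertices. A minimal sketch that turns that list into plain Python data. words_with_boxes is a hypothetical helper; it only reads the description and bounding_poly.vertices attributes, so it can be exercised with stand-in objects without calling the API.

```python
def words_with_boxes(text_annotations):
    """Hypothetical helper: return (full_text, [(word, [(x, y), ...]), ...]).

    Assumes the Vision API layout: entry 0 holds the whole detected text and
    entries 1..n hold individual words, each with bounding_poly.vertices.
    """
    if not text_annotations:
        return "", []
    full_text = text_annotations[0].description
    words = []
    for annotation in text_annotations[1:]:
        # Collect the corner coordinates of this word's bounding polygon
        box = [(vertex.x, vertex.y) for vertex in annotation.bounding_poly.vertices]
        words.append((annotation.description, box))
    return full_text, words
```

For example, calling words_with_boxes(response.text_annotations) right after the text_detection call gives you both the full text and each word's position in the image.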


user: Is there a way to write this script without needing to use environment variables?


assistant: Yes. Instead of relying on the environment variable, you can load the credentials from your JSON key file explicitly and pass them to the ImageAnnotatorClient constructor. The google.oauth2.service_account module provides the Credentials.from_service_account_file class method for this.

First, make sure the client library is installed. The google.oauth2 module comes from the google-auth package, which is installed automatically as a dependency of google-cloud-vision:

pip install --upgrade google-cloud-vision

Then, you can modify the script to avoid using environment variables like this:

from google.cloud import vision
from google.oauth2 import service_account

def ocr_image(image_path, credentials_path):
    # Load credentials directly from the JSON key file
    credentials = service_account.Credentials.from_service_account_file(credentials_path)
    client = vision.ImageAnnotatorClient(credentials=credentials)

    with open(image_path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.text_detection(image=image)

    # Per-image errors are reported in the response rather than raised
    if response.error.message:
        raise RuntimeError(response.error.message)

    # The first annotation contains the full detected text
    texts = response.text_annotations
    return texts[0].description if texts else ''

# Usage
image_path = "path/to/your/image.jpg"
credentials_path = "path/to/your/json_key_file.json"
result = ocr_image(image_path, credentials_path)
print(result)

This version of ocr_image takes the path to the JSON key file as an extra argument, credentials_path. It builds a Credentials object from that file and passes it to the ImageAnnotatorClient constructor, so the script no longer needs any environment variables to authenticate.
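Both versions of ocr_image build a new client on every call. Creating a client sets up credentials and a connection, so if you are processing many images you may prefer to create it once and reuse it. A minimal sketch; ocr_many and its image_type parameter are hypothetical names, and the function assumes the same response fields (error.message, text_annotations) that ocr_image reads.

```python
def ocr_many(client, image_paths, image_type=None):
    """Hypothetical helper: OCR several image files with one shared client.

    client is any object with a text_detection(image=...) method, such as a
    vision.ImageAnnotatorClient. image_type builds the request image from raw
    bytes; it defaults to vision.Image, imported lazily so the function can be
    exercised with stand-ins when the library is not installed.
    """
    if image_type is None:
        from google.cloud import vision
        image_type = vision.Image

    results = {}
    for path in image_paths:
        with open(path, "rb") as image_file:
            image = image_type(content=image_file.read())
        response = client.text_detection(image=image)
        # Per-image failures are reported in the response rather than raised
        if response.error.message:
            raise RuntimeError(f"{path}: {response.error.message}")
        texts = response.text_annotations
        results[path] = texts[0].description if texts else ""
    return results
```

In a real script you might create the client once with vision.ImageAnnotatorClient.from_service_account_file(credentials_path), a class method on the generated client that loads the key file for you, and then call ocr_many(client, list_of_image_paths).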

