Lambda: parse an image using Amazon Textract and index it in Amazon Q

Parse Image file using Amazon Textract and AWS Lambda

Let's start from the beginning and go step by step to create and test your Lambda function. This involves setting up the necessary IAM role and policies, creating an S3 bucket, uploading an image, writing the Lambda function, and testing it.

Step 1: Set Up IAM Role for Lambda

  1. Create an IAM Role for Lambda:

    • Open the IAM console.
    • Create a new role with the following settings:
      • Trusted entity: AWS service.
      • Use case: Lambda.
      • Name: TextractImageParser
  2. Attach Policies to the Role:

    • Attach the following policies to the role:
      • AmazonS3ReadOnlyAccess: This allows Lambda to read from S3.
      • AmazonTextractFullAccess: This allows Lambda to use Textract.
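
The same role setup can also be scripted. The following is a minimal sketch, assuming `iam` is a boto3 IAM client (for example `boto3.client("iam")`) whose credentials are allowed to create roles. The helper name `create_lambda_role` is hypothetical; the role name and the two managed policies are the ones listed above.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies named in the step above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
]


def create_lambda_role(iam, role_name="TextractImageParser"):
    """Create the role and attach both managed policies (hypothetical helper)."""
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role_name
```

Passing the client in as an argument keeps the helper easy to try against a stub before running it against a real account.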

Step 2: Create an S3 Bucket

  1. Create a New S3 Bucket:

    • Open the S3 console.
    • Create a new bucket (e.g., my-textract-input-bucket).
  2. Upload an Image to the Bucket:

    • Upload an image to the bucket. For example, upload a file named example-image.jpg.
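
If you prefer to script this step, here is a rough sketch, assuming `s3` is a boto3 S3 client (`boto3.client("s3")`). The helper name `create_bucket_and_upload` is hypothetical. Bucket names are globally unique, so the example name may already be taken.

```python
def create_bucket_and_upload(s3, bucket, local_path, key, region="us-east-1"):
    """Create `bucket`, then upload the file at `local_path` under `key`."""
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With a real client, the call would look like `create_bucket_and_upload(boto3.client("s3"), "my-textract-input-bucket", "example-image.jpg", "example-image.jpg")`.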

Step 3: Write the Lambda Function

  1. Create the Lambda Function:
    • Open the Lambda console.
    • Create a new Lambda function with the following settings:
      • Name: TextractImageParser.
      • Runtime: Python 3.9 (or the latest version).
      • Execution role: Use the IAM role created earlier.

  2. Write the Code:
    • Replace the default code with the following:
import boto3
import json

s3 = boto3.client('s3')
textract = boto3.client('textract')

def lambda_handler(event, context):
    # Get the S3 bucket and object key from the event
    s3_bucket = event.get('s3Bucket')
    s3_object_key = event.get('s3ObjectKey')

    # Check if the necessary keys are present in the event
    if not s3_bucket or not s3_object_key:
        return {
            'error': 'Missing s3Bucket or s3ObjectKey in the event'
        }

    # Retrieve the image from S3
    try:
        response = s3.get_object(Bucket=s3_bucket, Key=s3_object_key)
        image_content = response['Body'].read()
    except Exception as e:
        return {
            'error': f"Failed to retrieve S3 object: {str(e)}"
        }

    # Use Amazon Textract to extract text from the image
    try:
        response = textract.detect_document_text(Document={'Bytes': image_content})
        document_text = '\n'.join([item['Text'] for item in response['Blocks'] if item['BlockType'] == 'LINE'])
    except Exception as e:
        return {
            'error': f"Failed to extract text with Textract: {str(e)}"
        }

    # Prepare the response with the extracted text
    return {
        'version': 'v0',
        's3ObjectKey': s3_object_key,
        'metadataUpdates': [
            {
                'name': 'image_text',
                'value': {
                    'stringValue': document_text
                }
            }
        ]
    }
  • Change the Timeout to 5 min.
  • Deploy the function.
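
To see what the handler's list comprehension does with a Textract response, here is a small standalone sketch. The `sample_response` is hand-written for illustration and is not real Textract output; a real response carries many more fields per block. The helper name `lines_from_blocks` is hypothetical.

```python
def lines_from_blocks(response):
    """Return the text of every LINE block, in the order they appear."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Hand-made sample: one PAGE block, two LINE blocks, and their WORD blocks.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "1001"},
        {"BlockType": "LINE", "Text": "Total: 42.00"},
    ]
}

print("\n".join(lines_from_blocks(sample_response)))
```

Filtering on `LINE` avoids duplicating text, since each line's words also appear as separate `WORD` blocks.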

Step 4: Test the Lambda Function

  1. Create a Test Event:
    • In the Lambda console, create a new test event with the following JSON:
{
    "s3Bucket": "my-textract-input-bucket",
    "s3ObjectKey": "example-image.jpg"
}
  2. Run the Test:
    • Save, deploy and run the test.
    • Check the execution results to ensure the text extraction is successful.
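
The same test can be run from a script instead of the console. This is a sketch, assuming `lambda_client` is a boto3 Lambda client (`boto3.client("lambda")`) and that the function returns the response shape shown in Step 3. The helper names `invoke_parser` and `extracted_text` are hypothetical.

```python
import json


def invoke_parser(lambda_client, bucket, key, function_name="TextractImageParser"):
    """Invoke the function synchronously and return its decoded JSON result."""
    payload = json.dumps({"s3Bucket": bucket, "s3ObjectKey": key})
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=payload.encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


def extracted_text(result):
    """Pull the 'image_text' value out of the handler's result, or None."""
    for update in result.get("metadataUpdates", []):
        if update.get("name") == "image_text":
            return update["value"]["stringValue"]
    return None
```

If the handler returned an `error` key instead, `extracted_text` returns `None`, so the caller can inspect the raw result.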

Step 5: Set Up S3 Bucket Policy (Optional)

If you need to grant access to the S3 bucket specifically for your Lambda function, you can set a bucket policy.

  1. Create a Bucket Policy:
    • Go to the S3 bucket's permissions tab.
    • Add the following bucket policy (replace YOUR_LAMBDA_ROLE_ARN and my-textract-input-bucket with your values):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "YOUR_LAMBDA_ROLE_ARN"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
        }
    ]
}
Example with a role ARN filled in:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::471112581752:role/TextractImageParser"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
        }
    ]
}
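
The bucket policy can also be generated and applied from code. The sketch below assumes `s3` is a boto3 S3 client; `build_read_policy` and `apply_read_policy` are hypothetical helper names. It produces the same statement as the JSON above.

```python
import json


def build_read_policy(bucket, role_arn):
    """Build a bucket policy that lets `role_arn` read objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def apply_read_policy(s3, bucket, role_arn):
    """Attach the policy to the bucket; replaces any existing bucket policy."""
    s3.put_bucket_policy(
        Bucket=bucket,
        Policy=json.dumps(build_read_policy(bucket, role_arn)),
    )
```

Note that `put_bucket_policy` overwrites the whole policy, so merge in any existing statements first if the bucket already has one.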

By following these steps, you will set up a Lambda function to read an image from S3, use Textract to extract text, and return the extracted text. You can test the function to ensure it works correctly before moving on to integrating with Amazon Q.

To connect your Lambda function to Amazon Q for Custom Document Enrichment (CDE), follow these steps:

Step 1: Create an API Gateway

To allow Amazon Q to invoke your Lambda function, you need to expose it through an API Gateway.

  1. Create an API Gateway:
    • Open the API Gateway console.
    • Create a new REST API.
    • Name it (e.g., TextractCDE).

  2. Create a Resource and Method:
    • Create a new resource (e.g., /enrich).
    • Under this resource, create a new POST method.

  3. Integrate with Lambda:
    • In the POST method, set the Integration type to Lambda Function.
    • Select the Lambda function (TextractImageParser) you created earlier.
    • Save the configuration.

  4. Deploy the API:
    • Deploy the API to a stage (e.g., test).
    • Note the API endpoint URL (e.g., https://your-api-id.execute-api.region.amazonaws.com/test/enrich).
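
The endpoint URL follows the pattern shown in the last bullet, so it can be assembled from its parts. A tiny sketch follows; the API ID `abc123` is a made-up placeholder and `invoke_url` is a hypothetical helper name.

```python
def invoke_url(api_id, region, stage, resource_path):
    """Assemble the invoke URL for a deployed REST API stage and resource."""
    path = resource_path.lstrip("/")
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path}"


# "abc123" is a placeholder; use the ID shown in your API Gateway console.
print(invoke_url("abc123", "us-east-1", "test", "/enrich"))
# → https://abc123.execute-api.us-east-1.amazonaws.com/test/enrich
```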

Step 2: Test the API Gateway

  1. Test the API Endpoint:
    • Use a tool like Postman or curl to send a POST request to the API endpoint with the following body:
{
    "s3Bucket": "my-textract-input-bucket",
    "s3ObjectKey": "example-image.jpg"
}

Alternatively, in the console go to API Gateway > APIs > Resources - TextractCDE, select the POST method, and choose Test.

  2. Verify the Response:
    • Ensure you receive a valid response with the extracted text data.
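
As an alternative to Postman or curl, the request can be sent with Python's standard library. This is a sketch under two assumptions: the API is open (no API key or IAM authorization on the method), and the integration passes the JSON body through to Lambda unchanged. The helper names are hypothetical.

```python
import json
import urllib.request


def build_enrich_request(url, bucket, key):
    """Build a POST request carrying the same JSON body as the test event."""
    body = json.dumps({"s3Bucket": bucket, "s3ObjectKey": key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_enrich(url, bucket, key, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_enrich_request(url, bucket, key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Call `call_enrich` with the endpoint URL you noted when deploying the API; a successful response should contain the `metadataUpdates` entry with the extracted text.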

Create the Amazon Q App and Call Lambda via CDE

  • Create a Role for the Amazon Q App: Go to IAM > Roles > Create role and select AWS account as the trusted entity.

  • Create a Role for the Amazon Q Data Source: Select AWS account as the trusted entity and create AmazonQS3AccessRole with the AmazonS3ReadOnlyAccess permission. After creating the role, update the trusted entities of that same role to the following:
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "qbusiness.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
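
The trust policy above can also be applied from a script. A minimal sketch, assuming `iam` is a boto3 IAM client and `role_name` is whatever you named the role; `set_q_business_trust` is a hypothetical helper name.

```python
import json

# Same trust policy as the JSON above: lets Amazon Q Business assume the role.
Q_BUSINESS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "qbusiness.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def set_q_business_trust(iam, role_name):
    """Replace the role's trust policy with the Amazon Q Business one."""
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(Q_BUSINESS_TRUST_POLICY),
    )
```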

Next, add an inline permission policy to the role. Switch to the JSON editor and paste the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "textract:DetectDocumentText",
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "arn:aws:s3:::roxy-textract-input-bucket",
                "arn:aws:s3:::roxy-textract-input-bucket/*",
                "arn:aws:textract:*:*:document/*",
                "arn:aws:lambda:us-east-1:471112581752:function:TextractImageParser"
            ]
        }
    ]
}
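
To avoid hand-editing the bucket name and function ARN, the inline policy can be built from parameters. The sketch below mirrors the JSON above as written and assumes `iam` is a boto3 IAM client; `build_inline_policy` and `attach_inline_policy` are hypothetical helper names.

```python
import json


def build_inline_policy(bucket, lambda_arn):
    """Build the inline policy shown above for a given bucket and function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "textract:DetectDocumentText",
                    "lambda:InvokeFunction",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                    "arn:aws:textract:*:*:document/*",
                    lambda_arn,
                ],
            }
        ],
    }


def attach_inline_policy(iam, role_name, bucket, lambda_arn,
                         policy_name="S3LambdaTextractAccess"):
    """Attach the policy to the role as an inline policy."""
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_inline_policy(bucket, lambda_arn)),
    )
```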

Name the policy S3LambdaTextractAccess. Your AmazonQAccessRole should now have three permission policies:

  • AmazonS3ReadOnlyAccess (AWS managed)
  • AWSLambdaBasicExecutionRole (AWS managed)
  • S3LambdaTextractAccess (Customer inline)

Next, create the Amazon Q app and assign it the AmazonQAccessRole you created earlier. Then create a data source:

  • Use Amazon S3 as the data source.
  • Assign AmazonQAccessRole to the data source.
  • Select the S3 bucket that contains the images to parse.

Next, go to Document enrichment and choose Add document enrichment to create a new one.
