rupeshtiwari/01_lambda parse image using amazon textract.md

## 01_lambda parse image using amazon textract.md

      
**Parse an Image File Using Amazon Textract and AWS Lambda**

Let's start from the beginning and go step by step to create and test the Lambda function. This involves setting up the necessary IAM role and policies, creating an S3 bucket, uploading an image, writing the Lambda function, and testing it.
### Step 1: Set Up an IAM Role for Lambda


Create an IAM Role for Lambda:

- Open the IAM console.
- Create a new role with the following settings:
  - Trusted entity: AWS service.
  - Use case: Lambda.
  - Name: TextractImageParser.


Attach Policies to the Role:

Attach the following AWS managed policies to the role:

- AmazonS3ReadOnlyAccess: allows Lambda to read objects from S3.
- AmazonTextractFullAccess: allows Lambda to call Amazon Textract.
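The same console steps can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. `create_textract_lambda_role` is a hypothetical helper, and the IAM client is passed in so the function can be exercised with a stub. The trust policy is what choosing "AWS service / Lambda" as the trusted entity produces, and the two ARNs are the AWS managed policies listed above.

```python
import json

# Trust policy equivalent to "Trusted entity: AWS service, Use case: Lambda".
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies attached in the console steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
]


def create_textract_lambda_role(iam_client, role_name="TextractImageParser"):
    """Create the Lambda execution role and attach the managed policies.

    Hypothetical helper: `iam_client` is assumed to be boto3.client("iam").
    Returns the ARN of the new role.
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```

For example, `create_textract_lambda_role(boto3.client("iam"))` returns the new role's ARN. A newly created role can take a few seconds to propagate before Lambda accepts it as an execution role.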


### Step 2: Create an S3 Bucket


Create a New S3 Bucket:

- Open the S3 console.
- Create a new bucket (e.g., my-textract-input-bucket).

Upload an Image to the Bucket:

- Upload an image to the bucket, for example a file named example-image.jpg.
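Here is a minimal boto3 sketch of the same two steps. It assumes a configured S3 client is passed in, and `create_bucket_and_upload` is a hypothetical helper. Bucket names are globally unique, so you will probably need to replace my-textract-input-bucket with your own name. Outside us-east-1, the Region must be given as a `LocationConstraint`.

```python
def create_bucket_and_upload(s3_client, bucket, region, local_path, key):
    """Create `bucket` in `region` and upload the local file `local_path` as `key`.

    Hypothetical helper: `s3_client` is assumed to be boto3.client("s3").
    Returns the s3:// URI of the uploaded object.
    """
    if region == "us-east-1":
        # us-east-1 is the default location; it must not be passed as a LocationConstraint.
        s3_client.create_bucket(Bucket=bucket)
    else:
        s3_client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # upload_file(Filename, Bucket, Key) streams the local file to S3.
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example: `create_bucket_and_upload(boto3.client("s3", region_name="us-east-1"), "my-textract-input-bucket", "us-east-1", "example-image.jpg", "example-image.jpg")`.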


### Step 3: Write the Lambda Function


Create the Lambda Function:

- Open the Lambda console.
- Create a new Lambda function with the following settings:
  - Name: TextractImageParser.
  - Runtime: Python 3.9 (or a later Python runtime).
  - Execution role: use the IAM role created earlier.


Write the Code:

Replace the default code with the following:


```python
import boto3

# Create the clients once per execution environment so warm invocations reuse them.
s3 = boto3.client('s3')
textract = boto3.client('textract')


def lambda_handler(event, context):
    # Get the S3 bucket and object key from the event
    s3_bucket = event.get('s3Bucket')
    s3_object_key = event.get('s3ObjectKey')

    # Check that the necessary keys are present in the event
    if not s3_bucket or not s3_object_key:
        return {'error': 'Missing s3Bucket or s3ObjectKey in the event'}

    # Retrieve the image bytes from S3
    try:
        s3_response = s3.get_object(Bucket=s3_bucket, Key=s3_object_key)
        image_content = s3_response['Body'].read()
    except Exception as e:
        return {'error': f"Failed to retrieve S3 object: {e}"}

    # Use Amazon Textract to detect text; each LINE block holds one line of text
    try:
        textract_response = textract.detect_document_text(Document={'Bytes': image_content})
        document_text = '\n'.join(
            block['Text']
            for block in textract_response['Blocks']
            if block['BlockType'] == 'LINE'
        )
    except Exception as e:
        return {'error': f"Failed to extract text with Textract: {e}"}

    # Return the extracted text as a metadata update on the document
    return {
        'version': 'v0',
        's3ObjectKey': s3_object_key,
        'metadataUpdates': [
            {
                'name': 'image_text',
                'value': {'stringValue': document_text},
            }
        ],
    }
```
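Textract's `DetectDocumentText` response returns `PAGE`, `LINE`, and `WORD` blocks together in one flat `Blocks` list, and the handler keeps only the `LINE` blocks. The sketch below pulls that step out into a pure function so it can be tried locally without AWS. `extract_lines` is a hypothetical name, and the sample blocks are hand-written and trimmed for illustration. They are not real Textract output, which also carries fields such as `Id`, `Confidence`, and `Geometry`.

```python
def extract_lines(blocks):
    """Join the Text of every LINE block, in response order, with newlines."""
    return "\n".join(
        block["Text"] for block in blocks if block.get("BlockType") == "LINE"
    )


# Hand-written, trimmed-down blocks for illustration only.
sample_blocks = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Invoice 1042"},
    {"BlockType": "LINE", "Text": "Total: $18.50"},
    {"BlockType": "WORD", "Text": "Invoice"},
    {"BlockType": "WORD", "Text": "1042"},
]

print(extract_lines(sample_blocks))
# Invoice 1042
# Total: $18.50
```

The `WORD` blocks repeat the same text one word at a time, so filtering on `LINE` avoids duplicating every word in the output.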

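- Under **Configuration > General configuration**, change the timeout to 5 minutes. The default of 3 seconds can be too short for a Textract call.
- Deploy the function.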
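If you prefer to script the timeout change, a sketch like the following calls the Lambda `update_function_configuration` API. That API takes the timeout in whole seconds, so 5 minutes is 300, and Lambda caps it at 900 seconds. `set_timeout_minutes` is a hypothetical helper, and the client is assumed to be `boto3.client("lambda")`.

```python
MAX_LAMBDA_TIMEOUT_SECONDS = 900  # Lambda's upper limit (15 minutes)


def set_timeout_minutes(lambda_client, function_name="TextractImageParser", minutes=5):
    """Set the function timeout. Hypothetical helper; returns the value in seconds."""
    seconds = int(minutes * 60)  # the Lambda API expects seconds
    if not 1 <= seconds <= MAX_LAMBDA_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_LAMBDA_TIMEOUT_SECONDS} seconds")
    lambda_client.update_function_configuration(FunctionName=function_name, Timeout=seconds)
    return seconds
```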

### Step 4: Test the Lambda Function


Create a Test Event:

- In the Lambda console, create a new test event with the following JSON:

```json
{
    "s3Bucket": "my-textract-input-bucket",
    "s3ObjectKey": "example-image.jpg"
}
```

Run the Test:

- Save, deploy, and run the test.
- Check the execution results to confirm that the text was extracted.
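You can also run the same test outside the console. This sketch assumes a boto3 Lambda client is passed in. `invoke_parser` is a hypothetical helper that sends the test-event JSON synchronously and decodes the result. The handler above reports failures by returning an `error` key rather than raising, so the sketch treats either a `FunctionError` or an `error` key as a failure.

```python
import json


def invoke_parser(lambda_client, bucket, key, function_name="TextractImageParser"):
    """Invoke the function with the console test-event payload and return its result.

    Hypothetical helper: `lambda_client` is assumed to be boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous call
        Payload=json.dumps({"s3Bucket": bucket, "s3ObjectKey": key}).encode("utf-8"),
    )
    # Payload is a streaming body; read it and decode the handler's return value.
    result = json.loads(response["Payload"].read())
    if "FunctionError" in response or "error" in result:
        raise RuntimeError(f"Text extraction failed: {result}")
    return result
```

For example: `invoke_parser(boto3.client("lambda"), "my-textract-input-bucket", "example-image.jpg")["metadataUpdates"]`.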


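### Step 5: Set Up an S3 Bucket Policy (Optional)

If you want to grant access to the S3 bucket specifically to your Lambda function, you can add a bucket policy. This step is optional because, within the same account, the AmazonS3ReadOnlyAccess policy on the role already lets the function read the object.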

Create a Bucket Policy:

- Go to the S3 bucket's **Permissions** tab.
- Add the following bucket policy (replace YOUR_LAMBDA_ROLE_ARN and my-textract-input-bucket with your values):


```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "YOUR_LAMBDA_ROLE_ARN"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
        }
    ]
}
```

For example, with the TextractImageParser role in account 471112581752:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::471112581752:role/TextractImageParser"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
        }
    ]
}
```
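To avoid hand-editing the ARN, you can generate the policy and apply it with boto3. This is a minimal sketch, assuming an S3 client is passed in. `bucket_read_policy` and `apply_bucket_policy` are hypothetical helpers that reproduce the JSON above for any role ARN and bucket.

```python
import json


def bucket_read_policy(role_arn, bucket):
    """Bucket policy letting one IAM role read objects from `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def apply_bucket_policy(s3_client, role_arn, bucket):
    """Attach the policy. Hypothetical helper; `s3_client` is assumed to be boto3.client("s3")."""
    # put_bucket_policy expects the policy document as a JSON string.
    s3_client.put_bucket_policy(
        Bucket=bucket, Policy=json.dumps(bucket_read_policy(role_arn, bucket))
    )
```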

By following these steps, you will set up a Lambda function to read an image from S3, use Textract to extract text, and return the extracted text. You can test the function to ensure it works correctly before moving on to integrating with Amazon Q.

  
## 02 connecting lambda with api gateway.md

      
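These steps expose the Lambda function through Amazon API Gateway as part of connecting it to Amazon Q for Custom Document Enrichment (CDE).

### Step 1: Create an API Gateway

An API Gateway endpoint lets you call the function over HTTPS, for example from Postman or curl. Amazon Q's document enrichment itself invokes the Lambda function directly by its ARN, which is why the Amazon Q role in the next part is granted lambda:InvokeFunction.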

Create an API Gateway:

- Open the API Gateway console.
- Create a new REST API.
- Name it (e.g., TextractCDE).


Create a Resource and Method:

- Create a new resource (e.g., /enrich).
- Under this resource, create a new POST method.


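Integrate with Lambda:

- In the POST method, set the integration type to Lambda Function.
- Leave Lambda proxy integration turned off. This passes the JSON request body to the function unchanged as its event, which matters because the handler reads s3Bucket and s3ObjectKey from the top level of the event.
- Select the Lambda function (TextractImageParser) you created earlier.
- Save the configuration.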


Deploy the API:

- Deploy the API to a stage (e.g., test).
- Note the API endpoint URL (e.g., https://your-api-id.execute-api.region.amazonaws.com/test/enrich).
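For reference, the console steps above correspond roughly to the following boto3 calls. This is a sketch under stated assumptions, not a tested deployment script. It assumes boto3 `apigateway` and `lambda` clients are passed in, and it uses a non-proxy Lambda integration so the JSON body reaches the handler as its event. `wire_enrich_endpoint` is a hypothetical helper. The console grants API Gateway permission to invoke the function for you, but the API does not, so the sketch grants it with `add_permission`.

```python
def wire_enrich_endpoint(apigw, lambda_client, lambda_arn,
                         api_name="TextractCDE", path_part="enrich", stage="test"):
    """Create a REST API with POST /<path_part> -> Lambda (non-proxy), deploy it,
    and return the invoke URL. Hypothetical helper."""
    # Lambda ARN layout: arn:aws:lambda:<region>:<account>:function:<name>
    parts = lambda_arn.split(":")
    region, account_id = parts[3], parts[4]

    api_id = apigw.create_rest_api(name=api_name)["id"]
    # A new REST API comes with a root "/" resource; attach /<path_part> under it.
    root_id = next(item["id"] for item in apigw.get_resources(restApiId=api_id)["items"]
                   if item["path"] == "/")
    resource_id = apigw.create_resource(
        restApiId=api_id, parentId=root_id, pathPart=path_part)["id"]

    apigw.put_method(restApiId=api_id, resourceId=resource_id,
                     httpMethod="POST", authorizationType="NONE")
    # Non-proxy ("AWS") integration: with no mapping template, the JSON body is passed through.
    apigw.put_integration(
        restApiId=api_id, resourceId=resource_id, httpMethod="POST",
        type="AWS", integrationHttpMethod="POST",
        uri=f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/{lambda_arn}/invocations",
    )
    # Non-proxy integrations need an explicit 200 method response and integration response.
    apigw.put_method_response(restApiId=api_id, resourceId=resource_id,
                              httpMethod="POST", statusCode="200")
    apigw.put_integration_response(restApiId=api_id, resourceId=resource_id,
                                   httpMethod="POST", statusCode="200")

    # Allow this API's POST /<path_part> to invoke the function.
    lambda_client.add_permission(
        FunctionName=lambda_arn,
        StatementId=f"apigateway-{api_id}-post-{path_part}",
        Action="lambda:InvokeFunction",
        Principal="apigateway.amazonaws.com",
        SourceArn=f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*/POST/{path_part}",
    )

    apigw.create_deployment(restApiId=api_id, stageName=stage)
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path_part}"
```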


### Step 2: Test the API Gateway


Test the API Endpoint:

- Use a tool like Postman or curl to send a POST request to the API endpoint with the following body:

```json
{
    "s3Bucket": "my-textract-input-bucket",
    "s3ObjectKey": "example-image.jpg"
}
```

- Alternatively, in the API Gateway console go to **APIs > TextractCDE > Resources**, select the POST method, and open the **Test** tab.

Verify the Response:

- Ensure you receive a valid response containing the extracted text.
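You can also send the Postman/curl request from Python using only the standard library. This is a minimal sketch. `build_enrich_request` and `call_enrich` are hypothetical helpers, and the endpoint URL is the placeholder from the deploy step.

```python
import json
import urllib.request


def build_enrich_request(endpoint_url, bucket, key):
    """Build the same JSON POST request that the Postman/curl test sends."""
    body = json.dumps({"s3Bucket": bucket, "s3ObjectKey": key}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_enrich(endpoint_url, bucket, key, timeout=60):
    """Send the request and decode the JSON that the Lambda function returns."""
    request = build_enrich_request(endpoint_url, bucket, key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example: `call_enrich("https://your-api-id.execute-api.region.amazonaws.com/test/enrich", "my-textract-input-bucket", "example-image.jpg")`.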


## 03 Amazon Q integration.md

      
**Creating an Amazon Q App and Calling Lambda Through CDE**


### Create a Role for the Amazon Q App

- Go to IAM > Roles and choose **Create role**.
- Select **AWS account** as the trusted entity type.


### Create a Role for the Amazon Q Data Source

- Select **AWS account** as the trusted entity type.
- Create AmazonQS3AccessRole with the AmazonS3ReadOnlyAccess permission.
- After the role is created, open its **Trust relationships** tab and replace the trusted entities with:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "qbusiness.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

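Add an inline permission policy: choose **Add permissions > Create inline policy**, switch to the **JSON** editor, and paste the following (replace the bucket name, Region, and account ID with your own):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::roxy-textract-input-bucket",
                "arn:aws:s3:::roxy-textract-input-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "textract:DetectDocumentText",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:us-east-1:471112581752:function:TextractImageParser"
        }
    ]
}
```

Textract's DetectDocumentText action does not support resource-level permissions, so it is granted on "*". The S3 and Lambda actions stay scoped to their ARNs.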


Name the policy S3LambdaTextractAccess.

This is how your AmazonQAccessRole should look. It has three permission policies:

| Policy name | Type |
| --- | --- |
| AmazonS3ReadOnlyAccess | AWS managed |
| AWSLambdaBasicExecutionRole | AWS managed |
| S3LambdaTextractAccess | Customer inline |
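You can also apply the trust-policy and inline-policy edits with boto3. This is a minimal sketch, assuming an IAM client is passed in and the inline policy is the JSON document shown above, loaded as a Python dict. `configure_q_role` is a hypothetical helper. `update_assume_role_policy` replaces the role's whole trust policy, and `put_role_policy` creates or overwrites the named inline policy.

```python
import json

# Trust policy from the step above: lets the Amazon Q Business service assume the role.
QBUSINESS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "qbusiness.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def configure_q_role(iam_client, role_name, inline_policy, policy_name="S3LambdaTextractAccess"):
    """Set the Amazon Q trust policy and add (or overwrite) the named inline policy.

    Hypothetical helper: `iam_client` is assumed to be boto3.client("iam").
    """
    iam_client.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(QBUSINESS_TRUST_POLICY)
    )
    iam_client.put_role_policy(
        RoleName=role_name, PolicyName=policy_name, PolicyDocument=json.dumps(inline_policy)
    )
```

For example: `configure_q_role(boto3.client("iam"), "AmazonQAccessRole", policy_dict)`, where `policy_dict` holds the inline policy JSON.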

Next:

- Create the Amazon Q app and assign the AmazonQAccessRole you created earlier.
- Create a data source that uses Amazon S3, assign AmazonQAccessRole to it, and select the S3 bucket that contains the images to parse.

Then go to **Document enrichment** and choose **Add document enrichment**.