Let's start from the beginning and go step by step to create and test your Lambda function. This involves setting up the necessary IAM role and policies, creating an S3 bucket, uploading an image, writing the Lambda function, and testing it.
- Create an IAM Role for Lambda:
- Open the IAM console.
- Create a new role with the following settings:
- Trusted entity: AWS service.
- Use case: Lambda.
- Name: TextractImageParser
- Attach Policies to the Role:
- Attach the following policies to the role:
- AmazonS3ReadOnlyAccess: This allows Lambda to read from S3.
- AmazonTextractFullAccess: This allows Lambda to use Textract.
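The role and policy steps above can also be scripted. The following is a minimal sketch using boto3's IAM client, assuming credentials with IAM permissions are already configured. The helper names `lambda_trust_policy` and `create_textract_role` are hypothetical, and the client is passed in as a parameter so the policy document can be inspected without calling AWS.

```python
import json

# ARNs of the two AWS-managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
]


def lambda_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_textract_role(iam_client, role_name="TextractImageParser"):
    """Create the role, attach both managed policies, and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```

A call would look like `create_textract_role(boto3.client("iam"))`. The returned ARN is the value needed later wherever the role has to be referenced.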
- Create a New S3 Bucket:
- Open the S3 console.
- Create a new bucket (e.g., my-textract-input-bucket).
- Upload an Image to the Bucket:
- Upload an image to the bucket. For example, upload a file named example-image.jpg.
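For repeatable setups, the bucket creation and upload can be done from a script instead of the console. This is a rough sketch with boto3's S3 client; the function names and the `region` default are assumptions for illustration, and bucket names must be globally unique, so the example name may already be taken.

```python
def create_bucket_kwargs(bucket, region="us-east-1"):
    """Build arguments for create_bucket.

    us-east-1 rejects an explicit LocationConstraint, so it is only
    added for other regions.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_and_upload(s3_client, bucket, local_path, key, region="us-east-1"):
    """Create the bucket, upload one local file, and return its S3 URI."""
    s3_client.create_bucket(**create_bucket_kwargs(bucket, region))
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With a client from `boto3.client("s3")`, calling `create_bucket_and_upload(s3, "my-textract-input-bucket", "example-image.jpg", "example-image.jpg")` mirrors the two console steps.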
- Create the Lambda Function:
- Open the Lambda console.
- Create a new Lambda function with the following settings:
- Name: TextractImageParser.
- Runtime: Python 3.9 (or the latest version).
- Execution role: Use the IAM role created earlier.
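The same function can be created programmatically. The sketch below assumes the handler source is available as a string and that the role ARN is known; `zip_handler_source` and `create_function` are hypothetical helper names. The runtime string matches the version named above and should be swapped for a newer one if that is what the console offers.

```python
import io
import zipfile


def zip_handler_source(source_code, filename="lambda_function.py"):
    """Package a single source file into an in-memory zip archive."""
    info = zipfile.ZipInfo(filename)
    # Mark the file world-readable so the Lambda runtime can import it.
    info.external_attr = 0o644 << 16
    info.compress_type = zipfile.ZIP_DEFLATED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(info, source_code)
    return buffer.getvalue()


def create_function(lambda_client, role_arn, source_code, name="TextractImageParser"):
    """Create the Lambda function from a single-file deployment package."""
    return lambda_client.create_function(
        FunctionName=name,
        Runtime="python3.9",
        Role=role_arn,
        # File name (without .py) and function name of the entry point.
        Handler="lambda_function.lambda_handler",
        Code={"ZipFile": zip_handler_source(source_code)},
    )
```

A newly created IAM role can take a few seconds to become usable, so a script that creates the role and the function back to back may need to retry the call.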
- Write the Code:
- Replace the default code with the following:
import boto3
import json

s3 = boto3.client('s3')
textract = boto3.client('textract')

def lambda_handler(event, context):
    # Get the S3 bucket and object key from the event
    s3_bucket = event.get('s3Bucket')
    s3_object_key = event.get('s3ObjectKey')

    # Check if the necessary keys are present in the event
    if not s3_bucket or not s3_object_key:
        return {
            'error': 'Missing s3Bucket or s3ObjectKey in the event'
        }

    # Retrieve the image from S3
    try:
        response = s3.get_object(Bucket=s3_bucket, Key=s3_object_key)
        image_content = response['Body'].read()
    except Exception as e:
        return {
            'error': f"Failed to retrieve S3 object: {str(e)}"
        }

    # Use Amazon Textract to extract text from the image
    try:
        response = textract.detect_document_text(Document={'Bytes': image_content})
        document_text = '\n'.join([item['Text'] for item in response['Blocks'] if item['BlockType'] == 'LINE'])
    except Exception as e:
        return {
            'error': f"Failed to extract text with Textract: {str(e)}"
        }

    # Prepare the response with the extracted text
    return {
        'version': 'v0',
        's3ObjectKey': s3_object_key,
        'metadataUpdates': [
            {
                'name': 'image_text',
                'value': {
                    'stringValue': document_text
                }
            }
        ]
    }
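The line-joining step in the handler can be checked locally without calling Textract. This sketch pulls that logic into a hypothetical helper, `lines_from_blocks`, and runs it against a hand-written response. The sample data is made up and only mimics the `Blocks` / `BlockType` / `Text` fields the handler reads.

```python
def lines_from_blocks(blocks):
    """Join the text of LINE blocks, skipping PAGE and WORD blocks."""
    return "\n".join(block["Text"] for block in blocks if block["BlockType"] == "LINE")


# Hand-written stand-in for a detect_document_text response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Hello world"},
        {"BlockType": "WORD", "Text": "Hello"},
        {"BlockType": "WORD", "Text": "world"},
        {"BlockType": "LINE", "Text": "Second line"},
    ]
}

print(lines_from_blocks(sample_response["Blocks"]))
# Hello world
# Second line
```

Filtering on `LINE` matters because the same text also appears in the `WORD` blocks. Joining every block that has a `Text` field would duplicate the output.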
- Change the Timeout to 5 min.
- Deploy the function.
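The timeout can also be set through the API, which takes the value in seconds rather than minutes. A small sketch, with hypothetical helper names, assuming the function is named TextractImageParser as above:

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's upper limit (15 minutes)


def timeout_seconds(minutes):
    """Convert minutes to seconds and check the value against Lambda's limit."""
    seconds = int(minutes * 60)
    if not 1 <= seconds <= LAMBDA_MAX_TIMEOUT_SECONDS:
        raise ValueError(f"Timeout must be between 1 and {LAMBDA_MAX_TIMEOUT_SECONDS} seconds")
    return seconds


def set_timeout(lambda_client, minutes=5, name="TextractImageParser"):
    """Apply the timeout to an existing function."""
    return lambda_client.update_function_configuration(
        FunctionName=name,
        Timeout=timeout_seconds(minutes),
    )
```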
- Create a Test Event:
- In the Lambda console, create a new test event with the following JSON:
{
  "s3Bucket": "my-textract-input-bucket",
  "s3ObjectKey": "example-image.jpg"
}
- Run the Test:
- Save, deploy and run the test.
- Check the execution results to ensure the text extraction is successful.
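The same test can be run outside the console by invoking the function with boto3. The sketch below builds the test event, sends it, and unpacks the two result shapes the handler returns (an `error` key, or a `metadataUpdates` list). The helper names are hypothetical, and it assumes the function is deployed under the name TextractImageParser.

```python
import json


def build_test_event(bucket, key):
    """Event in the shape the handler expects."""
    return {"s3Bucket": bucket, "s3ObjectKey": key}


def extracted_text(result):
    """Return the extracted text, or raise if the handler reported an error."""
    if "error" in result:
        raise RuntimeError(result["error"])
    for update in result.get("metadataUpdates", []):
        if update["name"] == "image_text":
            return update["value"]["stringValue"]
    return ""


def invoke(lambda_client, bucket, key, name="TextractImageParser"):
    """Invoke the function synchronously and return the extracted text."""
    response = lambda_client.invoke(
        FunctionName=name,
        Payload=json.dumps(build_test_event(bucket, key)).encode("utf-8"),
    )
    return extracted_text(json.loads(response["Payload"].read()))
```

With `boto3.client("lambda")` as the client, `invoke(client, "my-textract-input-bucket", "example-image.jpg")` should return the same text the console test shows.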
To grant access to the S3 bucket specifically for your Lambda function, you can set a bucket policy.
- Create a Bucket Policy:
- Go to the S3 bucket's permissions tab.
- Add the following bucket policy (replace YOUR_LAMBDA_ROLE_ARN and my-textract-input-bucket with your values):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "YOUR_LAMBDA_ROLE_ARN"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
    }
  ]
}

Example with the role ARN filled in:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::471112581752:role/TextractImageParser"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-textract-input-bucket/*"
    }
  ]
}
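The policy can also be generated from the role ARN and bucket name rather than edited by hand. A minimal sketch follows; `bucket_read_policy` and `apply_bucket_policy` are hypothetical names, and the S3 client is passed in by the caller.

```python
import json


def bucket_read_policy(role_arn, bucket):
    """Bucket policy allowing one IAM role to read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                # The /* suffix scopes the statement to objects, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def apply_bucket_policy(s3_client, role_arn, bucket):
    """Attach the policy to the bucket, replacing any existing bucket policy."""
    s3_client.put_bucket_policy(
        Bucket=bucket,
        Policy=json.dumps(bucket_read_policy(role_arn, bucket)),
    )
```

Note that `put_bucket_policy` overwrites the whole policy, so a bucket that already has statements would need them merged in first.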
By following these steps, you will set up a Lambda function to read an image from S3, use Textract to extract text, and return the extracted text. You can test the function to ensure it works correctly before moving on to integrating with Amazon Q.