@kination
Created June 16, 2022 05:10
import boto3
from botocore.config import Config
INDEX_ID = 'add-kendra-index-id'
KENDRA_DATA_SOURCE_ID = 'add-kendra-data-source-id'
# ------- Dummy datasets --------
TITLES = list()
DOCS = list()
TITLES.append('What is Amazon Kendra?')
DOCS.append("""
Amazon Kendra is a highly accurate and easy-to-use enterprise search service that’s powered by machine learning (ML). It allows developers to add search capabilities to their applications so their end users can discover information stored within the vast amount of content spread across their company. This includes data from manuals, research reports, FAQs, human resources (HR) documentation, and customer service guides, which may be found across various systems such as Amazon Simple Storage Service (S3), Microsoft SharePoint, Salesforce, ServiceNow, RDS databases, or Microsoft OneDrive. When you type a question, the service uses ML algorithms to understand the context and return the most relevant results, whether that means a precise answer or an entire document. For example, you can ask a question such as “How much is the cash reward on the corporate credit card?” and Amazon Kendra will map to the relevant documents and return a specific answer (such as “2%”). Kendra provides sample code so you can get started quickly and easily integrate highly accurate search into your new or existing applications.
""")
TITLES.append('How do I get up and running with Amazon Kendra?')
DOCS.append("""
The Amazon Kendra console provides the easiest way to get started. You can point Amazon Kendra at unstructured and semi-structured documents such as FAQs stored in Amazon S3. After ingestion, you can start testing Kendra by typing queries directly in the “search” section of the console. You can then deploy Amazon Kendra search in two easy ways: (1) use the visual UI editor in our Experience Builder (no code required), or (2) implement the Amazon Kendra API using a few lines of code for more-precise control. Code samples are also provided in the console to speed up API implementation.
""")
TITLES.append('What code changes do I need to make to use Amazon Kendra?')
DOCS.append("""
Ingesting content does not require coding when using the native connectors. You can also write your own custom connectors to integrate with other data sources, using the Amazon Kendra SDK. You can deploy Amazon Kendra search in two easy ways: (1) use the visual UI editor in our Experience Builder (no code required), or (2) implement the Kendra API using a few lines of code for more flexibility. Code samples are also provided in the console to speed up API implementation. The SDK provides full control and flexibility of the end-user experience.
""")
# ------- Dummy datasets end --------
my_config = Config(
    region_name='us-west-2',  # for now ap-northeast is not supported
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)
kendra_client = boto3.client("kendra", config=my_config)
# Open a sync job on the existing data source; the index itself must already exist.
print('start data source sync job')
result = kendra_client.start_data_source_sync_job(
    Id=KENDRA_DATA_SOURCE_ID,
    IndexId=INDEX_ID
)
print("Start data source sync operation: ")
print(result)
job_execution_id = result['ExecutionId']
print("Job execution ID: " + job_execution_id)
try:
    documents = list()
    for i in range(len(TITLES)):
        # Tag each document with the data source and sync job it belongs to.
        doc = {
            "Id": str(i),
            "Blob": DOCS[i],
            "Title": TITLES[i],
            "Attributes": [
                {
                    "Key": "_data_source_id",
                    "Value": {
                        "StringValue": KENDRA_DATA_SOURCE_ID
                    }
                },
                {
                    "Key": "_data_source_sync_job_execution_id",
                    "Value": {
                        "StringValue": job_execution_id
                    }
                }
            ]
        }
        documents.append(doc)
    # Upload all documents in a single request.
    result = kendra_client.batch_put_document(
        IndexId=INDEX_ID,
        Documents=documents
    )
    print("Response from batch_put_document:")
    print(result)
except Exception as e:
    print("Exception")
    print(e)
finally:
    # Always close the sync job, even if the upload failed.
    result = kendra_client.stop_data_source_sync_job(
        Id=KENDRA_DATA_SOURCE_ID,
        IndexId=INDEX_ID
    )
    print("Stop data source sync operation:")
    print(result)
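
The script above sends every document in one batch_put_document call, which is fine for three dummy documents. With a larger dataset the request would probably need to be split. The sketch below is one possible way to do that, not part of the original gist: build_document, chunked, and MAX_DOCS_PER_BATCH are hypothetical names, and the limit of 10 documents per request is an assumption that should be checked against the current Kendra quotas.

```python
from typing import Iterator, List

# Assumed per-request document limit; verify against the current Kendra quotas.
MAX_DOCS_PER_BATCH = 10


def build_document(doc_id: str, title: str, body: str,
                   data_source_id: str, job_execution_id: str) -> dict:
    """Build one document dict in the same shape the script above uses."""
    return {
        "Id": doc_id,
        "Blob": body,
        "Title": title,
        "Attributes": [
            {"Key": "_data_source_id",
             "Value": {"StringValue": data_source_id}},
            {"Key": "_data_source_sync_job_execution_id",
             "Value": {"StringValue": job_execution_id}},
        ],
    }


def chunked(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Possible usage inside the try block (needs a configured kendra_client):
# for batch in chunked(documents, MAX_DOCS_PER_BATCH):
#     result = kendra_client.batch_put_document(IndexId=INDEX_ID, Documents=batch)
#     print(result)
```

Keeping the chunking separate from the upload call means the batching logic can be checked without AWS credentials.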