A customer in the communications sector grapples with effectively managing an ever-expanding repository of email data, which has reached the scale of multiple terabytes and is on track to hit petabyte levels. Their challenges include not only the logging and analysis of vast data but also maintaining performance and cost-efficiency during storage and retrieval processes.
The primary goal is to architect an Amazon OpenSearch Service-based system that ingests, retains, and searches terabytes of email communications, potentially scaling to petabytes. The solution aims to address critical challenges such as:
- High-volume Data Ingestion: Efficiently managing daily incoming email logs amounting to approximately 2TB.
- Data Retention and Accessibility: Enabling active user search capabilities across a 6-month range within the most recent year's data while also retaining 2 years of data.
- Storage Optimization: Implementing Index State Management (ISM) policies to optimize storage across different data lifecycle stages, from hot to cold.
- Minimal Client Disruption: Utilizing data streams and aliases to minimize the impact of backend changes on client-side configurations.
- Robust Data Ingestion:
  - Establish a robust pipeline capable of handling 2TB/day of email log ingestion into OpenSearch, considering daily or hourly index creation to manage the volume effectively.
- Scalable Data Stream Management:
  - Implement data streams to oversee the continuous influx of email data, ensuring new indices are generated as needed for time-series data expansion.
- Dynamic Alias Implementation:
  - Use index aliases for a consistent query interface, allowing backend scalability without client-side reconfiguration.
- Lifecycle-based ISM Policies:
  - Design ISM policies to transition data through various storage states, maintaining hot storage for immediate 6-month range searches and managing the transition of older data to warm and cold storage tiers.
- Proactive Monitoring and Alerts:
  - Deploy a comprehensive monitoring system with alerts for events that could impact performance or data integrity.
- Petabyte-scale Management: The system supports expansion to petabyte-scale datasets, ensuring long-term scalability.
- Cost and Storage Efficiency: ISM policies strategically transition data to cost-effective storage solutions without sacrificing data availability for search queries.
- Operational Streamlining: Automated index lifecycle management significantly cuts down on manual oversight, enhancing operational efficiency.
- Uninterrupted Client Operations: The client-side experience remains consistent and reliable despite extensive backend data management activities.
The designed solution effectively surmounts the immense challenges posed by the storage and analysis of large-scale email data. With AWS OpenSearch at its core, complemented by ISM policies and data streams, the system promises scalability, cost-effectiveness, and operational efficiency while supporting extensive search capabilities over substantial historical data.
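To put the stated requirements in numbers, here is a rough back-of-the-envelope sketch. It assumes a constant 2 TB/day ingest rate, decimal units (1 PB = 1000 TB), and 30-day months, and it ignores replicas and indexing overhead; none of these assumptions come from the requirements themselves.

```python
# Back-of-the-envelope sizing for the stated requirements.
# Assumptions (not from the requirements): constant ingest rate,
# decimal units, 30-day months, no replicas, no indexing overhead.
DAILY_INGEST_TB = 2          # stated: about 2 TB of email logs per day
RETENTION_DAYS = 2 * 365     # stated: retain 2 years of data
SEARCH_WINDOW_DAYS = 6 * 30  # stated: 6-month search range


def volume_tb(days: int, daily_tb: float = DAILY_INGEST_TB) -> float:
    """Raw data volume accumulated over `days` at a constant daily rate."""
    return days * daily_tb


retained_tb = volume_tb(RETENTION_DAYS)
search_window_tb = volume_tb(SEARCH_WINDOW_DAYS)

print(f"Retained over 2 years: {retained_tb} TB (~{retained_tb / 1000:.2f} PB)")
print(f"6-month search window: {search_window_tb} TB")
```

Under these assumptions, two years of retention already exceeds one petabyte of raw data, which is why the design targets petabyte scale.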
Let's look at what data streams are in Amazon OpenSearch Service.
Data streams in Amazon OpenSearch Service simplify the initial setup of time-series indices. Data streams work out of the box for time-based data such as application logs that are typically append-only in nature. A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index.
The diagram illustrates the concept of a Data Stream in Amazon OpenSearch Service, specifically designed to store time-series data:
- Data Stream Structure: A Data Stream is a collection of indices that store time-series data. It is treated as a first-class citizen within OpenSearch, meaning it's a primary feature with full support for time-series data management.
- Backing Indices Naming Convention: Each index within the data stream is referred to as a backing index. These indices are named systematically following the pattern .ds-<data-stream-name>-<generation-id>:
  - .ds-logs-nginx-000001: This represents the first generation backing index in the data stream for storing logs data related to Nginx.
  - .ds-logs-nginx-000002: This is the second generation backing index, succeeding the first one as more data is ingested over time.
  - .ds-logs-nginx-000003: This is the third generation backing index, continuing the sequence for storing incoming data.
- Generational Index Rotation: As time progresses and more data is ingested, new backing indices are automatically created. The generational ID (000001, 000002, 000003, etc.) increments to reflect the chronological order and versioning of the data within the stream.
This setup allows for efficient management of time-series data, with the ability to handle high ingest volumes and optimize for the lifecycle of data as it ages from hot to cold storage. Data streams also facilitate easier scaling, searching, and management of time-based data across multiple indices under a single, consolidated abstraction.
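The naming convention above can be sketched in a few lines of Python. This is only an illustration of the pattern: the helper names are hypothetical, and the six-digit zero-padded generation is inferred from the example names rather than taken from a specification.

```python
import re


def backing_index_name(data_stream: str, generation: int) -> str:
    """Build a name of the form .ds-<data-stream-name>-<generation-id>."""
    if generation < 1:
        raise ValueError("generation starts at 1")
    # Assumption: the generation is zero-padded to six digits, as in the examples.
    return f".ds-{data_stream}-{generation:06d}"


_BACKING_INDEX = re.compile(r"^\.ds-(?P<stream>.+)-(?P<generation>\d{6})$")


def parse_backing_index(name: str) -> tuple[str, int]:
    """Split a backing index name into (data stream name, generation)."""
    match = _BACKING_INDEX.match(name)
    if match is None:
        raise ValueError(f"not a backing index name: {name!r}")
    return match["stream"], int(match["generation"])


names = [backing_index_name("logs-nginx", g) for g in (1, 2, 3)]
print(names)
print(parse_backing_index(".ds-logs-nginx-000003"))
```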
The image illustrates the handling of write requests within an OpenSearch Data Stream, emphasizing the targeted ingestion into the most recent backing index:
- Focused Write Requests: Write operations are directed exclusively to the latest backing index within the data stream.
- Current Write Index: The newest index, .ds-logs-nginx-000003, is actively receiving data, signifying it as the current write index.
- Data Stream Evolution: The indices .ds-logs-nginx-000001 and .ds-logs-nginx-000002 represent earlier phases in the data lifecycle, having previously served as the active write destinations before rolling over to the next sequence.
Operational Detail:
- As new data arrives, only the current write index, indicated by the highest generation suffix (000003 in this case), is updated. This ensures that data is stored in an organized, chronological manner.
The diagram showcases how search requests are handled in an OpenSearch Data Stream:
- Search Requests: Incoming queries from users or applications, represented by the cloud symbol, are sent to the Data Stream.
- Data Stream Access: The Data Stream is a collection of chronologically managed indices, allowing search queries to span across all indices.
- Backing Indices: Each of the indices (.ds-logs-nginx-000001, .ds-logs-nginx-000002, .ds-logs-nginx-000003) contains a portion of the time-series data, and all are searched collectively to provide comprehensive search results.
In contrast to write requests which are directed to the latest index, search requests in a Data Stream can access data across all backing indices, ensuring a full search scope over the entire data set.
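The difference between write and search routing can be illustrated with a small in-memory model. This is a toy sketch, not how OpenSearch is implemented: the ToyDataStream class and its methods are hypothetical and exist only to show that writes land in the newest backing index while searches cover all of them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataStream:
    """In-memory model of a data stream: a name plus ordered backing indices."""
    name: str
    backing: dict[str, list[dict]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.backing:
            self.rollover()  # create the first generation

    @property
    def write_index(self) -> str:
        # Dicts keep insertion order, so the last key is the newest generation.
        return next(reversed(self.backing))

    def rollover(self) -> str:
        """Create the next backing index; it becomes the write index."""
        new_name = f".ds-{self.name}-{len(self.backing) + 1:06d}"
        self.backing[new_name] = []
        return new_name

    def index(self, doc: dict) -> str:
        """Route a write to the current write index only."""
        self.backing[self.write_index].append(doc)
        return self.write_index

    def search(self, predicate) -> list[dict]:
        """Route a search to every backing index."""
        return [d for docs in self.backing.values() for d in docs if predicate(d)]


stream = ToyDataStream("logs-nginx")
stream.index({"status": 200})
stream.rollover()
stream.index({"status": 500})
stream.rollover()
target = stream.index({"status": 200})
hits = stream.search(lambda d: d["status"] == 200)
print(target, len(hits))
```

The last write goes to the third generation, while the search finds matching documents in both the first and third backing indices.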
For in-depth understanding and best practices regarding Data Streams in OpenSearch, refer to the following resources:
- AWS Documentation on Data Streams: Amazon OpenSearch Service Data Streams
- OpenSearch Community Documentation: OpenSearch Data Stream Documentation
These resources provide detailed guidance on implementing and managing data streams, offering insights into how write requests are managed and indices are rotated in a time-series data context.
Next, let's look at ISM policies in Amazon OpenSearch Service.
Index State Management (ISM) in Amazon OpenSearch Service lets you define custom management policies that automate routine tasks, and apply them to indexes and index patterns. You no longer need to set up and manage external processes to run your index operations.
First, let's look at the data lifecycle management strategy in Amazon OpenSearch Service using hot, UltraWarm, and cold storage.
This diagram illustrates a data lifecycle management strategy in the AWS Cloud using Amazon OpenSearch Service:
- Hot Storage:
  - Data ingestion starts here, with 2 TB/day written to hot storage.
  - Hot storage has a capacity of 4TB, suitable for frequent access, and is backed by Amazon EBS for persistent storage.
- Transition to UltraWarm:
  - After 2 days, data transitions to an UltraWarm node.
  - The UltraWarm node can hold up to 120TB, allowing less frequent access but with 90% cost savings compared to hot storage.
- Cold Storage:
  - Data moves to cold storage after 60 days on the UltraWarm node.
  - Cold storage, backed by Amazon S3, is used for the least frequently accessed data.
Throughout this lifecycle, an alias is used to manage read/write access to the indices, providing an abstraction layer for clients interacting with the data, regardless of its storage tier.
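The tier capacities in this lifecycle follow from the ingest rate and the time data spends in each tier. The short calculation below reproduces them; it assumes a constant ingest rate and ignores replicas and storage overhead, which the diagram does not specify.

```python
# Steady-state data held per tier = daily ingest x days spent in that tier.
# Assumption: constant ingest rate, no replicas, no storage overhead.
DAILY_INGEST_TB = 2      # 2 TB/day written to hot storage
DAYS_IN_HOT = 2          # data moves to UltraWarm after 2 days
DAYS_IN_ULTRAWARM = 60   # data moves to cold storage after 60 days on UltraWarm


def steady_state_tb(days_in_tier: int, daily_tb: float = DAILY_INGEST_TB) -> float:
    """Amount of data resident in a tier once the pipeline has filled up."""
    return days_in_tier * daily_tb


hot_tb = steady_state_tb(DAYS_IN_HOT)
ultrawarm_tb = steady_state_tb(DAYS_IN_ULTRAWARM)
age_at_cold_days = DAYS_IN_HOT + DAYS_IN_ULTRAWARM

print(f"Hot: {hot_tb} TB, UltraWarm: {ultrawarm_tb} TB")
print(f"Data reaches cold storage at {age_at_cold_days} days old")
```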
Next, let's look at how an ISM policy works:
The diagram outlines a process flow for index state management in Amazon OpenSearch Service using an Index State Management (ISM) policy:
- Attach ISM Policy: An actor (user or automated process) attaches an ISM policy to an OpenSearch Service Index.
- ISM Job Initiation: The ISM system starts a job related to the ISM policy.
- Condition Checks: The job, which runs periodically every 5 to 8 minutes, checks certain conditions within the OpenSearch Index.
- Perform Actions and Transitions: Based on the conditions checked, the job performs actions and transitions the index to different states, such as hot, warm, or cold, as defined by the ISM policy.
This ISM policy ensures that the index is managed automatically according to pre-defined conditions, making the process efficient and reliable.
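The condition-check step can be pictured with a toy evaluator. This is a simplified model for illustration, not the ISM plugin's implementation: the function names and the index_stats shape are hypothetical, only two condition types are modelled, and each transition is assumed to carry a single condition.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert strings such as '20m' or '30d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match[1]) * _UNIT_SECONDS[match[2]]


def condition_met(condition: dict, index_stats: dict) -> bool:
    """Evaluate one transition condition against a snapshot of index stats."""
    (key, threshold), = condition.items()  # this sketch expects exactly one condition
    if key == "min_index_age":
        return index_stats["age_seconds"] >= parse_duration(threshold)
    if key == "min_doc_count":
        return index_stats["doc_count"] >= threshold
    raise ValueError(f"condition not modelled in this sketch: {key}")


def next_state(current: dict, index_stats: dict) -> str:
    """Return the first transition target whose condition holds, else stay put."""
    for transition in current["transitions"]:
        if condition_met(transition["conditions"], index_stats):
            return transition["state_name"]
    return current["name"]


hot = {
    "name": "hot",
    "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "30d"}}],
}
young = next_state(hot, {"age_seconds": 10 * 86400, "doc_count": 5})
old = next_state(hot, {"age_seconds": 31 * 86400, "doc_count": 5})
print(young, old)
```

Each periodic run corresponds to one call of next_state: a 10-day-old index stays in hot, while a 31-day-old index moves to warm.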
For more detailed information on Index State Management in Amazon OpenSearch Service, you can visit the official AWS documentation: Index State Management Documentation.
Next, let's see what the ISM policy will do for this use case.
The diagram depicts the lifecycle management of an index in Amazon OpenSearch Service using Index State Management (ISM) to transition from hot to warm to cold states before deletion:
- Hot State: The index begins in the default hot state, where it is actively written to and read from, serving as the primary state for immediate query and ingest operations.
- Transition to Warm: After 30 days, the index transitions to the warm state. This state is optimized for less frequent access, still allowing queries but with lower cost compared to the hot state.
- Transition to Cold: Following 90 days in the warm state, the index moves to the cold state. Cold storage is typically backed by cheaper storage solutions like Amazon S3 and is used for the least frequently accessed data.
- Deletion: Finally, after 365 days in the cold state, the index is marked for deletion. This is the end of the index lifecycle, and the data is removed to reclaim storage space.
For more information on the Index State Management in Amazon OpenSearch Service, you can visit the official AWS documentation here: Amazon OpenSearch Service Index State Management.
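As a rough sketch, the lifecycle above could be expressed as an ISM policy body like the one built below. The helper function is hypothetical. The action names warm_migration, cold_migration, and cold_delete are the Amazon OpenSearch Service ISM actions as I understand them, so verify them against the ISM documentation before use. Because min_index_age counts from index creation, the sketch converts "days in the previous state" into cumulative ages.

```python
import json


def lifecycle_policy(timestamp_field: str) -> dict:
    """Build a hot -> warm -> cold -> delete policy body (illustrative only)."""
    # min_index_age is measured from index creation, so ages are cumulative.
    to_warm_days = 30
    to_cold_days = to_warm_days + 90      # 90 days spent in warm
    to_delete_days = to_cold_days + 365   # 365 days spent in cold

    def transition(state: str, days: int) -> list[dict]:
        return [{"state_name": state, "conditions": {"min_index_age": f"{days}d"}}]

    return {
        "policy": {
            "description": "hot -> warm (30d) -> cold (+90d) -> delete (+365d)",
            "default_state": "hot",
            "states": [
                {"name": "hot", "actions": [],
                 "transitions": transition("warm", to_warm_days)},
                {"name": "warm", "actions": [{"warm_migration": {}}],
                 "transitions": transition("cold", to_cold_days)},
                {"name": "cold",
                 "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                 "transitions": transition("delete", to_delete_days)},
                {"name": "delete", "actions": [{"cold_delete": {}}], "transitions": []},
            ],
        }
    }


policy = lifecycle_policy("date_time")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of body that would be sent with a PUT to _plugins/_ism/policies/<policy-name>.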
In this guide, we'll demonstrate how to set up and manage a scalable data architecture using AWS OpenSearch. This practical walkthrough covers the end-to-end process, structured in clear, actionable steps to facilitate a hands-on experience:
- Step 1: Initialize index template and data stream.
- Step 2: Formulate an Index State Management (ISM) policy.
- Step 3: Generate an index and bulk ingest data.
- Step 4: Conduct searches across backing indices.
- Step 5: Clean up to prevent extra costs.
This concise guide ensures a cost-effective demonstration of OpenSearch's data management process. Note: Index, Alias and Data stream names must be unique.
Step 1: Create index template
To create a data stream, you first need to create an index template that configures a set of indexes as a data stream.
The following OpenSearch PUT request creates an index template named email_conversations_data_template. By default, each ingested document must have an @timestamp field. Here I define my own custom timestamp field, date_time, as a property in the data_stream object, which is what makes matching indexes a data stream:
PUT _index_template/email_conversations_data_template
{
  "index_patterns": ["email_conversations*"],
  "data_stream": {
    "timestamp_field": {
      "name": "date_time"
    }
  },
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "message_id": { "type": "keyword" },
        "date_time": { "type": "date" },
        "from": { "type": "keyword" },
        "to": { "type": "keyword" },
        "cc": { "type": "keyword" },
        "bcc": { "type": "keyword" },
        "subject": { "type": "text" },
        "body": { "type": "text" },
        "attachments": {
          "type": "nested",
          "properties": {
            "file_name": { "type": "keyword" },
            "content_type": { "type": "keyword" }
          }
        },
        "labels": { "type": "keyword" },
        "thread_id": { "type": "keyword" },
        "status": { "type": "keyword" }
      }
    },
    "aliases": {
      "company_email_records": {}
    }
  },
  "priority": 200
}
The output should be similar to:
200-OK
{
  "acknowledged": true
}
Run the following request to confirm that the template was created:
GET _cat/templates
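Before sending a template like this, it can help to check that the custom timestamp field is also mapped as a date. The sketch below is a hypothetical local check, not part of OpenSearch: it runs against an abridged copy of the template and, as a simplifying assumption, accepts only the date type used in this guide.

```python
def check_timestamp_field(template: dict) -> str:
    """Return the data stream timestamp field name, or raise if it looks wrong."""
    data_stream = template.get("data_stream")
    if data_stream is None:
        raise ValueError("template has no data_stream object")
    # Without an explicit timestamp_field, the default name is @timestamp.
    name = data_stream.get("timestamp_field", {}).get("name", "@timestamp")
    properties = template.get("template", {}).get("mappings", {}).get("properties", {})
    mapped_type = properties.get(name, {}).get("type")
    # Assumption: this sketch only accepts "date", the type used in this guide.
    if mapped_type is not None and mapped_type != "date":
        raise ValueError(f"{name!r} is mapped as {mapped_type!r}, expected 'date'")
    return name


# Abridged copy of the template above (only the parts the check reads).
template = {
    "index_patterns": ["email_conversations*"],
    "data_stream": {"timestamp_field": {"name": "date_time"}},
    "template": {"mappings": {"properties": {"date_time": {"type": "date"}}}},
}
timestamp_field = check_timestamp_field(template)
print(timestamp_field)
```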
Step 2: Create ISM Policy
For this demo, the ISM policy that we will create does the following:
- Roll over when the document count is 10 or more, and keep the rolled-over indices in the hot state.
- Move to the warm state when the index size is 100 KB or more.
- Move from the warm state to the cold state once the index is at least 20 minutes old.
The name of the ISM policy is rollover_then_transition.
- Run the following request to see all ISM policies:
GET _plugins/_ism/policies
You will see all existing policies.
- Run the following request to create the new ISM policy:
PUT _plugins/_ism/policies/rollover_then_transition
{
  "policy": {
    "description": "Roll over at 10 docs, move to warm at 100kb, then to cold after 20 minutes",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_doc_count": 10
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_size": "100kb"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [],
        "transitions": [
          {
            "state_name": "cold",
            "conditions": {
              "min_index_age": "20m"
            }
          }
        ]
      },
      {
        "name": "cold",
        "actions": [],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["email_conversations*"],
      "priority": 200
    }
  }
}
You should see output similar to:
200-OK
{
  "_id": "rollover_then_transition",
  "_version": 1,
  "_primary_term": 2,
  "_seq_no": 5068,
  "policy": {
    "policy": {
      "policy_id": "rollover_then_transition",
      "description": "Roll over at 10 docs, move to warm at 100kb, then to cold after 20 minutes",
      "last_updated_time": 1700024663058,
      "schema_version": 1,
      "error_notification": null,
      "default_state": "hot",
      "states": [
        {
          "name": "hot",
          "actions": [
            {
              "retry": {
                "count": 3,
                "backoff": "exponential",
                "delay": "1m"
              },
              "rollover": {
                "min_doc_count": 10,
                "copy_alias": false
              }
            }
          ],
          "transitions": [
            {
              "state_name": "warm",
              "conditions": {
                "min_size": "100kb"
              }
            }
          ]
        },
        {
          "name": "warm",
          "actions": [],
          "transitions": [
            {
              "state_name": "cold",
              "conditions": {
                "min_index_age": "20m"
              }
            }
          ]
        },
        {
          "name": "cold",
          "actions": [],
          "transitions": []
        }
      ],
      "ism_template": [
        {
          "index_patterns": [
            "email_conversations*"
          ],
          "priority": 200,
          "last_updated_time": 1700024663058
        }
      ]
    }
  }
}
- Run the following request to confirm that the policy was created:
GET _plugins/_ism/policies/rollover_then_transition
Confirm that you see rollover_then_transition in the output.
Step 3: Create Index and ingest documents in bulk
The data stream, alias, and backing index are created only after you index documents.
- Run the following request to ingest documents:
POST _bulk
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "1", "date_time": "2023-11-14T08:00:00Z", "from": "alice@example.com", "to": ["bob@example.com"], "subject": "Project Update", "body": "The project is on track for the deadline.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "2", "date_time": "2023-11-14T08:10:00Z", "from": "carol@example.com", "to": ["dave@example.com"], "subject": "Budget Review", "body": "Please review the attached budget report.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "3", "date_time": "2023-11-14T08:20:00Z", "from": "eve@example.com", "to": ["frank@example.com"], "subject": "New Design Mockups", "body": "Attached are the new mockups for the design.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "4", "date_time": "2023-11-14T08:30:00Z", "from": "grace@example.com", "to": ["henry@example.com"], "subject": "Weekly Sync", "body": "Let's schedule our weekly sync to discuss the project's progress.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "5", "date_time": "2023-11-14T08:40:00Z", "from": "ida@example.com", "to": ["jack@example.com"], "subject": "Client Feedback", "body": "Here's the client feedback on the recent submission.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "6", "date_time": "2023-11-14T08:50:00Z", "from": "jill@example.com", "to": ["kevin@example.com"], "subject": "Conference Call", "body": "Reminder about the conference call later today.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "7", "date_time": "2023-11-14T09:00:00Z", "from": "leo@example.com", "to": ["mike@example.com"], "subject": "Version Control", "body": "Please update the files in the version control system.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "8", "date_time": "2023-11-14T09:10:00Z", "from": "nina@example.com", "to": ["oscar@example.com"], "subject": "Lunch Meeting", "body": "Can we meet for lunch to discuss the project updates?", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "9", "date_time": "2023-11-14T09:20:00Z", "from": "paul@example.com", "to": ["quincy@example.com"], "subject": "Documentation", "body": "The documentation needs to be updated with the latest changes.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "10", "date_time": "2023-11-14T09:30:00Z", "from": "rachel@example.com", "to": ["steve@example.com"], "subject": "New Hire", "body": "Please welcome our new team member and assist with onboarding.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "11", "date_time": "2023-11-14T09:40:00Z", "from": "susan@example.com", "to": ["tom@example.com"], "subject": "Invoice Approval", "body": "The attached invoice needs your approval.", "status": "unread" }
You should see output similar to:
200-OK
{
  "took": 132,
  "errors": false,
  "items": [
    {
      "create": {
        "_index": ".ds-email_conversations-000001",
        "_id": "6ydk0YsB2mqt0OvCabBA",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 3,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "_index": ".ds-email_conversations-000001",
        "_id": "7Cdk0YsB2mqt0OvCabBA",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 3,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    },
... other records
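When a bulk response contains many items, a small helper can summarize where the documents landed. The function below is a hypothetical sketch that relies only on the fields visible in the response above (items, create, _index, status); the sample response is trimmed to those fields.

```python
from collections import Counter


def summarize_bulk(response: dict) -> tuple[Counter, list[dict]]:
    """Count successful creates per backing index and collect failed items."""
    created: Counter = Counter()
    failures: list[dict] = []
    for item in response["items"]:
        result = item["create"]  # the requests in this guide only use "create"
        if 200 <= result["status"] < 300:
            created[result["_index"]] += 1
        else:
            failures.append(result)
    return created, failures


# Trimmed copy of the response shape shown above.
sample_response = {
    "took": 132,
    "errors": False,
    "items": [
        {"create": {"_index": ".ds-email_conversations-000001", "status": 201}},
        {"create": {"_index": ".ds-email_conversations-000001", "status": 201}},
    ],
}
created, failures = summarize_bulk(sample_response)
print(dict(created), failures)
```

After a rollover, the same summary shows new documents accumulating under the next backing index.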
Navigate to Index Management > State management policies and confirm that the ISM policy is created.
Navigate to Index Management > Data streams and confirm that email_conversations is created.
Navigate to Index Management > Data streams > email_conversations and validate that the backing index .ds-email_conversations-000001 is created.
Navigate to Index Management > Aliases and confirm that the company_email_records alias (defined in the index template) is created.
Since we ingested 11 documents and the policy rolls over at 10, the first backing index should roll over. Wait about 15 minutes to see the new backing index created for the data stream.
Notice that a rollover has happened and the new backing index .ds-email_conversations-000002 is now the write index.
- Ingest more documents:
POST _bulk
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "12", "date_time": "2023-11-14T09:50:00Z", "from": "uma@example.com", "to": ["victor@example.com"], "subject": "System Outage", "body": "Please investigate the cause of the unexpected system outage.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "13", "date_time": "2023-11-14T10:00:00Z", "from": "vanessa@example.com", "to": ["wesley@example.com"], "subject": "IT Support Needed", "body": "Could you help with the laptop issue I'm facing?", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "14", "date_time": "2023-11-14T10:10:00Z", "from": "xander@example.com", "to": ["yolanda@example.com"], "subject": "Updated Proposal", "body": "The updated proposal is attached for your review.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "15", "date_time": "2023-11-14T10:20:00Z", "from": "zach@example.com", "to": ["alice@example.com"], "subject": "Re: Project Update", "body": "Thanks for the update, I have some additional thoughts.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "16", "date_time": "2023-11-14T10:30:00Z", "from": "amelia@example.com", "to": ["bob@example.com"], "subject": "Next Steps in the Project", "body": "We need to decide on the next steps for the project milestones.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "17", "date_time": "2023-11-14T10:40:00Z", "from": "brad@example.com", "to": ["carol@example.com"], "subject": "Budget Allocation", "body": "We need to discuss the budget allocation for next quarter.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "18", "date_time": "2023-11-14T10:50:00Z", "from": "cindy@example.com", "to": ["dave@example.com"], "subject": "Meeting Rescheduled", "body": "The meeting has been rescheduled to next week.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "19", "date_time": "2023-11-14T11:00:00Z", "from": "derek@example.com", "to": ["eve@example.com"], "subject": "Sales Targets", "body": "Please provide the latest sales targets for the upcoming presentation.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "20", "date_time": "2023-11-14T11:10:00Z", "from": "elaine@example.com", "to": ["frank@example.com"], "subject": "Customer Inquiry Follow-Up", "body": "We need to follow up on the customer inquiry from last week.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "21", "date_time": "2023-11-14T11:20:00Z", "from": "franklin@example.com", "to": ["grace@example.com"], "subject": "Updated Contact List", "body": "Please find the updated contact list attached.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "22", "date_time": "2023-11-14T11:30:00Z", "from": "gina@example.com", "to": ["henry@example.com"], "subject": "Marketing Materials", "body": "Could you send over the latest marketing materials?", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "23", "date_time": "2023-11-14T11:40:00Z", "from": "hank@example.com", "to": ["ida@example.com"], "subject": "Presentation Feedback", "body": "Please provide your feedback on the latest presentation.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "24", "date_time": "2023-11-14T11:50:00Z", "from": "irene@example.com", "to": ["jake@example.com"], "subject": "Quarterly Goals", "body": "Let's review the quarterly goals and set objectives for the next period.", "status": "unread" }
- Wait another 15 minutes and you will notice that a new writable backing index has been created in the data stream.
- Ingest more documents:
POST _bulk
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "25", "date_time": "2023-11-14T12:00:00Z", "from": "jared@example.com", "to": ["karen@example.com"], "subject": "Project Milestones", "body": "As we approach our next project milestone, it's critical that we align our goals with the broader company strategy. I've attached the detailed report including our performance metrics, projected timelines, and resource allocation plans for the upcoming quarter. Please review and provide your detailed feedback on the same. Additionally, consider the impact of recent market changes on our deliverables.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "26", "date_time": "2023-11-14T12:10:00Z", "from": "kate@example.com", "to": ["leo@example.com"], "subject": "Design Team Meeting", "body": "This is a reminder to confirm your attendance at the design team meeting scheduled for Thursday. We'll be discussing the user feedback on our latest app design, as well as the revisions proposed by the UX team. It's imperative that all feedback is considered to ensure our design meets the user expectations and adheres to the best practices in accessibility and usability.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "27", "date_time": "2023-11-14T12:20:00Z", "from": "leon@example.com", "to": ["mia@example.com"], "subject": "Code Review", "body": "I've just completed a significant commit to our repository, addressing several critical bugs reported in the issue tracker. The commit also includes performance enhancements that should resolve the latency issues we've seen in the production environment. Please conduct a thorough review of the changes, paying particular attention to the refactoring of the payment processing module. Let's ensure we maintain our coding standards and avoid any regressions.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "28", "date_time": "2023-11-14T12:30:00Z", "from": "mia@example.com", "to": ["nate@example.com"], "subject": "Budget Forecast", "body": "The finance team has completed the budget forecast for the next fiscal quarter. The detailed forecast includes projections for revenue, expenditures, and cash flow, considering the new product launch and expected changes in operational costs. The attached spreadsheet provides a breakdown by department and includes a risk analysis section with mitigation strategies for potential overruns. Your insights on the forecast, especially in terms of marketing spend and R&D investment, would be highly valuable.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "29", "date_time": "2023-11-14T12:40:00Z", "from": "noel@example.com", "to": ["olivia@example.com"], "subject": "Client Onboarding", "body": "We're excited to welcome our new client to our service platform. The onboarding process is a crucial step in ensuring a smooth transition and a positive first impression. I have drafted an onboarding plan that outlines each phase of the process, including initial setup, training sessions, and ongoing support. The document also covers common challenges new clients face and how we can proactively address them. Please review the plan before our kickoff meeting and prepare any questions or suggestions you may have.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "30", "date_time": "2023-11-14T12:50:00Z", "from": "olivia@example.com", "to": ["peter@example.com"], "subject": "Website Downtime", "body": "Our website experienced unexpected downtime last night, which affected user access to our online services. The preliminary analysis points to a server overload, possibly related to the recent surge in traffic following our campaign launch. We need to convene a meeting with the IT and marketing departments to discuss this issue, understand the root cause, and develop a robust mitigation strategy to prevent future occurrences. It's critical that we maintain our service reliability and address any capacity concerns as we scale.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "31", "date_time": "2023-11-14T13:00:00Z", "from": "patrick@example.com", "to": ["quincy@example.com"], "subject": "Annual Report Compilation", "body": "The end of the year is approaching, and it's time to compile our annual report. This document is a comprehensive reflection of our company's financial health, achievements, and strategy moving forward. It will include insights from various departments, and I'll need your team to contribute a detailed analysis of this year's marketing campaigns, the outcomes, and learnings. Please ensure that the data is accurate, as this report will be shared with our stakeholders and will influence our strategy for the coming year.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "32", "date_time": "2023-11-14T13:10:00Z", "from": "quincy@example.com", "to": ["rachel@example.com"], "subject": "HR Policies Update", "body": "Our human resources department is updating several key company policies to reflect the latest in employment law and best practices within our industry. The updates will include changes to our remote work policy, benefits package, and employee conduct guidelines. I will be hosting a series of workshops to walk through the changes and gather feedback. Your participation is important as we want to ensure these policies are fair and clearly understood by all employees.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "33", "date_time": "2023-11-14T13:20:00Z", "from": "rachel@example.com", "to": ["steve@example.com"], "subject": "Expansion Strategy Meeting", "body": "As part of our strategic expansion plan, we are considering several new markets for entry in the upcoming year. Each market presents unique opportunities and challenges, and we must weigh these carefully. The attached document contains an overview of the market research findings, competitive analysis, and the proposed entry strategy for each region. Please review this material in detail ahead of our strategy meeting where we will make critical decisions about our international presence and investments.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "34", "date_time": "2023-11-14T13:30:00Z", "from": "steve@example.com", "to": ["tina@example.com"], "subject": "Product Launch Readiness", "body": "Our upcoming product launch is a significant milestone for the company, and it is imperative that we are fully prepared. This involves not just finalizing the product itself, but ensuring that all support systems are operational, the go-to-market strategy is solidified, and the sales and customer service teams are fully trained on the product's features and potential customer inquiries. The attached checklist outlines all the tasks that need to be completed, along with their current status. Please go through it to verify that nothing has been overlooked.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "35", "date_time": "2023-11-14T13:40:00Z", "from": "tina@example.com", "to": ["uma@example.com"], "subject": "Compliance Training Schedule", "body": "Compliance with industry regulations is not only essential for legal reasons but also for maintaining the trust of our customers and partners. I am finalizing the schedule for our annual compliance training, which will cover topics such as data protection, anti-corruption practices, and workplace safety. Your team's attendance is mandatory, as these trainings are crucial for ensuring everyone is aware of their responsibilities and the correct procedures. A calendar invite will follow shortly with the details.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "36", "date_time": "2023-11-14T13:50:00Z", "from": "uma@example.com", "to": ["victor@example.com"], "subject": "Infrastructure Upgrade Project", "body": "Our IT infrastructure is undergoing a significant upgrade to support our growing workforce and the increasing demand for our online services. This project will include the deployment of new servers, enhancement of network security protocols, and the introduction of advanced data analytics tools. The project plan outlines the key phases, deliverables, and responsible parties. Your role in this will be critical, particularly in ensuring minimal disruption to our services during the transition. Please review the project plan and prepare any questions for the upcoming project briefing.", "status": "unread" }
Wait another 15 minutes for the rollover to create a new writable backing index.
Add more documents:
POST _bulk
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "37", "date_time": "2023-11-14T14:00:00Z", "from": "valerie@example.com", "to": ["wendy@example.com"], "subject": "Sustainability Initiatives Planning", "body": "As part of our commitment to sustainability, we're planning a series of initiatives to reduce our carbon footprint and enhance our environmental stewardship. These initiatives include investing in renewable energy sources, optimizing our supply chain for efficiency, and reducing waste in our offices. I've attached a preliminary plan that outlines our proposed actions and their potential impact. Your expertise in environmental sciences is crucial to refining this plan, so please provide detailed feedback, particularly regarding our targets for emissions reductions and energy savings.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "38", "date_time": "2023-11-14T14:10:00Z", "from": "xavier@example.com", "to": ["yvonne@example.com"], "subject": "New Technology Investments", "body": "Our technology infrastructure is a critical foundation for our continued growth and innovation. With the rapid pace of technological change, we must invest in new systems and tools to maintain our competitive edge. This includes upgrading our CRM software, adopting AI-driven analytics for customer insights, and enhancing our cybersecurity measures. The attached investment proposal details the costs, benefits, and expected ROI for each technology upgrade. Please review it carefully, as we will be discussing these potential investments in our next executive board meeting.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "39", "date_time": "2023-11-14T14:20:00Z", "from": "yolanda@example.com", "to": ["zachary@example.com"], "subject": "Quarterly Sales Review", "body": "This quarter's sales review shows a promising trend in several key markets, with a notable increase in sales volume compared to the last quarter. However, there are areas where we haven't met our targets, particularly in newer markets where brand recognition is still developing. The attached sales report includes a detailed analysis of our performance, with data segmented by product line and region. It's essential that we use this data to adjust our sales strategies, focusing on high-growth potential areas and addressing the challenges in underperforming markets.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "40", "date_time": "2023-11-14T14:30:00Z", "from": "zoe@example.com", "to": ["amelia@example.com"], "subject": "Content Marketing Strategy", "body": "Content marketing is a powerful tool to engage our audience and strengthen our brand presence. Our strategy for the upcoming year will focus on creating high-quality, informative content that resonates with our target customers. This includes a series of thought leadership articles, educational webinars, and interactive social media campaigns. The attached document outlines the content calendar, topics for each piece of content, and the distribution channels we will leverage. Please review the strategy and suggest any additional topics or trends we should consider incorporating.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "41", "date_time": "2023-11-14T14:40:00Z", "from": "amelia@example.com", "to": ["brad@example.com"], "subject": "Operational Efficiency Program", "body": "Operational efficiency is key to improving our margins and ensuring customer satisfaction. We are initiating a program to streamline our processes, eliminate redundancies, and automate routine tasks. The program will be implemented in phases, starting with our customer service operations, where we see significant potential for improvements. The attached plan provides an overview of the proposed changes, the technologies we will employ, and the impact on our workforce. Your role will be to oversee the implementation, ensuring that the transition is smooth and the program's goals are met.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "42", "date_time": "2023-11-14T14:50:00Z", "from": "brad@example.com", "to": ["cindy@example.com"], "subject": "Risk Management Update", "body": "Risk management is a priority for our company, particularly in the current economic climate. We have updated our risk management framework to better identify, assess, and mitigate risks that could impact our business. This includes financial risks, such as currency fluctuations and credit risks, as well as operational risks related to supply chain disruptions. The attached document includes a summary of the updated framework, a risk register with our top risks and their mitigation strategies, and a schedule for our regular risk assessment meetings. Please familiarize yourself with the updates before our next risk management committee meeting.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "43", "date_time": "2023-11-14T15:00:00Z", "from": "cindy@example.com", "to": ["derek@example.com"], "subject": "Employee Engagement Survey Results", "body": "The results from our recent employee engagement survey are in, and they provide valuable insights into how our workforce perceives their work environment and career prospects within our company. Overall, the results are positive, but there are areas where we can improve, particularly in terms of career development and work-life balance. The attached report includes a detailed analysis of the survey results, employee comments, and recommendations for action items. We will be discussing these findings in our upcoming HR meeting to develop a plan to address the areas of concern and capitalize on our strengths.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "44", "date_time": "2023-11-14T15:10:00Z", "from": "derek@example.com", "to": ["elaine@example.com"], "subject": "Customer Service Improvement Plan", "body": "Customer service is a cornerstone of our business, and we are always looking for ways to enhance the customer experience. Based on recent customer feedback and service metrics, we have developed an improvement plan that focuses on reducing response times, providing more comprehensive training to our service representatives, and implementing a new customer relationship management system. The attached plan includes the objectives, key performance indicators, and a timeline for each initiative. Please review the plan and prepare any questions or suggestions for our upcoming customer service improvement workshop.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "45", "date_time": "2023-11-14T15:20:00Z", "from": "elaine@example.com", "to": ["frank@example.com"], "subject": "Data Privacy Compliance", "body": "Data privacy regulations are evolving, and it's critical that we remain compliant to avoid potential fines and reputational damage. We are conducting a comprehensive review of our data handling practices to ensure they align with the latest GDPR and CCPA requirements. This includes assessing our data collection policies, reviewing our data storage and processing practices, and updating our privacy notices. The attached memo outlines the key areas of focus for the compliance review and the roles and responsibilities of each team member involved. Please ensure you are familiar with the latest data privacy regulations and understand how they apply to your work.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "46", "date_time": "2023-11-14T15:30:00Z", "from": "frank@example.com", "to": ["gina@example.com"], "subject": "Innovation Lab Projects", "body": "Our Innovation Lab is launching several exciting projects that aim to leverage cutting-edge technologies to solve business challenges and create new opportunities. These projects include developing a blockchain-based supply chain solution, experimenting with AR/VR for immersive customer experiences, and exploring the use of IoT devices for real-time data collection. The attached overview provides a brief description of each project, the objectives, and the expected outcomes. Your team's participation in these projects is crucial, as it will provide valuable expertise and resources needed to drive innovation forward.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "47", "date_time": "2023-11-14T15:40:00Z", "from": "gina@example.com", "to": ["hank@example.com"], "subject": "Leadership Development Program", "body": "We are committed to developing the next generation of leaders within our organization, and to that end, we are launching a Leadership Development Program. This program is designed to provide high-potential employees with the skills, knowledge, and experiences necessary to take on leadership roles in the future. The program includes mentorship opportunities, leadership workshops, and strategic project assignments. The attached brochure provides more information about the program structure, the application process, and the selection criteria. Please distribute this to your team and encourage eligible candidates to apply.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "48", "date_time": "2023-11-14T15:50:00Z", "from": "hank@example.com", "to": ["ida@example.com"], "subject": "Global Market Expansion Analysis", "body": "Expanding into global markets is a significant endeavor that requires careful planning and analysis. Our market research team has conducted a comprehensive analysis of potential markets for expansion, considering factors such as economic stability, consumer trends, and regulatory environment. The attached report summarizes the findings of this analysis, including the recommended markets for entry and the proposed strategies for each. It is imperative that we discuss these recommendations in detail to ensure that our approach to global expansion is strategic and well-informed.", "status": "unread" }
{ "create" : { "_index" : "email_conversations" } }
{ "message_id": "49", "date_time": "2023-11-14T16:00:00Z", "from": "ida@example.com", "to": ["jack@example.com"], "subject": "Cybersecurity Protocol Update", "body": "In light of recent cybersecurity threats, we have updated our cybersecurity protocols to further protect our company's digital assets. This includes the implementation of multi-factor authentication, regular security audits, and employee training on recognizing and responding to cyber threats. The attached document outlines the updated protocols, the rationale behind the changes, and the steps each employee must take to comply with the new security measures. Please review this document carefully and ensure that you understand your role in maintaining the security of our systems and data.", "status": "unread" }
Wait another 15 minutes for a new writable backing index to appear. Once the total size reaches 100 KB, the older backing indices should begin transitioning to warm storage.
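The bulk requests in this step alternate an action line with a document line. As a minimal sketch (standard library only, nothing is sent to a cluster), the same payload could be generated from a list of documents. The helper name build_bulk_payload and the sample document are hypothetical and used only for illustration.

```python
import json


def build_bulk_payload(stream, docs):
    """Return an NDJSON _bulk body that appends docs to a data stream.

    Hypothetical helper for illustration; it only builds the request body.
    """
    lines = []
    for doc in docs:
        # Mirror the requests above: one "create" action line per document.
        lines.append(json.dumps({"create": {"_index": stream}}))
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative document only; the field names follow the samples above.
sample_docs = [
    {
        "message_id": "example-1",
        "date_time": "2023-11-14T16:10:00Z",
        "from": "alice@example.com",
        "to": ["bob@example.com"],
        "subject": "Example subject",
        "body": "Example body",
        "status": "unread",
    }
]

payload = build_bulk_payload("email_conversations", sample_docs)
print("POST _bulk")
print(payload, end="")
```

The printed text can be pasted into Dev Tools, or the payload string can be sent as the body of a POST to _bulk by whichever HTTP client you already use.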
Step 4: Searching across the backing indices
- Create an index pattern on the alias name company_email_records.
- Navigate to the Discover section, select company_email_records, and search for the email ID alice@example.com.
Notice that the results come from both backing indices that contain matching emails.
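The Discover search above can also be expressed as a query against the alias. The sketch below only builds the request body; it assumes the from and to fields are searchable as they appear in the sample documents, and the helper name build_address_query is hypothetical. Adjust the field names and query type to your actual mapping.

```python
import json


def build_address_query(address, size=10):
    """Build a search body matching emails sent from or to an address.

    Hypothetical helper; assumes "from" and "to" are searchable fields.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Match the address as either sender or recipient.
                "should": [
                    {"match_phrase": {"from": address}},
                    {"match_phrase": {"to": address}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_address_query("alice@example.com")
print("GET company_email_records/_search")
print(json.dumps(body, indent=2))
```

Because the request targets the alias rather than a specific backing index, the same query keeps working as rollovers add new backing indices.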
To clean up everything, you'll need to delete the data stream, index template, and any indices that were created during your demo. Here's how you can do it:
Please note that this operation is destructive and irreversible. Ensure that you indeed want to remove these resources before proceeding.
- Delete Data Stream: If you created a data stream, you can delete it with the following command:
DELETE _data_stream/email_conversations
Output should be similar to (200 OK):
{ "acknowledged": true }
- Delete Index Template: Remove the index template that you created for the data stream:
DELETE _index_template/email_conversations_data_template
Output should be similar to (200 OK):
{ "acknowledged": true }
- Delete Indices: If you want to delete all indices that match the pattern email_conversations*, you can run:
DELETE /email_conversations*
Output should be similar to (200 OK):
{ "acknowledged": true }
And if you created any indices manually or as a result of the rollover, delete them as needed:
DELETE /<name-of-the-index>
Replace <name-of-the-index> with the actual index name.
- Delete ISM Policy: To delete an ISM policy named rollover_then_transition, use the following command:
DELETE _plugins/_ism/policies/rollover_then_transition
You should see output similar to:
{ "_index": ".opendistro-ism-config", "_id": "rollover_then_transition", "_version": 2, "result": "deleted", "forced_refresh": true, "_shards": { "total": 3, "successful": 3, "failed": 0 }, "_seq_no": 5049, "_primary_term": 2 }
- Remove Alias: If you created an alias, delete it as well:
POST /_aliases
{ "actions": [ { "remove": { "index": "*", "alias": "company_email_records" } } ] }
- Reset ISM Policy Settings (if changed): If you changed any cluster settings such as the ISM sweep period, reset them to their default or previous values:
PUT _cluster/settings
{ "persistent": { "plugins.index_state_management.coordinator.sweep_period": null } }
After running these commands, all components related to the demo should be removed from your OpenSearch cluster. Make sure to adjust any index names or patterns to match those that you've actually used.
Choose the right storage tier for your needs in Amazon OpenSearch Service
- The following command lists all the indices on your domain along with their sizes:
GET _cat/indices
- Next, create an index and add a document to it:
POST fruits/_doc
{
"name":"banana",
"color":"yellow",
"@timestamp": "2023-10-30"
}
The response should confirm that the document was created ("result": "created").
- Use the following command to check the index settings:
GET fruits/_settings
The output should look similar to:
{
"fruits": {
"settings": {
"index": {
"creation_date": "1698681359180",
"number_of_shards": "5",
"number_of_replicas": "2",
"uuid": "X57RMr11RmCgOkZnQxaWLg",
"version": {
"created": "136307827"
},
"provided_name": "fruits"
}
}
}
}
- Migrate the index from hot storage to UltraWarm using the command below:
POST _ultrawarm/migration/fruits/_warm
You should get a response with return code 200 (success) and an acknowledgement as below:
{
"acknowledged" : true
}
In this demo, migrating 10,000 documents with an average size of 5 MB from hot to warm took 126 milliseconds.
- While the migration from hot to warm is in progress, you can check its status as below:
GET _ultrawarm/migration/fruits/_status
You should see output similar to the following:
{
"migration_status" : {
"index" : "fruits",
"state" : "PENDING_SHARD_RELOCATION",
"migration_type" : "HOT_TO_WARM"
}
}
Be aware that the sample data is very small and index migration can happen very quickly. If index migration is already over, you should see an output indicating that there are no active migrations for the index specified.
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Index [fruits] has no active migrations"
}
],
"type": "illegal_argument_exception",
"reason": "Index [fruits] has no active migrations"
},
"status": 400
}
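The two responses above (an active migration, or the "no active migrations" error) can be told apart programmatically. The following is a minimal sketch that classifies a status response that has already been parsed into a dict. The function name summarize_migration is hypothetical, and only the two response shapes shown above are handled explicitly.

```python
def summarize_migration(response):
    """Summarize a parsed _ultrawarm/migration/<index>/_status response.

    Hypothetical helper; covers only the two response shapes shown above.
    """
    if "migration_status" in response:
        status = response["migration_status"]
        return (
            f'{status["index"]}: {status["migration_type"]} '
            f'in state {status["state"]}'
        )
    reason = response.get("error", {}).get("reason", "")
    if "has no active migrations" in reason:
        # Either the migration already finished or none was started.
        return "no active migration"
    return f"unexpected response: {reason or response}"


in_progress = {
    "migration_status": {
        "index": "fruits",
        "state": "PENDING_SHARD_RELOCATION",
        "migration_type": "HOT_TO_WARM",
    }
}
finished = {
    "error": {
        "type": "illegal_argument_exception",
        "reason": "Index [fruits] has no active migrations",
    },
    "status": 400,
}

print(summarize_migration(in_progress))
print(summarize_migration(finished))
```

A polling loop could call this helper until it reports no active migration, then confirm the result with the index settings check in the next step.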
- Run the index settings command again to validate that the index has migrated to UltraWarm:
GET fruits/_settings
- Run the following commands to list the indices in hot and warm storage:
GET _cat/indices/_hot
GET _cat/indices/_warm
You will notice that the migrated index, fruits, is listed under warm and the rest of the indices are listed under hot.
When you migrate indexes to cold storage, you provide a time range for the data to make discovery easier. You can select a timestamp field based on the data in your index, manually provide a start and end timestamp, or choose to not specify one. In this demo, I use "@timestamp" as the timestamp_field value.
- Migrate the index from UltraWarm to cold storage using the command below:
POST _ultrawarm/migration/fruits/_cold
{
"timestamp_field": "@timestamp"
}
You should get a response with return code 200 - success and an acknowledgement as below:
{
"acknowledged" : true
}
In this demo, migrating 10,000 documents with an average size of 5 MB from warm to cold storage took 217 milliseconds.
- While the migration from warm to cold is in progress, you can check its status the same way as for the hot-to-warm migration. The output has the same shape as in the previous steps.
GET _ultrawarm/migration/fruits/_status
- Go to the index management section, and you should find the 'fruits' index listed under the cold indices section.
- Execute the command below to list all cold indices:
GET _cold/indices/_search
The response lists the indices currently in cold storage.
- You can add a time filter when searching cold indices:
GET _cold/indices/_search
{
"filters": {
"time_range": {
"start_time": "2023-10-01",
"end_time": "2023-11-01"
}
}
}
The response lists only the cold indices whose data falls within the given time range.
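When the time range changes often, the filter body can be generated rather than typed by hand. The sketch below assumes the same filters.time_range shape used in the request above. The helper name cold_time_filter is hypothetical, and the ordering check is an added safeguard, not something the API is known to require.

```python
import json
from datetime import date


def cold_time_filter(start, end):
    """Build a time_range filter body for GET _cold/indices/_search.

    Hypothetical helper; mirrors the request body shown above.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "filters": {
            "time_range": {
                # date.isoformat() yields YYYY-MM-DD, as in the example above.
                "start_time": start.isoformat(),
                "end_time": end.isoformat(),
            }
        }
    }


body = cold_time_filter(date(2023, 10, 1), date(2023, 11, 1))
print("GET _cold/indices/_search")
print(json.dumps(body, indent=2))
```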
When you need to query cold data, you can selectively attach it to existing UltraWarm nodes. Even to delete documents from an index in cold storage, you must first move the index to warm storage. Let’s move our index back to warm storage manually.
- Execute the command below to move fruits to warm storage:
POST _cold/migration/_warm
{
"indices": "fruits"
}
In this demo, moving 10,000 documents with an average size of 5 MB from cold to UltraWarm took 128 milliseconds.
Once the index has moved to warm storage, you can search it like any other index.
- Execute the command below to search the fruits index in warm storage. Searching a warm index works the same way as searching a hot one:
GET fruits/_search
The response contains the matching documents, just as it would for an index in hot storage.
If your migration to cold storage is queued or in a failed state, you can cancel it:
- Execute the command below to cancel the cold storage migration:
POST _ultrawarm/migration/_cancel/fruits
- Execute the command below to delete the fruits index from cold storage:
DELETE _cold/fruits
- Datastream: https://opensearch.org/docs/latest/dashboards/im-dashboards/datastream/
- Index State Management in Amazon OpenSearch Service: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ism.html
- ISM Policy: https://opensearch.org/docs/latest/im-plugin/ism/api/
- UltraWarm Storage in Amazon OpenSearch: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ultrawarm.html
- Cold Storage in Amazon OpenSearch: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cold-storage.html
- Rollover ISM Policy: https://opensearch.org/docs/latest/im-plugin/ism/policies/#rollover