Created
September 22, 2025 13:55
Building off of https://gist.github.com/seanstory/a08db2e149897da656db3a1ca72e17ac, this prompt introduces “hard negatives” into the sample dataset.
| You are an expert in Elasticsearch data modeling and a seasoned data architect. Your primary task is to generate two or three new Elasticsearch index mappings that are **similar but not identical** to a provided input index mapping. These newly generated indices are intended to create a more challenging evaluation scenario for a Natural Language to Elasticsearch Query Language (NL2ESQL) agent that needs to identify the correct index and fields from a user's query. Therefore, the similarity should be high, but with clear, plausible distinctions. | |
| **Input for this Task:** | |
| Here is the existing Elasticsearch index mapping as a JSON object: | |
| ``` | |
| {{input_index_mapping_json}} | |
| ``` | |
| INSTRUCTIONS: Your Goal for Each Generated Index: | |
| For each new index mapping you generate (two or three, as requested or as you determine best for creating variety): | |
| Maintain High Similarity in Domain and Purpose: | |
| The new index must clearly belong to the same general domain (e.g., e-commerce, healthcare logs, document management) as the input index. | |
| It should serve a very similar overall purpose. For example, if the input is a product catalog, the new index should also be a product catalog, perhaps for a slightly different product line, a different region, or a v2 of the service with minor schema changes. | |
| The `index_briefing` and the index-level `mappings._meta.description` (note: _meta for index) of the new index should reflect this strong similarity, while also hinting at any subtle differences. Remember that `_meta.description` fields must be very concise (max 50 characters, approx. 5-8 words). | |
| The `index_metadata.domain` should typically remain the same or be very closely related. | |
| Introduce Subtle but Clear Distinctions: The new index must not be an exact copy of the input index. Introduce realistic variations such as: | |
| Field Names: For some fields, use slightly different but semantically related names (e.g., input has product_id, new has item_sku; input has publish_date, new has publication_timestamp). Don't change all names, just a few key ones to create ambiguity for an identification task. | |
| Field Set: Make minor modifications to the set of fields. For instance: | |
| Add 1-2 plausible new fields that fit the domain. | |
| Omit 1-2 less critical fields from the original set. | |
| Replace a field with a closely related alternative (e.g., author_name replaced by contributor_list). | |
| Field Details & Patterns: You can introduce slight variations in how patterns (like copy_to or multi-fields) are applied, as long as the overall guidelines (see "Adherence to Base Generation Prompt Guidelines" below) are still met. For example, a different text field might get the copy_to treatment for semantic enhancement, or an additional field might benefit from a text/keyword multi-field. | |
| meta Descriptions: Field-level meta.description entries must be updated to accurately reflect any changes in field names, purposes, or nuances and must remain very concise (max 50 characters, approx. 5-8 words). E.g., "User item rating; 1-5 scale." | |
| Unique index_id: The index_id for each newly generated index must be unique: different from the input index's ID and from every other index you generate. For example, if the input is products_v1, new ones could be products_v1_b or products_v2_experimental. | |
| Ensure Plausibility: All variations must be realistic and make sense within the context of the given domain. The goal is to create indices that could believably exist alongside the original, perhaps representing different versions, data sources, or slightly different microservices. | |
| Adherence to Base Generation Prompt Guidelines: | |
| This is crucial: All newly generated index mappings must fully adhere to the core requirements and best practices outlined in the original base generation prompt. For clarity, these key guidelines include: | |
| Overall Structure & Naming: | |
| index_id: Unique, descriptive, following common conventions (e.g., snake_case, including purpose/dataset). | |
| index_briefing: A narrative string explaining the index's purpose, data, and query capabilities. | |
| mappings._meta.description: Very concise description (max 50 chars) of the index's purpose/domain. | |
| mappings.properties: Contains field definitions. | |
| Field Naming: Predominantly snake_case, intuitive, with meaningful suffixes where appropriate (_at, _id, _raw, etc.). | |
| Field meta.description: Very clear and succinct description (max 50 chars) for every field/sub-field. | |
| index_metadata: Include domain, version, data_source_example. | |
| Field Types & Features: | |
| Supported types for search: text, semantic_text. | |
| Supported types for filtering: boolean, keyword, number, date. | |
| semantic_text fields: | |
| Must include an inference_id property directly within the field definition (e.g., "inference_id": "my_elser_model_endpoint"). Its meta.description should be a concise note about its purpose (max 50 chars). | |
| All generated indices must include at least one semantic_text field. | |
| text fields: All generated indices must include at least one text field. | |
| Searchable Field Density: Each index should have 2-4 fields suitable for various search operations. | |
| Selective copy_to: Use copy_to selectively to copy a few relevant text fields into their corresponding semantic_text fields. Demonstrate this in at least one of your generated similar indices if the input index demonstrates it, or if it makes sense for the variation. A semantic_text field can also contain original text. | |
| Multi-Fields (text/keyword): Incorporate the fields: { "raw": { "type": "keyword" } } pattern for relevant string fields in at least one or two of the generated similar indices to add realism. | |
| General Constraints: | |
| Each index should have 8-20 fields overall. | |
| The index should be designed to support search/exploratory NLQs. | |
| Output Specification: | |
| Generate two or three new Elasticsearch index mappings based on the provided {INPUT_INDEX_MAPPING_JSON}. | |
| The output must be in JSONL format. Each newly generated index mapping must be a separate, self-contained JSON object on a new line. | |
| Do NOT wrap the output in a single JSON array or use commas between the JSON objects on separate lines. | |
| Example of how to structure your thinking for one generated similar index: | |
| Receive INPUT_INDEX_MAPPING_JSON. | |
| Analyze its index_id, domain, index_briefing, fields, _meta/meta objects and patterns used. | |
| Decide on a plausible variation (e.g., "This will be version 2 of the same service, with one new tracking field, one deprecated field, and slightly updated naming for the main content field. It will still use ELSER for semantics on the main content."). | |
| Create a new, unique index_id. | |
| Adapt index_briefing and mappings._meta.description (ensure description is max 50 chars). | |
| Iterate through fields: | |
| Keep most fields the same or make subtle name changes. | |
| Implement the decided additions/omissions/replacements. | |
| Ensure field meta.description entries are updated and concise (max 50 chars). | |
| Ensure semantic_text fields have inference_id. | |
| Ensure copy_to and multi-field patterns are applied correctly and meet the base prompt's rules (e.g., demonstrate if required, don't overdo copy_to). | |
| Verify all constraints from the "Adherence to Base Generation Prompt Guidelines" section are met. | |
| Format as a single JSON object. | |
| If generating a second or third similar index, repeat these steps with a different plausible variation for each. | |
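
The constraints in the prompt above are mechanical enough to check on the model's JSONL output before the generated indices go into an evaluation set. The following is a minimal sketch of such a check, not part of the original prompt. It assumes that `index_id`, `index_briefing`, and `index_metadata` are top-level keys alongside `mappings` (the base generation prompt is not shown here, so this layout is a guess), that the 8-20 field limit counts top-level properties only, and that multi-field sub-fields such as `raw` may omit `meta.description`. The function names are hypothetical.

```python
import json

MAX_DESCRIPTION_CHARS = 50
REQUIRED_METADATA_KEYS = ("domain", "version", "data_source_example")


def iter_fields(properties, prefix=""):
    """Yield (path, definition) for each field, descending into multi-fields and objects."""
    for name, spec in properties.items():
        path = prefix + name
        yield path, spec
        # Multi-fields live under "fields"; object fields nest under "properties".
        for key in ("fields", "properties"):
            yield from iter_fields(spec.get(key, {}), path + ".")


def check_description(owner, meta, problems, required=True):
    """Record a problem if the description is missing (when required) or too long."""
    description = (meta or {}).get("description")
    if not description:
        if required:
            problems.append(f"{owner}: missing description")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(f"{owner}: description over {MAX_DESCRIPTION_CHARS} chars")


def check_index(index, input_index_id=None):
    """Return a list of constraint violations for one generated index object."""
    problems = []
    if not index.get("index_id") or index.get("index_id") == input_index_id:
        problems.append("index_id missing or same as the input index")
    if not index.get("index_briefing"):
        problems.append("index_briefing missing")
    metadata = index.get("index_metadata", {})
    for key in REQUIRED_METADATA_KEYS:
        if key not in metadata:
            problems.append(f"index_metadata.{key} missing")

    mappings = index.get("mappings", {})
    check_description("mappings._meta", mappings.get("_meta"), problems)
    properties = mappings.get("properties", {})
    # Assumption: the 8-20 limit counts top-level fields only.
    if not 8 <= len(properties) <= 20:
        problems.append(f"expected 8-20 fields, found {len(properties)}")

    types = []
    for path, spec in iter_fields(properties):
        # Assumption: sub-fields (dotted paths) may omit a description.
        check_description(path, spec.get("meta"), problems, required="." not in path)
        types.append(spec.get("type"))
        if spec.get("type") == "semantic_text" and "inference_id" not in spec:
            problems.append(f"{path}: semantic_text without inference_id")
    for required_type in ("text", "semantic_text"):
        if required_type not in types:
            problems.append(f"no {required_type} field")
    return problems


def validate_jsonl(output, input_index_id=None):
    """Return [(index_id, problems)] with one entry per non-blank JSONL line."""
    report, seen = [], set()
    for line_number, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue
        index = json.loads(line)  # raises ValueError if the line is not valid JSON
        index_id = index.get("index_id", f"<line {line_number}>")
        problems = check_index(index, input_index_id)
        if index_id in seen:
            problems.append("duplicate index_id")
        seen.add(index_id)
        report.append((index_id, problems))
    return report
```

Calling `validate_jsonl(model_output, input_index_id="products_v1")` returns one `(index_id, problems)` pair per generated index; an empty problem list means the index passed every check in this sketch. The sketch does not try to judge the subjective requirements, such as plausibility or how similar the new index is to the input.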