@rupeshtiwari
Last active March 8, 2024 21:42
Indexing performance in OpenSearch

Optimize your Elasticsearch/OpenSearch indexing performance with these key adjustments:

  1. Java Heap Size:

    • Default: Varies.
    • Recommended: 50% of RAM, staying below roughly 32GB so the JVM can keep using compressed object pointers.
    • Example: For 32GB RAM, set heap size to 16GB.
  2. Flush Translog Threshold:

    • Default: 512MB (index.translog.flush_threshold_size).
    • Recommended: Increase to 25% of Java heap so flushes happen less often during heavy indexing.
    • Example: For a 16GB heap, set to 4GB.
  3. Index Refresh Interval:

    • Default: 1s.
    • Recommended: Raise to 30s or more during heavy indexing, or disable refreshes with -1 until the load finishes.
    • Example: "index.refresh_interval": "30s".
  4. Index Buffer Size:

    • Default: 10% of JVM heap (indices.memory.index_buffer_size, a node-level setting).
    • Recommended: Increase to up to 25% for heavy indexing.
    • Example: For a 16GB heap, up to 4GB.
  5. Concurrent Merges (max_merge_count):

    • Default: Varies.
    • Recommended: Increase if experiencing index throttling.
    • Example: "index.merge.scheduler.max_merge_count": 6.
  6. Shard Distribution:

    • Formula: Number of shards = k * (Number of data nodes), where k is the number of shards per node, so every node holds an equal share.
    • Example: For 8 nodes, with k=3, create the index with 24 shards.
  7. Setting Replica Count to Zero:

    • Concern: With no replicas, a node failure during the load can cause data loss.
    • Example: Set "index.number_of_replicas": 0 during heavy indexing, then restore the original value once indexing finishes.
  8. Optimal Bulk Request Size:

    • Start: 5 MiB to 15 MiB per bulk request.
    • Adjust: Increase the size gradually until throughput stops improving.
  9. Instance Type with SSD:

    • Use SSD-backed instances (e.g., AWS I3) for superior ingestion performance.
  10. Reduce Response Size:

    • Use filter_path to limit response data.
    • Example: ?filter_path=-took,-items.*._index.
  11. Compression Codecs (OpenSearch 2.9+):

    • Default: LZ4.
    • Recommended: zstd or zstd_no_dict for up to 14% better throughput and up to 30% better storage efficiency.
    • Example: "index.codec": "zstd".
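
The sizing rules in items 1, 2, 4, and 6 are plain arithmetic, and the request size from item 8 can be enforced on the client side. Here is a minimal Python sketch of both. The helper names (derive_sizes, batch_bulk_bodies) and the index name are hypothetical, chosen only for illustration. The code computes values and builds NDJSON bodies for the bulk API; it does not contact a cluster.

```python
import json


def derive_sizes(ram_gb: float, data_nodes: int, k: int = 1) -> dict:
    """Apply the rules of thumb from items 1, 2, 4, and 6."""
    heap_gb = ram_gb * 0.5  # item 1: heap is 50% of RAM
    return {
        "heap_gb": heap_gb,
        "translog_flush_threshold_gb": heap_gb * 0.25,  # item 2: 25% of heap
        "index_buffer_max_gb": heap_gb * 0.25,          # item 4: up to 25% of heap
        "primary_shards": k * data_nodes,               # item 6: k shards per node
    }


def batch_bulk_bodies(docs, index, max_bytes=5 * 1024 * 1024):
    """Yield bulk request bodies (NDJSON) of at most max_bytes each (item 8).

    A single document larger than max_bytes is still emitted, alone,
    as one oversized body.
    """
    buf, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Close the current body before it would exceed the limit.
        if buf and size + pair_size > max_bytes:
            yield "".join(buf)
            buf, size = [], 0
        buf.append(pair)
        size += pair_size
    if buf:
        yield "".join(buf)
```

With the numbers used in the examples above, derive_sizes(32, 8, k=3) gives a 16 GB heap, 4 GB for both the translog threshold and the index buffer ceiling, and 24 shards. To tune item 8, rerun the same load with a larger max_bytes and compare throughput.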

Each adjustment aims to balance performance with operational safety. Monitor impacts closely, especially when modifying settings like translog flush thresholds and replica counts.
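
One way to keep the riskier changes temporary is to build the relaxed settings and their rollback together. The sketch below uses hypothetical helper names and only produces JSON bodies for the index settings API (PUT /<index>/_settings); sending them is left to whatever client is in use. The saved values are an assumption standing in for whatever the index had before the load. A null value asks the cluster to reset that setting to its default.

```python
import json


def load_phase_settings(heap_gb: float) -> dict:
    """Relaxed index settings for a heavy indexing run (items 2, 3, 7)."""
    threshold_mb = int(heap_gb * 1024 * 0.25)  # 25% of heap, expressed in MB
    return {
        "index.refresh_interval": "30s",
        "index.number_of_replicas": 0,
        "index.translog.flush_threshold_size": f"{threshold_mb}mb",
    }


def rollback_settings(applied: dict, saved: dict) -> dict:
    """Restore saved values; keys with no saved value become None (JSON null)."""
    return {key: saved.get(key) for key in applied}


applied = load_phase_settings(heap_gb=16)
# 'saved' stands in for values read from the index before the load (assumed).
rollback = rollback_settings(applied, saved={"index.number_of_replicas": 1})

print(json.dumps(applied, indent=2))   # body to send before the load
print(json.dumps(rollback, indent=2))  # body to send once indexing finishes
```

Generating the rollback from the same keys that were applied makes it harder to forget to restore a setting, which matters most for the replica count.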
