@denzhel
Created July 18, 2022 19:08
elasticsearch outage #1
  1. I identified a problem in one of our Elasticsearch clusters: it had reached the low watermark (87%).
  2. Shards were no longer being allocated to the nodes that had passed the watermark (a couple of them).
  3. I raised the watermarks:
  • low - 95%
  • high - 97%
  • flood - 99%
  4. In the meantime we discussed our plan and strategy and decided to scale out the cluster by 3 nodes.
  5. I created 3 nodes using Terraform.
  6. I deployed Elasticsearch on them one by one using Ansible.
  7. I lowered the watermarks back to their defaults to encourage the cluster to rebalance the shards and spread the data across the nodes.
  8. One of the developers reached out saying that he was getting multiple errors in one of the services:
[FORBIDDEN/12/index read-only]
  9. I googled the error and found this: https://selleo.com/til/posts/esrgfyxjee-how-to-fix-elasticsearch-forbidden12index-read-only TL;DR: once a node reaches the flood-stage watermark, which I had just restored to its default (95%), every index with a shard on that node is locked and becomes read-only.
  10. SHIT - at that point I realized that multiple nodes had already reached 95%.
  11. I raised the flood-stage watermark to 99%:
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.disk.watermark.flood_stage" : "99%" } }'
  12. I used the command from the post I found to unlock all the locked indices:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
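
The watermark changes in the steps above (raising all three values, then putting them back) can be sketched as one small helper. This is only an illustration and not what was run during the outage: `ES_URL`, `watermark_payload` and `set_watermarks` are names made up here, and the sketch assumes the same unauthenticated `localhost:9200` endpoint as the commands above. Sending `null` for a setting in `_cluster/settings` resets it to its default.

```shell
#!/bin/sh
# Assumed endpoint (same as the curl commands above); override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the JSON body that sets all three disk watermarks in one request.
# With three arguments (e.g. 95% 97% 99%) the values are set explicitly;
# with any other argument count all three are sent as null, which resets
# them to the Elasticsearch defaults.
watermark_payload() {
  if [ "$#" -eq 3 ]; then
    low="\"$1\""; high="\"$2\""; flood="\"$3\""
  else
    low=null; high=null; flood=null
  fi
  printf '{"transient":{"cluster.routing.allocation.disk.watermark.low":%s,"cluster.routing.allocation.disk.watermark.high":%s,"cluster.routing.allocation.disk.watermark.flood_stage":%s}}\n' \
    "$low" "$high" "$flood"
}

# Send the payload to the cluster settings API (hypothetical wrapper).
set_watermarks() {
  curl -sS -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(watermark_payload "$@")"
}

# Usage:
#   set_watermarks 95% 97% 99%   # raise low / high / flood stage
#   set_watermarks               # reset all three to defaults
```

Resetting to defaults is the step that bit me: do it only after checking that no node is still above the default flood stage (95%), otherwise the indices on those nodes get locked.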

Lags decreased and shards started spreading across the cluster slowly. ALL GOOD, GOOD VIBES
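
In hindsight, a quick check of per-node disk usage before lowering the watermarks would have shown that some nodes were still above 95%. Below is a minimal sketch of such a check, again assuming the `localhost:9200` endpoint; the function names are hypothetical. It uses the `_cat/allocation` API for disk usage and `_cluster/health` to watch how many shards are still relocating or unassigned.

```shell
#!/bin/sh
# Assumed endpoint; override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "<disk.percent> <node>" for each node, one per line.
disk_usage() {
  curl -sS "$ES_URL/_cat/allocation?h=disk.percent,node"
}

# Read "<percent> <node>" lines on stdin and print the names of nodes
# whose disk usage is at or above the given threshold (a whole number).
# Rows without a numeric first column are ignored.
nodes_at_or_above() {
  awk -v limit="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 >= limit + 0 { print $2 }'
}

# Show how many shards are still relocating or unassigned.
rebalance_status() {
  curl -sS "$ES_URL/_cluster/health?filter_path=relocating_shards,unassigned_shards"
}

# Usage:
#   disk_usage | nodes_at_or_above 95   # nodes that would hit the default flood stage
#   rebalance_status                    # repeat until relocating_shards drops to 0
```

If the first command prints any node, restoring the default flood stage will lock the indices that have shards on it.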
