Elasticsearch troubleshooting


HEALTH DETAILS

curl -XGET http://$log:9200/_cluster/health?pretty
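An abridged, illustrative response (values here are examples; watch status and unassigned_shards):

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "number_of_nodes" : 2,
  "active_primary_shards" : 120,
  "unassigned_shards" : 0,
  "active_shards_percent_as_number" : 100.0
}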

LIST INDEXES

k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -s http://localhost:9200/_cat/indices?v | grep kube

ROLLOVER

k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -X POST -H 'Content-Type: application/json' -d '{"conditions":{"max_docs":1}}' localhost:9200/fluentd.kube.falco/_rollover

k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -s http://localhost:9200/_cat/aliases?v | awk '{print $1}' | egrep -v '^\.kibana|^ilm|^alias' | xargs -n1 -I% curl -X POST -H 'Content-Type: application/json' -d '{"conditions":{"max_docs":1}}' http://localhost:9200/%/_rollover
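max_docs:1 forces an immediate rollover. For routine rollovers you would rather use age/size conditions; a sketch with illustrative values:

curl -X POST -H 'Content-Type: application/json' -d '{"conditions":{"max_age":"1d","max_size":"5gb"}}' http://localhost:9200/fluentd.kube.falco/_rollover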

NODE DETAILS

curl -s http://$log:9200/_cat/nodes?v

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.20.0.61           56          96   9    1.50    0.98     0.49 mdi       *      elasticsearch
10.20.0.62           61          93  10    0.40    0.87     0.63 mdi       -      elasticsearch

WHAT ARE THE CUSTOM SETTINGS?

curl -XGET http://$log:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : { }
}

SET LOW DISK WATERMARK TO 90%

curl -XPUT http://$log:9200/_cluster/settings?pretty -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.disk.watermark.low": "90%"}}'
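The high and flood-stage watermarks can be adjusted the same way (thresholds here are illustrative):

curl -XPUT http://$log:9200/_cluster/settings?pretty -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.disk.watermark.high": "95%", "cluster.routing.allocation.disk.watermark.flood_stage": "97%"}}'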

WHO IS MASTER

curl -X GET http://$log:9200/_cat/master

LIST NON-GREEN INDEXES

curl -s http://$log:9200/_cat/indices?v | grep -v green

REALLOCATE/REROUTE ALL SHARDS WHICH ARE UNASSIGNED

# retry rerouting unassigned shards
curl -sS -XPOST localhost:9200/_cluster/reroute?retry_failed=true

# reallocate (allocation.enable accepts: all, primaries, new_primaries, none)
curl -XPUT http://$log:9200/_cluster/settings?pretty -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.enable": "all"}}'

If some shards won't reallocate, find the node id and force allocation (note: on Elasticsearch 5+ the reroute commands are allocate_replica, allocate_stale_primary and allocate_empty_primary instead of allocate):

curl -s http://localhost:9200/_nodes?pretty
curl -XPOST -H 'Content-Type: application/json' http://$log:9200/_cluster/reroute -d '{ "commands" :  [ { "allocate" : { "index" : "log-2018.01.07", "shard" : 0, "node": "l3iTL89aRSONcMwqoZ38Zw", "allow_primary": "true" }}]}'

Get node unique name

curl -XGET http://$log:9200/_nodes?pretty | grep transport_address -B 3
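The cat API gives the same information in a terser form (full_id=true prints the full node id usable in reroute commands):

curl -s "http://$log:9200/_cat/nodes?v&full_id=true&h=id,name,ip"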

Update index settings (e.g. clear an allocation exclusion)

curl -X PUT http://$log:9200/test_idx/_settings  -H 'Content-Type: application/json' -d '{ "index.routing.allocation.exclude._name": null }' |jq

Delete index manually

curl -XDELETE "http://$log:9200/fluentd.kube.app.ver.vega-*"

Troubleshooting

Get index statistics

To print docs count per day (summing fluentd daily indexes by their date suffix):

curl -sS localhost:9200/_cat/indices?v|awk '{print $3,$7}'|sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g'|awk '{sum[$1]+=$2;}END{for (i in sum) print i,sum[i]}'|sort

To print index sizes in MB per day (summing fluentd daily indexes):

curl -sS "localhost:9200/_cat/indices?v&bytes=b"|awk '{print $3,$9}'|sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g'|awk '{sum[$1]+=$2;}END{for (i in sum) print i,sum[i]/1024/1024}'|sort

View recovery status of initializing shards

To view the progress of shard recovery:

curl -sS "localhost:9200/_cat/recovery?pretty&active_only=true"

You can also view allocation and recovery information for a given index:

curl -sS -H 'Content-Type: application/json' -d '{"index": "fluentd.svcfw.apiaccess-2020.04.05-000001", "shard": 0, "primary": true}' localhost:9200/_cluster/allocation/explain?pretty
curl -sS "localhost:9200/fluentd.svcfw.apiaccess-2020.04.05-000001/_recovery?pretty"

Cleanup unassigned shards

List the unassigned shards with:

curl -sS localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}'

To explain why shards cannot be allocated:

curl -sS localhost:9200/_cluster/allocation/explain?pretty

To delete every index that has an unassigned shard (destructive; this removes whole indexes, not just the shards):

curl -sS http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -i curl -XDELETE "http://localhost:9200/{}"
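Afterwards, confirm the unassigned count dropped:

curl -sS http://localhost:9200/_cluster/health?pretty | grep unassigned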

Check ILM status

You can check ILM for a single index or iterate over all indexes:

curl -sS http://localhost:9200/_cat/indices?h=i | while read i; do
    curl -sS http://localhost:9200/$i/_ilm/explain?pretty
done
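A single request with an index pattern works as well (assuming your indexes match fluentd.*):

curl -sS 'http://localhost:9200/fluentd.*/_ilm/explain?pretty'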

index.lifecycle.rollover_alias does not point to index

If you see an error similar to

index.lifecycle.rollover_alias [fluentd.kube] does not point to index [fluentd.kube-2019.11.08-000001]

It means that the alias does not point to the index you are trying to roll over. This state won't fix itself; you need to fix the alias first and then retry ILM.

First check that the alias name is not already used as an index. If it is, delete that index, otherwise you will not be able to create the alias.

# Check existing indexes
curl -sS localhost:9200/_cat/indices | grep fluentd.kube

# Delete index if it exists
curl -X DELETE localhost:9200/fluentd.kube
# Point fluentd.kube alias to last index seen (fluentd.kube-2019.11.08-000001)
curl -sS -XPOST -H 'Content-Type: application/json' -d '{"actions": [{"add":{"index":"fluentd.kube-2019.11.08-000001","alias":"fluentd.kube"}}]}' localhost:9200/_aliases

# Retry ILM for given index
curl -sS -XPOST localhost:9200/fluentd.kube-2019.11.08-000001/_ilm/retry?pretty
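To verify the retry took effect, inspect the ILM state of the index:

curl -sS localhost:9200/fluentd.kube-2019.11.08-000001/_ilm/explain?pretty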

FORBIDDEN/12/index read-only / allow delete (api)

An index can become read-only (with the delete operation disallowed) in certain situations, e.g. when Elasticsearch runs out of disk space. Then it's necessary to fix the index configuration and allow deletion again (read_only_allow_delete).

This behavior is further explained in the Elasticsearch documentation:

cluster.routing.allocation.disk.watermark.flood_stage: Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.

To fix it, check storage space and unset index.blocks.read_only_allow_delete. This is possible in Kibana or with curl:
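A minimal example that clears the block on all indexes at once (scope the path to a single index if you prefer):

curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'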

Reroute unassigned

```sh
# NOTE: on Elasticsearch 5+ use "allocate_empty_primary" with "accept_data_loss": true instead of "allocate"
for i in $(curl -sS -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | sort -u); do
    # the JSON body is double-quoted so that $i expands
    curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/_cluster/reroute' -d "{
        \"commands\" : [ {
              \"allocate\" : {
                  \"index\" : \"$i\",
                  \"shard\" : 0,
                  \"node\" : \"elasticsearch-0\",
                  \"allow_primary\" : true
              }
            }
        ]
    }"
    sleep 5
done
```

Slowly delete indexes with unassigned shards

```sh
set -x
for i in $(curl -sS http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | sort -u); do
    curl -XDELETE "http://localhost:9200/$i" &
    sleep 10
done
```