Elasticsearch troubleshooting

Some docs:

HEALTH DETAILS

curl -XGET http://$log:9200/_cluster/health?pretty

LIST INDEXES

 k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -s http://localhost:9200/_cat/indices?v |grep kube

ROLLOVER

k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -X POST -H 'Content-Type: application/json' -d '{"conditions":{"max_docs":1}}' localhost:9200/fluentd.kube.falco/_rollover

k -n sre exec -it elasticsearch-0 -c elasticsearch -- curl -s http://localhost:9200/_cat/aliases?v | awk '{print $1}'|  egrep -v ^.kibana\|^ilm\|^alias | xargs -n1 -I% curl -X POST -H 'Content-Type: application/json' -d '{"conditions":{"max_docs":1}}'  http://localhost:9200/%/_rollover

NODE DETAILS

curl -s http://$log:9200/_cat/nodes?v

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.20.0.61           56          96   9    1.50    0.98     0.49 mdi       *      elasticsearch
10.20.0.62           61          93  10    0.40    0.87     0.63 mdi       -      elasticsearch

WHAT ARE THE CUSTOM SETTINGS?

curl -XGET http://$log:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : { }
}

SET LOW DISK WATERMARK TO 90%

curl -XPUT http://$log:9200/_cluster/settings?pretty -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.disk.watermark.low": "90%"}}'

WHO IS MASTER

curl -X GET http://$log:9200/_cat/master

INDEXES

curl -s http://$log:9200/_cat/indices?v | grep -v green

REALLOCATE/REROUTE ALL UNASSIGNED SHARDS

# retry rerouting unassigned shards
curl -XPOST -sS localhost:9200/_cluster/reroute?retry_failed=true

# re-enable shard allocation
curl -XPUT http://$log:9200/_cluster/settings?pretty -H 'Content-Type: application/json' -d'{"persistent": {"cluster.routing.allocation.enable": "all"}}'

If some shards won't reallocate, do the following:

curl -XGET http://$log:9200/_nodes?pretty
curl -XPOST -H 'Content-Type: application/json' http://$log:9200/_cluster/reroute -d '{ "commands" : [ { "allocate" : { "index" : "log-2018.01.07", "shard" : 0, "node": "l3iTL89aRSONcMwqoZ38Zw", "allow_primary": "true" }}]}'

Get node unique name

curl -XGET http://$log:9200/_nodes?pretty | grep transport_address -B 3

Update index settings (clear allocation exclude)

curl -X PUT http://$log:9200/test_idx/_settings  -H 'Content-Type: application/json' -d '{ "index.routing.allocation.exclude._name": null }' |jq

Delete index manually

curl -XDELETE $log:9200/fluentd.kube.app.ver.vega-*

Troubleshooting

Get index statistics

To print docs count by index (sum of fluentd daily indexes):

curl -sS localhost:9200/_cat/indices?v|awk '{print $3,$7}'|sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g'|awk '{sum[$1]+=$2;}END{for (i in sum) print i,sum[i]}'|sort

To print index sizes in MB (sum of fluentd daily indexes):

curl -sS "localhost:9200/_cat/indices?v&bytes=b"|awk '{print $3,$9}'|sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g'|awk '{sum[$1]+=$2;}END{for (i in sum) print i,sum[i]/1024/1024}'|sort

View recovery status of initializing shards

To view the progress of shard recovery:

curl -sS "localhost:9200/_cat/recovery?pretty&active_only=true"

You can also view allocation and recovery information for a given index:

curl -sS -XPOST -H 'Content-Type: application/json' -d '{"index":"fluentd.svcfw.apiaccess-2020.04.05-000001","shard":0,"primary":true}' localhost:9200/_cluster/allocation/explain?pretty
curl -sS "localhost:9200/fluentd.svcfw.apiaccess-2020.04.05-000001/_recovery?pretty"

Cleanup unassigned shards

You can list unassigned shards with:

curl -sS localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}'

To describe why shards cannot be allocated:

curl -sS localhost:9200/_cluster/allocation/explain?pretty

To delete all indexes that have unassigned shards (destructive, this removes their data):

curl -sS http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | xargs -i curl -XDELETE "http://localhost:9200/{}"

Check ILM status

You can check ILM for a single index (via the _ilm/explain API) or iterate over all indexes:

curl -sS http://localhost:9200/_cat/indices?h=i | while read i; do
    curl -sS http://localhost:9200/$i/_ilm/explain?pretty
done
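
If you only need the cluster-wide ILM operation mode (running/stopped), the status endpoint is enough:

curl -sS http://localhost:9200/_ilm/status?pretty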

index.lifecycle.rollover_alias does not point to index

If you see an error similar to

index.lifecycle.rollover_alias [fluentd.kube] does not point to index [fluentd.kube-2019.11.08-000001]

It means that the alias does not point to the index you are trying to roll over. This state will not fix itself; you need to fix the alias first and then retry ILM.

First, check that the alias name is not already used as an index. If it is, delete that index, otherwise you will not be able to create the alias.

# Check existing indexes
curl -sS localhost:9200/_cat/indices | grep fluentd.kube

# Delete index if it exists
curl -X DELETE localhost:9200/fluentd.kube
# Point fluentd.kube alias to last index seen (fluentd.kube-2019.11.08-000001)
curl -sS -XPOST -H 'Content-Type: application/json' -d '{"actions": [{"add":{"index":"fluentd.kube-2019.11.08-000001","alias":"fluentd.kube"}}]}' localhost:9200/_aliases

# Retry ILM for given index
curl -sS -XPOST localhost:9200/fluentd.kube-2019.11.08-000001/_ilm/retry?pretty

FORBIDDEN/12/index read-only / allow delete (api)

An index can become read-only (with delete operations disallowed) in certain situations, e.g. when Elasticsearch runs out of disk space. It is then necessary to fix the index configuration and allow writes again (index.blocks.read_only_allow_delete).

This behavior is further explained in the Elasticsearch documentation:

cluster.routing.allocation.disk.watermark.flood_stage: Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
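
If you need temporary headroom while cleaning up, the flood-stage watermark itself can be raised with a transient cluster setting (a sketch; the 97% value is only an example, revert it once disk space is recovered):

curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/_cluster/settings?pretty -d '{"transient": {"cluster.routing.allocation.disk.watermark.flood_stage": "97%"}}'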

To fix it, check storage space and unset index.blocks.read_only_allow_delete. This is possible in Kibana or with curl:
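
For example (this clears the block on all indexes via _all; scope it to a single index name if preferred):

curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'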

Reroute unassigned

```sh
# reroute each UNASSIGNED shard (shard 0) to node elasticsearch-0
for i in $(curl -sS -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | xargs -n1); do
  curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "'"$i"'",
                  "shard" : 0,
                  "node" : "elasticsearch-0",
                  "allow_primary" : true
              }
            }
        ]
    }'
  sleep 5
done
```

Slow delete unassigned

set -x
for i in $(curl -sS http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}'); do curl -XDELETE "http://localhost:9200/$i" & sleep 10; done