First you need to whitelist the old cluster on the new one:
- name: reindex.remote.whitelist
  value: "elasticsearch-0.elasticsearch.sre.svc.cluster.local:9200"
Then you can migrate a single index using the reindex API:
i=".kibana"; curl -X POST localhost:9200/_reindex -H'Content-Type: application/json' -d"{\"source\":{\"remote\":{\"host\":\"http://elasticsearch-0.elasticsearch.sre.svc.cluster.local:9200\"},\"index\":\"$i\"},\"dest\":{\"index\":\"$i\"}}"
Or migrate all indexes like this (the loop runs up to N reindex requests in parallel):
N=9; curl -sS 'elasticsearch-0.elasticsearch.sre.svc.cluster.local:9200/_cat/indices?h=i' | while read index; do ((i=i%N)); ((i++==0)) && wait; echo "== Migrating $index"; curl -sS -X POST localhost:9200/_reindex -H'Content-Type: application/json' -d"{\"source\":{\"remote\":{\"host\":\"http://elasticsearch-0.elasticsearch.sre.svc.cluster.local:9200\"},\"index\":\"$index\"},\"dest\":{\"index\":\"$index\"}}" & done
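Once the loop finishes, it's worth comparing document counts between the old and new cluster; a minimal sketch using the _cat/count API against both hosts:
for h in elasticsearch-0.elasticsearch.sre.svc.cluster.local:9200 localhost:9200; do echo "== $h"; curl -sS "$h/_cat/count?v"; done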
To change the number of replicas to 0 for all fluentd indexes, run:
curl -sS 'localhost:9200/_cat/indices?h=i' | grep 'fluentd\.' | while read i; do curl -sS -XPUT -H 'Content-Type: application/json' -d '{"index.number_of_replicas": 0}' "localhost:9200/$i/_settings"; done
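To verify, list the replica count per index (rep is a _cat/indices column):
curl -sS 'localhost:9200/_cat/indices?h=index,rep' | grep fluentd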
Additional resources:
curl -sS localhost:9200/_cluster/health?pretty
Check for the status and unassigned_shards keys.
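Instead of polling, the health API can also block until the cluster reaches a given status:
curl -sS 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s&pretty'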
To print the document count per day (summed across the daily fluentd indexes):
curl -sS 'localhost:9200/_cat/indices?v' | awk 'NR>1 {print $3,$7}' | sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g' | awk '{sum[$1]+=$2} END {for (i in sum) print i,sum[i]}' | sort
To print index sizes in MB per day (summed across the daily fluentd indexes):
curl -sS "localhost:9200/_cat/indices?v&bytes=b"|awk '{print $3,$9}'|sed 's,^.*-\([0-9]*.[0-9]*.[0-9]*\),\1,g'|awk '{sum[$1]+=$2;}END{for (i in sum) print i,sum[i]/1024/1024}'|sort
You can then list unassigned shards with:
curl -sS localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}'
To describe why shards cannot be allocated:
curl -sS localhost:9200/_cluster/allocation/explain?pretty
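By default the explain API picks an arbitrary unassigned shard; to ask about a specific one, pass index, shard and primary in the request body (the index name below is just an example):
curl -sS -XPOST -H 'Content-Type: application/json' -d '{"index":"fluentd.kube-2019.11.08","shard":0,"primary":true}' 'localhost:9200/_cluster/allocation/explain?pretty'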
If the data is expendable, you can delete every index that has unassigned shards (destructive, this removes the whole index):
curl -sS http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -I{} curl -sS -XDELETE "http://localhost:9200/{}"
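A less destructive first step is to ask the cluster to retry allocations that previously failed too many times:
curl -sS -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'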
You can check the ILM state of a single index with the explain API, or iterate over all indexes:
curl -sS http://localhost:9200/_cat/indices?h=i | while read i; do
  curl -sS "http://localhost:9200/$i/_ilm/explain?pretty"
done
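To list only indexes whose lifecycle is in an error state (assuming a version where the only_errors flag is available, 7.4+):
curl -sS 'localhost:9200/*/_ilm/explain?only_errors=true&pretty'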
If you see an error similar to:
index.lifecycle.rollover_alias [fluentd.kube] does not point to index [fluentd.kube-2019.11.08-000001]
it means that the alias does not point to the index you are trying to roll over. This state won't fix itself; you need to fix the alias first and then retry ILM:
curl -sS -XPOST -H 'Content-Type: application/json' -d '{"actions": [{"add":{"index":"fluentd.kube-2019.11.08-000001","alias":"fluentd.kube"}}]}' localhost:9200/_aliases
curl -sS -XPOST localhost:9200/fluentd.kube-2019.11.08-000001/_ilm/retry?pretty
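Afterwards, confirm that the alias points where ILM expects:
curl -sS 'localhost:9200/_cat/aliases/fluentd.kube?v'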
An index can become read-only (only deleting it remains allowed) in certain situations, e.g. when Elasticsearch runs out of disk space. Then it's necessary to fix the index configuration and clear the block again (index.blocks.read_only_allow_delete).
This behavior is further explained in the Elasticsearch documentation:
cluster.routing.allocation.disk.watermark.flood_stage: Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
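To see how close each node is to the watermarks, check per-node disk usage:
curl -sS 'localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail'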
To fix it, check storage space and unset index.blocks.read_only_allow_delete. This is possible in Kibana or with curl:
# access the elastic container:
kubectl exec -ti -n sre elasticsearch-0 bash
curl -XPUT -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}' localhost:9200/myindex/_settings
To do this for all indexes, use _all in place of the index name:
curl -XPUT -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}' localhost:9200/_all/_settings
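To verify that no index still carries the block (the settings API can filter by setting name; cleared settings simply won't show up):
curl -sS 'localhost:9200/_all/_settings/index.blocks.read_only_allow_delete?pretty'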
To run the elasticsearch-curator cleanup manually, create a one-off job from its cron job:
kubectl create job -n sre --from=cronjob/elasticsearch-curator elasticsearch-curator-$(date +%s)
kubectl get po -n sre | grep elastic # check that the job runs and completes successfully
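If you prefer to wait for the job and read its logs, a sketch that keeps the generated name in a variable:
job="elasticsearch-curator-$(date +%s)"
kubectl create job -n sre --from=cronjob/elasticsearch-curator "$job"
kubectl wait --for=condition=complete -n sre "job/$job" --timeout=300s
kubectl logs -n sre "job/$job"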