Below are some challenges and exceptions I faced while setting up Elasticsearch. I am just sharing my experience and learnings. Please correct me if you feel I got something wrong, or contribute if you have experiences of your own. I will keep updating this gist.

(by @_ashish_tiwari)


Elasticsearch specification:

Version : 6.2
Heap size : 30 GB
Cores : 24
Memory : 128 GB
Client : PHP - 6.0

Exception #1 :

Fielddata is disabled on text fields by default. Set fielddata=true on [myfieldName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.

Solution :

  1. I noticed that I was sorting on a 'text' field. I changed the field type from 'text' to 'keyword', reindexed all the data, and sorting started working. A field you sort on should be of numeric, date, or keyword type (see the mapping sketch below).
  2. Likewise, you cannot run aggregations on a 'text' field.
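
A minimal mapping sketch for sorting on a keyword field. The index name 'my_index' and the field 'myfieldName' are placeholders; since an existing field's type cannot be changed in place, create a new index with this mapping and reindex into it:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "myfieldName": { "type": "keyword" }
      }
    }
  }
}'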

Exception #2 :

rejected execution of org.elasticsearch.transport.TransportService$7@45468c89 on EsThreadPoolExecutor[name = localhost:9200/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@452b58db[Running, pool size = 24, active threads = 24, queued tasks = 200, completed tasks = 32109349]]

Solution :

  1. I was losing data because of the above exception. I monitored the number of rejections by hitting
    curl -X GET http://localhost:9200/_cat/thread_pool

  2. Elasticsearch's bulk thread pool queue got full and started rejecting all incoming data. I increased thread_pool.bulk.queue_size and thread_pool.index.queue_size to 500, which stopped the rejections. That is not a standard value; you need to find out the right value for your application.

  3. Also set thread_pool.index.size and thread_pool.bulk.size to 24, i.e. the number of CPU cores you have, to make sure all CPUs are used (see the elasticsearch.yml sketch after this list).

  4. If rejections start again even after raising the bulk queue size several times, check your bulk request frequency and send bulk requests at some interval.
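
A minimal elasticsearch.yml sketch with the values mentioned above (these match my 24-core node; tune them for your own hardware and workload):

thread_pool.bulk.size: 24
thread_pool.bulk.queue_size: 500
thread_pool.index.size: 24
thread_pool.index.queue_size: 500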

Exception #3 :

not_x_content_exception

Solution :

Check whether your content is already JSON-encoded or not. The request body must be in proper JSON format.
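
For example, a minimal well-formed index request looks like this (index, type and field names are placeholders):

curl -X PUT "localhost:9200/my_index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "myfieldName": "some value"
}'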

Exception #4 :

version_conflict_engine_exception , version conflict, current version [versionid] is different than the one provided [versionid]

Solution :

  1. Use the retry_on_conflict : 5 parameter; it reattempts the reindex/update of your doc up to 5 times (see the sketch after this list). 5 is not a standard value; you need to evaluate it for your application.
  2. If you care about data loss, then you need to reindex/update the data again from your primary DB, or store the failed updates locally and retry after some time.
  3. If data loss is okay for you, then you can skip the retry option. Elasticsearch maintains a version per doc while updating, so only your last conflicting update is lost, not the whole document.
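
A minimal update sketch with retry_on_conflict (index, type, id and field names are placeholders, using the Elasticsearch 6.x URL layout):

curl -X POST "localhost:9200/my_index/_doc/1/_update?retry_on_conflict=5" -H 'Content-Type: application/json' -d'
{
  "doc": { "myfieldName": "new value" }
}'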

Exception #5 :

{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN\/12\/index read-only \/ allow delete (api)];"}

Solution :

  1. Your disk is getting full or has reached the threshold you specify in the elasticsearch.yml file. There are default values for the disk watermarks; you can check them here (see also the _cat/allocation example after this list).
  2. Once the threshold is reached, all indexes get read/delete permission only. You can revert this permission with the below API:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'  
{  
   "index": {  
     "blocks": {  
       "read_only_allow_delete": "false"  
      }  
    }  
}'  
  3. You can edit the threshold values in the elasticsearch.yml file:
cluster.routing.allocation.disk.threshold_enabled: true  
cluster.routing.allocation.disk.watermark.low: 10gb  
cluster.routing.allocation.disk.watermark.high: 10gb  
cluster.routing.allocation.disk.watermark.flood_stage: 10gb  
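
To see how close each node is to the watermarks, you can check the current disk usage per node (a read-only _cat call, safe to run anytime):

curl -X GET "localhost:9200/_cat/allocation?v"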

Challenge #1 :

My application has a heavy update workload, for which I was using bulk requests with heavy update queries that also contain lots of script conditions. The flow is: a doc is first inserted into ES and then updated multiple times. Insertion was very fast, but updates were the opposite: too slow. Below are some solutions which helped me increase the update performance.

Solution :

  1. Use 'upsert' if possible, where the doc is updated if it exists or inserted as a new doc (see the sketch after this list). I converted all my updates to upserts, which was perfect for my application.
  2. Don't update one doc multiple times too frequently. An update is nothing but a delete and reindex.
  3. Avoid heavy use of scripts while updating docs. If scripting is necessary, you can store your script in the Elasticsearch cluster and invoke it by just passing parameters. Check here for more info.
  4. Set index.refresh_interval to 30s. With this change a document becomes available for search after up to 30s. Refreshing is an expensive operation and its default value is 1s, so it is better to refresh at a longer interval. For more info on refresh_interval check here.
  5. If your performance is limited by resources, you can add another data node.
  6. Most important: I had 1 replica per index. At peak hours, updates and inserts were too slow. I simply removed my replicas by setting the count to 0 (you can set the number of replicas as shown here). My update process sped up roughly 15x, and there is no more delay.
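
A minimal sketch of an upsert plus the index settings from points 4 and 6 above (index, type, id and field names are placeholders):

curl -X POST "localhost:9200/my_index/_doc/1/_update" -H 'Content-Type: application/json' -d'
{
  "doc": { "myfieldName": "new value" },
  "upsert": { "myfieldName": "new value" }
}'

curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}'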

Challenge #2 :

Documents were not getting upserted properly with the PHP Elasticsearch SDK.

Solution :

I was specifying an empty object as an empty array, like content => array(). But this is not valid in the Elasticsearch DSL, because it serializes to a JSON array ([]) instead of an object ({}). You can use content => new \stdClass() to specify an empty object.

Challenge #3 :

Needed to export all data to CSV.

Solution :

I wrote a library which uses the Elasticsearch PHP SDK and fetches data to CSV using the Scroll API: https://github.com/ashishtiwari1993/elasticsearch-csv-export
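
For reference, the raw Scroll API flow the library relies on looks roughly like this (index name and scroll id are placeholders): open a scroll with the first search, then keep requesting the next batch with the returned _scroll_id until no hits come back.

curl -X POST "localhost:9200/my_index/_search?scroll=1m" -H 'Content-Type: application/json' -d'
{
  "size": 1000,
  "query": { "match_all": {} }
}'

curl -X POST "localhost:9200/_search/scroll" -H 'Content-Type: application/json' -d'
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}'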

Question #1 :

What should be the value of max_gram and min_gram in Elasticsearch ?

Answer :

I shared my learnings in this blog post: What should be the value of max_gram and min_gram in Elasticsearch?
