
@suensummit
Forked from kkc/elasticsearch.md
Last active August 29, 2015 14:21

## TUNING

Configuration

System: set file descriptors to 32K or 64K

vim /etc/security/limits.conf

elasticsearch - nofile 65535
elasticsearch - memlock unlimited

Use the following command to check that the limits took effect:

curl localhost:9200/_nodes/process?pretty

"process" : {
     "refresh_interval_in_millis" : 1000,
     "id" : 2697,
     "max_file_descriptors" : 65535,
     "mlockall" : true
 }
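
The check above can also be scripted. The following is a minimal sketch, assuming the response nests each node's `process` block under a top-level `nodes` object keyed by node id; the function name `check_process_limits` and the node id `node-1` are made up for illustration.

```python
import json


def check_process_limits(nodes_response, min_fds=32 * 1024):
    """Return (node_id, problem) pairs found in a _nodes/process response."""
    problems = []
    for node_id, node in nodes_response.get("nodes", {}).items():
        process = node.get("process", {})
        fds = process.get("max_file_descriptors", 0)
        if fds < min_fds:
            problems.append((node_id, "max_file_descriptors is %d" % fds))
        if not process.get("mlockall", False):
            problems.append((node_id, "mlockall is not enabled"))
    return problems


# Sample shaped like the output above; "node-1" is a hypothetical node id.
sample = json.loads("""
{"nodes": {"node-1": {"process": {
    "refresh_interval_in_millis": 1000, "id": 2697,
    "max_file_descriptors": 65535, "mlockall": true}}}}
""")
print(check_process_limits(sample))  # an empty list means no problems found
```

To run it against a live node, feed it the parsed JSON body returned by the curl command above.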

Raise the mmap count limit. The command below applies until the next reboot; to set the value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.

sysctl -w vm.max_map_count=262144
# If you installed Elasticsearch using a package (.deb, .rpm), this setting
# will be changed automatically. To verify, run sysctl vm.max_map_count.

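Disable swap

Set vm.swappiness to 0 (sysctl -w vm.swappiness=0, plus a vm.swappiness = 0 line in /etc/sysctl.conf to keep it across reboots).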
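
A small sketch for confirming both kernel settings, assuming the usual `name = value` lines printed by `sysctl vm.max_map_count vm.swappiness`. The helper names are made up for illustration, and the thresholds are simply the values used in these notes.

```python
def parse_sysctl_output(text):
    """Parse lines like 'vm.swappiness = 0' into a {name: int} dict."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = int(value.strip())
    return values


def vm_settings_problems(values):
    """List the settings that do not match the values recommended in these notes."""
    problems = []
    if values.get("vm.max_map_count", 0) < 262144:
        problems.append("vm.max_map_count is below 262144")
    if values.get("vm.swappiness") != 0:
        problems.append("vm.swappiness is not 0")
    return problems


sample_output = "vm.max_map_count = 262144\nvm.swappiness = 0\n"
print(vm_settings_problems(parse_sysctl_output(sample_output)))
```

On a real host the text could come from `subprocess.check_output(["sysctl", "vm.max_map_count", "vm.swappiness"], text=True)`.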

Disk Performance

For the SSDs on r3 instances, it may be better to mount with the discard option, since they support TRIM:

vim /etc/fstab

/dev/xvdb /mnt ext4 defaults,noatime,nodiratime,discard 0 0

Use the noop scheduler for SSDs:

echo noop | sudo tee /sys/block/xvdc/queue/scheduler
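
To verify the change, read the same sysfs file back: the kernel lists the available schedulers and marks the active one with square brackets. A minimal sketch of parsing that line follows; the function name is made up for illustration.

```python
import re


def active_scheduler(scheduler_file_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_file_text)
    return match.group(1) if match else None


print(active_scheduler("noop deadline [cfq]"))  # cfq
print(active_scheduler("[noop] deadline cfq"))  # noop
```

On the machine itself, the input would be the contents of /sys/block/xvdc/queue/scheduler.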

ES Settings

vim /etc/default/elasticsearch

Use half of the machine's memory for the JVM heap, but do not exceed 32g:

ES_HEAP_SIZE=15g
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
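
The heap rule above is simple arithmetic. Here is a sketch, assuming the result is rounded down to whole gigabytes; the function name and the 30.5 GB example machine are illustrative, not taken from the original notes.

```python
def recommended_heap_gb(total_memory_gb, cap_gb=32):
    """Half of the machine's memory, rounded down, capped at cap_gb."""
    return min(int(total_memory_gb // 2), cap_gb)


# A hypothetical machine with 30.5 GB of RAM.
print("ES_HEAP_SIZE=%dg" % recommended_heap_gb(30.5))  # ES_HEAP_SIZE=15g
```

The remaining half is left to the operating system, whose file system cache Lucene relies on.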

vim /etc/elasticsearch/elasticsearch.yml

Never swap:

bootstrap.mlockall: true

Indexing performance (the trailing comments show the previous values):

indices.memory.index_buffer_size: "30%"      # 10%
index.translog.flush_threshold_ops: 50000    # 1000
index.refresh_interval: "5s"                 # 1s
# index.store.type: "mmapfs"

Raise the store throttle throughput from 20mb to 100mb:

PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
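
The same request can be built from Python with only the standard library. This is a sketch, assuming the node listens on localhost:9200 as in the curl check earlier; the function name is made up, and nothing is sent until `urlopen` is called.

```python
import json
import urllib.request


def throttle_request(max_bytes_per_sec, base_url="http://localhost:9200"):
    """Build (but do not send) the PUT /_cluster/settings request."""
    body = {
        "persistent": {
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec
        }
    }
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = throttle_request("100mb")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```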

Mapping

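  1. Elasticsearch stores the original document in the _source field; it can be disabled if you don't need it.

  2. Elasticsearch also puts the processed data of all fields into the _all field; it can be disabled as well if you don't need it.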

    {
      '_id': 1,
      'title': 'this is first blog',
      'author': 'kakashi',
      'content': 'test 123'
    }
    After being stored in ES, this becomes:
    {
      '_id': 1,
      '_all': 'this, is, first, blog, kakashi, test, 123',
      'title': 'this, is, first, blog',
      'author': 'kakashi',
      'content': 'test, 123',
      '_source': {
          'title': 'this is first blog',
          'author': 'kakashi',
          'content': 'test 123'
      }
    }
    
  3. If _source is disabled, you can use the store option to decide whether each field is stored.

    {
       "tweet" : {
         "properties" : {
             "message" : {
                 "type" : "string",
                 "store" : true,
                 "index" : "analyzed"
             }
         }
       }
    }
    
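  4. The biggest difference between _source and store: with _source you can use the update API to update values.

  5. When a field is analyzed but you don't need a score (relevance) computed for it, you can disable norms, which saves a lot of memory.

  6. index_options decides whether term frequencies and positions are stored.

  7. Use no for fields that don't need to be indexed; use not_analyzed for fields that don't need to be tokenized.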
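
The options in the list above can be combined into one mapping. The following is a sketch that assumes the 1.x-era mapping syntax used throughout these notes (`string` fields, `norms.enabled`, `index: no`). The `tweet` type and `message` field come from the example above; the `author` and `raw_payload` fields and the function name are hypothetical.

```python
import json


def lean_mapping(type_name):
    """Build a mapping that turns off what the notes call optional."""
    return {
        type_name: {
            "_source": {"enabled": False},   # item 1: do not keep the original document
            "_all": {"enabled": False},      # item 2: do not build the catch-all field
            "properties": {
                "message": {
                    "type": "string",
                    "index": "analyzed",
                    "store": True,                # item 3: keep the value, since _source is off
                    "norms": {"enabled": False},  # item 5: no relevance scoring needed
                    "index_options": "docs",      # item 6: skip term frequencies and positions
                },
                # item 7: exact-match field, no tokenization
                "author": {"type": "string", "index": "not_analyzed"},
                # item 7: stored but never searched
                "raw_payload": {"type": "string", "index": "no", "store": True},
            },
        }
    }


print(json.dumps(lean_mapping("tweet"), indent=2))
```

Keep item 4 in mind before using something like this: with _source disabled, the update API can no longer be used on these documents.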

Ways to create a mapping

  1. Use a template. In the example below, "db*" is the index (db) name pattern and "blog" is the type (table) name.

    PUT _template/blog-template
    {
      "template": "db*",
      "mappings": {
        "blog": {
          "properties": {
            "author": {
              "type": "string",
              "index": "not_analyzed"
            },
            "content": {
              "type": "string"
            }
          }
        }
      }
    }
    
  2. Get the mapping: GET db/_mapping/

  3. Modify the mapping of db directly: PUT db/_mapping

Indexing

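  1. Use bulk indexing; it is best to keep each request between 1 MB and 5 MB.
  2. Less important data can go through bulk UDP indexing (if losing some data is tolerable).
  3. When reindexing, set refresh_interval to -1 and refresh manually during bulk indexing.
  4. Index warmers can be used to speed up searches (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html)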
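
The size guideline in item 1 can be sketched as a generator that packs documents into newline-delimited bulk bodies. This assumes the standard bulk format (one action line followed by one source line per document); the function name and the `db`/`blog` names are illustrative. A single document larger than the limit still gets a body of its own.

```python
import json


def bulk_bodies(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Yield bulk request bodies, each kept under max_bytes where possible."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines, size = [], 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Start a new body when adding this entry would exceed the limit.
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)


docs = [{"title": "post %d" % i, "author": "kakashi"} for i in range(1000)]
bodies = list(bulk_bodies(docs, "db", "blog", max_bytes=16 * 1024))
print(len(bodies), "bulk requests")
```

Each yielded string would be sent as the body of a POST to the _bulk endpoint.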

Sharding & Replica

  1. More shards & machines -> more indexing capacity
  2. More replicas & machines -> more read capacity

Reference

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
