@dqduc
Forked from jprante/gist:10666960
Created September 30, 2015 08:37
Elasticsearch configuration for high sustainable bulk feed

Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks

ES 1.1.0 with Java 8, G1 GC, 12 GB heap

/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/:/Users/es/elasticsearch-1.1.0/lib/sigar/ org.elasticsearch.bootstrap.Elasticsearch

Node

  • no bloom filter cache

  • concurrent merge scheduler

  • max 4 threads for merge, also for optimize API

  • max 4 segments per tier

  • max 1gb segment size

  • 1/3 of heap for index buffer

  • for SSD, disable store throttling

  • adjust merge and bulk thread pools

      index:
        codec:
          bloom:
            load: false
        merge:
          scheduler:
            type: concurrent
            max_thread_count: 4
          policy:
            type: tiered
            max_merged_segment: 1gb
            segments_per_tier: 4
            max_merge_at_once: 4
            max_merge_at_once_explicit: 4
      indices:
        memory:
          index_buffer_size: 33%
        store:
          throttle:
            type: none
      threadpool:
        merge:
          type: fixed
          size: 4
          queue_size: 32
        bulk:
          type: fixed
          size: 8
          queue_size: 32
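
To make the "1/3 of heap for index buffer" figure concrete, here is a small arithmetic sketch in Java. It assumes the percentage in `index_buffer_size` is taken from the configured heap (12 GB here); the class and method names are hypothetical and not part of Elasticsearch.

```java
// Hypothetical helper, not an Elasticsearch API: estimates the indexing
// buffer size from the JVM heap and a percentage setting.
final class IndexBufferEstimate {

    /** Returns heapBytes * percent / 100, using integer arithmetic. */
    static long indexBufferBytes(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024;       // -Xms12g -Xmx12g
        long buffer = indexBufferBytes(heap, 33);   // index_buffer_size: 33%
        System.out.println(buffer / (1024 * 1024) + " MB"); // prints "4055 MB"
    }
}
```

With a 12 GB heap this leaves roughly 4 GB for buffering documents before segments are written, and about 8 GB for everything else in the JVM.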
    

Index

  • 1 shard

  • 0 replicas

  • refresh disabled (interval -1)

      index.number_of_shards: 1
      index.number_of_replicas: 0
      index.refresh_interval: -1
    

Mapping

  • Dynamic template for all string fields: norms are disabled and only document IDs are indexed (no term frequencies or positions), which the nature of the input data allows

      "mappings" : {
        "_default_" : {
          "dynamic_templates" : [
            {
              "string_template" : {
                "match_mapping_type" : "string",
                "path_match" : "*",
                "mapping" : {
                  "type" : "string",
                  "norms" : { "enabled" : false },
                  "index_options" : "docs"
                }
              }
            }
          ]
        }
      }
    

Bulk

  • Java API, single TransportClient instance
  • BulkProcessor
  • bulk size 3000 docs (~ 2 MB)
  • max 4 concurrent threads
  • no flush interval, no flush volume
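
The feeding pattern in the list above (flush every 3000 docs, at most 4 bulks in flight, no time-based or size-based flush) can be sketched with the Java standard library alone. This is a simplified stand-in under stated assumptions, not the Elasticsearch `BulkProcessor`: the class `BulkFeedSketch` and its `sender` callback are hypothetical, and documents are plain strings rather than index requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical stand-in for a bulk feeder; not the Elasticsearch BulkProcessor. */
final class BulkFeedSketch {
    private final int bulkActions;                 // docs per bulk, e.g. 3000
    private final Semaphore inFlight;              // caps concurrent bulks, e.g. 4
    private final ExecutorService pool;
    private final Consumer<List<String>> sender;   // hypothetical "send one bulk" callback
    private List<String> batch = new ArrayList<>();

    BulkFeedSketch(int bulkActions, int maxConcurrent, Consumer<List<String>> sender) {
        this.bulkActions = bulkActions;
        this.inFlight = new Semaphore(maxConcurrent);
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
        this.sender = sender;
    }

    /** Adds one document; hands the batch off once it reaches bulkActions. */
    void add(String doc) {
        batch.add(doc);
        if (batch.size() >= bulkActions) {
            flush();
        }
    }

    /** Submits the current batch, blocking while maxConcurrent bulks are pending. */
    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        final List<String> toSend = batch;
        batch = new ArrayList<>();
        inFlight.acquireUninterruptibly();          // back-pressure on the feeder
        pool.execute(() -> {
            try {
                sender.accept(toSend);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Sends the remaining partial batch and waits for pending bulks to finish. */
    void close() {
        flush();
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger bulks = new AtomicInteger();
        AtomicInteger docs = new AtomicInteger();
        BulkFeedSketch feed = new BulkFeedSketch(3000, 4, b -> {
            bulks.incrementAndGet();
            docs.addAndGet(b.size());
        });
        for (int i = 0; i < 10000; i++) {
            feed.add("{\"n\":" + i + "}");
        }
        feed.close();
        System.out.println(bulks.get() + " bulks, " + docs.get() + " docs"); // prints "4 bulks, 10000 docs"
    }
}
```

The semaphore is what keeps the feed sustainable: when 4 bulks are already pending, `add` blocks instead of queueing more work. At roughly 2 MB per bulk, that bounds the data in flight to about 8 MB.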