@danfairs
Forked from jprante/gist:10666960
Created May 18, 2014 07:51

Elasticsearch configuration for high sustainable bulk feed

Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks

ES 1.1.0 with Java 8, G1 GC, 12 GB heap

/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/*:/Users/es/elasticsearch-1.1.0/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch

Node

  • no bloom filter cache

  • concurrent merge scheduler

  • max 4 threads for merge, also for optimize API

  • max 4 segments per tier

  • max 1gb segment size

  • 1/3 of heap for index buffer

  • for SSD, disable store throttling

  • adjust merge and bulk thread pools

      index:
        codec:
          bloom:
            load: false
        merge:
          scheduler:
            type: concurrent
            max_thread_count: 4
          policy:
            type: tiered
            max_merged_segment: 1gb
            segments_per_tier: 4
            max_merge_at_once: 4
            max_merge_at_once_explicit: 4
      indices:
        memory:
          index_buffer_size: 33%
        store:
          throttle:
            type: none
      threadpool:
        merge:
          type: fixed
          size: 4
          queue_size: 32
        bulk:
          type: fixed
          size: 8
          queue_size: 32
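
To put the "1/3 of heap for index buffer" setting in concrete terms, the following small Java sketch works out the node-wide budget that `index_buffer_size: 33%` implies for the 12 GB heap used here. The class and method names are hypothetical, and the calculation is plain arithmetic, not a call into Elasticsearch.

```java
import java.util.Locale;

/** Hypothetical helper: translates a heap size and a percentage into bytes. */
class IndexBufferSketch {

    /** Bytes implied by an integer percentage of the heap (integer arithmetic). */
    static long indexBufferBytes(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024;      // -Xms12g -Xmx12g
        long buffer = indexBufferBytes(heap, 33);  // index_buffer_size: 33%
        double gib = buffer / (1024.0 * 1024 * 1024);
        // prints "index buffer: 3.96 GiB"
        System.out.println(String.format(Locale.ROOT, "index buffer: %.2f GiB", gib));
    }
}
```

That leaves roughly 8 GB of heap for everything else the node does during the feed.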
    

Index

  • 1 shard

  • 0 replicas

  • refresh disabled (refresh interval -1)

      index.number_of_shards: 1
      index.number_of_replicas: 0
      index.refresh_interval: -1
    

Mapping

  • Mapping for string fields: norms and term frequencies are disabled (index_options "docs" also drops positions), which the nature of the input data allows

      "mappings" : {
        "_default_" : {
          "dynamic_templates" : [
            {
              "string_template" : {
                "match_mapping_type" : "string",
                "path_match" : "*",
                "mapping" : {
                  "type" : "string",
                  "norms" : { "enabled" : false },
                  "index_options" : "docs"
                }
              }
            }
          ]
        }
      }
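
The index settings and the mapping fragment are usually supplied together when the index is created. As a sketch only, the plain-Java class below assembles both into one JSON body. The class name is hypothetical, the nesting of the settings under `settings.index` is an assumption about how the fragments are combined, and the replica key is spelled `number_of_replicas`.

```java
/** Hypothetical helper: joins the index settings and the mapping into one JSON body. */
class CreateIndexBodySketch {

    /** Returns the create-index body as a JSON string (no JSON library needed). */
    static String body() {
        return String.join("\n",
            "{",
            "  \"settings\": {",
            "    \"index\": {",
            "      \"number_of_shards\": 1,",
            "      \"number_of_replicas\": 0,",
            "      \"refresh_interval\": -1",
            "    }",
            "  },",
            "  \"mappings\": {",
            "    \"_default_\": {",
            "      \"dynamic_templates\": [",
            "        {",
            "          \"string_template\": {",
            "            \"match_mapping_type\": \"string\",",
            "            \"path_match\": \"*\",",
            "            \"mapping\": {",
            "              \"type\": \"string\",",
            "              \"norms\": { \"enabled\": false },",
            "              \"index_options\": \"docs\"",
            "            }",
            "          }",
            "        }",
            "      ]",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(body());
    }
}
```

The printed string could then be used as the request body when creating the index, whether through the REST API or the Java client.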
    

Bulk

  • Java API, single TransportClient instance
  • BulkProcessor
  • bulk size 3000 docs (~ 2 MB)
  • max 4 concurrent threads
  • no flush interval, no flush volume
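
The batching behaviour listed above (flush every 3000 documents, at most 4 bulks in flight, no time-based or size-based flush) can be illustrated with a plain-Java stand-in. This is a sketch, not the Elasticsearch client: `BulkBatcherSketch` and its `send` method are hypothetical, and `send` only counts documents where a real feeder would issue a bulk request. In the Java client, `BulkProcessor` plays this role, with builder options such as `setBulkActions` and `setConcurrentRequests`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for count-based bulk batching; it does not talk to Elasticsearch. */
class BulkBatcherSketch {
    static final int BULK_ACTIONS = 3000;      // flush after this many documents
    static final int CONCURRENT_REQUESTS = 4;  // at most this many bulks in flight

    private final Semaphore inFlight = new Semaphore(CONCURRENT_REQUESTS);
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicInteger bulksSent = new AtomicInteger();
    private final AtomicInteger docsSent = new AtomicInteger();
    private List<String> pending = new ArrayList<>(BULK_ACTIONS);

    /** Queue one document; a full batch is handed off to a worker thread. */
    synchronized void add(String jsonDoc) {
        pending.add(jsonDoc);
        if (pending.size() >= BULK_ACTIONS) {
            flush();
        }
    }

    /** Hand off the current batch. No timer and no byte threshold are involved. */
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        final List<String> batch = pending;
        pending = new ArrayList<>(BULK_ACTIONS);
        inFlight.acquireUninterruptibly(); // back-pressure: block until a slot is free
        pool.execute(() -> {
            try {
                send(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Hypothetical placeholder for the real bulk request; it only counts. */
    private void send(List<String> batch) {
        bulksSent.incrementAndGet();
        docsSent.addAndGet(batch.size());
    }

    /** Flush the remainder and wait for outstanding bulks to finish. */
    synchronized void close() {
        flush();
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int bulksSent() { return bulksSent.get(); }
    int docsSent() { return docsSent.get(); }

    public static void main(String[] args) {
        BulkBatcherSketch batcher = new BulkBatcherSketch();
        for (int i = 0; i < 10000; i++) {
            batcher.add("{\"id\":" + i + "}");
        }
        batcher.close();
        // prints "4 bulks, 10000 docs": three full batches plus the remainder
        System.out.println(batcher.bulksSent() + " bulks, " + batcher.docsSent() + " docs");
    }
}
```

Because no flush interval is set, a partial batch is only sent on `close()`, so the feeder has to close the batcher at the end of the run. The semaphore is what keeps the feed sustainable: when four bulks are outstanding, `add` blocks instead of piling up more requests.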