jprante/gist:10666960

## gistfile1.md

      
Elasticsearch configuration for high sustainable bulk feed

Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks
ES 1.1.0 with Java 8, G1 GC, 12 GB heap
`/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/*:/Users/es/elasticsearch-1.1.0/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch`
Node


- no bloom filter cache
- concurrent merge scheduler
- max 4 threads for merge, also for the optimize API
- max 4 segments per tier
- max 1 GB merged segment size
- 1/3 of heap for the index buffer
- for SSD, disable store throttling
- adjust merge and bulk thread pools

  index:
    codec:
      bloom:
        load: false                     # do not load bloom filters
    merge:
      scheduler:
        type: concurrent
        max_thread_count: 4             # merge threads
      policy:
        type: tiered
        max_merged_segment: 1gb
        segments_per_tier: 4
        max_merge_at_once: 4
        max_merge_at_once_explicit: 4   # applies to optimize
  indices:
    memory:
      index_buffer_size: 33%            # about 1/3 of heap
    store:
      throttle:
        type: none                      # no store throttling (SSD)
  threadpool:
    merge:
      type: fixed
      size: 4
      queue_size: 32
    bulk:
      type: fixed
      size: 8
      queue_size: 32
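
To put a number on the "1/3 of heap" setting, here is a minimal arithmetic sketch. It assumes the percentage is taken against the 12 GB heap from the command line above; the class and method names are made up for illustration. The result is about 3.96 GiB, and with a single shard on the node that whole buffer is available to that shard.

```java
final class IndexBufferEstimate {
    /** Bytes covered by a percentage of the heap, using integer arithmetic. */
    static long percentOfHeap(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024;    // -Xmx12g from the command line above
        long buffer = percentOfHeap(heap, 33);   // index_buffer_size: 33%
        System.out.printf("index buffer: %d bytes (%.2f GiB)%n",
                buffer, buffer / (1024.0 * 1024 * 1024));
    }
}
```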


Index


- 1 shard
- 0 replicas
- refresh disabled (refresh interval -1)

  index.number_of_shards: 1
  index.number_of_replicas: 0
  index.refresh_interval: -1
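
The same index settings can be passed as a JSON body when creating the index over REST (`PUT /<index>`). The sketch below only builds the request bodies. The class and method names are hypothetical, and the second body rests on an assumption the notes above do not cover: that refresh and replicas are switched back on via `PUT /<index>/_settings` once the feed is done.

```java
final class BulkLoadIndexSettings {
    /** Body for index creation: 1 shard, 0 replicas, refresh disabled. */
    static String createBody() {
        return "{\"settings\":{\"index\":{"
             + "\"number_of_shards\":1,"
             + "\"number_of_replicas\":0,"
             + "\"refresh_interval\":\"-1\"}}}";
    }

    /**
     * Body for a settings update after the feed (an assumption, not part of the test setup).
     * number_of_shards is left out because it is fixed at index creation.
     */
    static String afterFeedBody(int replicas, String refreshInterval) {
        return String.format(
            "{\"index\":{\"number_of_replicas\":%d,\"refresh_interval\":\"%s\"}}",
            replicas, refreshInterval);
    }

    public static void main(String[] args) {
        System.out.println(createBody());
        System.out.println(afterFeedBody(1, "1s"));
    }
}
```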


Mapping


Mapping for string fields: norms and term frequencies can be disabled because of the nature of the input data. Note that `index_options: docs` indexes document numbers only, so positions are dropped as well.

  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [
        {
          "string_template" : {
            "match_mapping_type" : "string",
            "path_match" : "*",
            "mapping" : {
              "type" : "string",
              "norms" : { "enabled" : false },
              "index_options" : "docs"
            }
          }
        }
      ]
    }
  }


Bulk


- Java API, single TransportClient instance
- BulkProcessor
- bulk size 3000 docs (~2 MB)
- max 4 concurrent threads
- no flush interval, no flush volume
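
The batching behaviour described in this list can be sketched with the JDK alone. This is not the Elasticsearch `BulkProcessor`, just a hypothetical stand-in (`BatchingFeeder`, with a made-up `sender` callback in place of the real bulk request) that shows the semantics: flush only when 3000 documents are buffered, allow at most 4 batches in flight, and block the producer beyond that. There is no time-based or size-based flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

/** Hypothetical stand-in for a bulk feeder; not the Elasticsearch BulkProcessor. */
final class BatchingFeeder implements AutoCloseable {
    private final int bulkActions;                 // docs per batch (3000 above)
    private final int maxConcurrent;               // batches in flight (4 above)
    private final Semaphore inFlight;
    private final ExecutorService pool;
    private final Consumer<List<String>> sender;   // hypothetical: ships one batch
    private List<String> buffer = new ArrayList<>();

    BatchingFeeder(int bulkActions, int maxConcurrent, Consumer<List<String>> sender) {
        this.bulkActions = bulkActions;
        this.maxConcurrent = maxConcurrent;
        this.inFlight = new Semaphore(maxConcurrent);
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
        this.sender = sender;
    }

    /** Buffers one document; flushes only when the action count is reached. */
    synchronized void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= bulkActions) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        inFlight.acquireUninterruptibly();         // blocks while maxConcurrent batches are in flight
        pool.execute(() -> {
            try {
                sender.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Sends the remaining documents and waits for all in-flight batches. */
    @Override
    public synchronized void close() {
        flush();
        inFlight.acquireUninterruptibly(maxConcurrent);
        inFlight.release(maxConcurrent);
        pool.shutdown();
    }

    public static void main(String[] args) {
        try (BatchingFeeder feeder = new BatchingFeeder(3000, 4,
                batch -> System.out.println("bulk of " + batch.size() + " docs"))) {
            for (int i = 0; i < 7000; i++) {
                feeder.add("{\"n\":" + i + "}");
            }
        }
    }
}
```

With the real Java API, the corresponding knobs on the `BulkProcessor` builder should be `setBulkActions(3000)` and `setConcurrentRequests(4)`, with the size-based flush disabled through `setBulkSize` and no `setFlushInterval` call. Check these against the client version in use.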