@jprante
Last active March 19, 2019 11:22
Elasticsearch configuration for high sustainable bulk feed

Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks

ES 1.1.0 with Java 8, G1 GC, 12 GB heap

/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/*:/Users/es/elasticsearch-1.1.0/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch

Node

  • no bloom filter cache

  • concurrent merge scheduler

  • max 4 threads for merge, also for optimize API

  • max 4 segments per tier

  • max 1gb segment size

  • 1/3 of heap for index buffer

  • for SSD, disable store throttling

  • adjust merge and bulk thread pools

      index:
        codec:
          bloom:
            load: false
        merge:
          scheduler:
            type: concurrent
            max_thread_count: 4
          policy:
            type: tiered
            max_merged_segment: 1gb
            segments_per_tier: 4
            max_merge_at_once: 4
            max_merge_at_once_explicit: 4
      indices:
        memory:
          index_buffer_size: 33%
        store:
          throttle:
            type: none
      threadpool:
        merge:
          type: fixed
          size: 4
          queue_size: 32
        bulk:
          type: fixed
          size: 8
          queue_size: 32
    

Index

  • 1 shard

  • 0 replicas

  • refresh disabled (refresh interval -1)

      index.number_of_shards: 1
      index.number_of_replicas: 0
      index.refresh_interval: -1
    

Mapping

  • Mapping for string fields: norms and term frequencies can be disabled because of the nature of the input data

      "mappings" : {
        "_default_" : {
          "dynamic_templates" : [
            {
              "string_template" : {
                "match_mapping_type" : "string",
                "path_match" : "*",
                "mapping" : {
                  "type" : "string",
                  "norms" : { "enabled" : false },
                  "index_options" : "docs"
                }
              }
            }
          ]
        }
      }
    

Bulk

  • Java API, single TransportClient instance
  • BulkProcessor
  • bulk size 3000 docs (~ 2 MB)
  • max 4 concurrent threads
  • no flush interval, no flush by byte volume (bulks are sent by document count only)
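
The bulk settings above amount to "send a request every 3000 docs, with at most 4 requests in flight". The gist does not show the feeder code, so the following is only a plain-Java sketch of that batching behavior, not the Elasticsearch BulkProcessor API. The class name `CountingBatcher` and the `sink` callback (which stands in for sending one bulk request) are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for count-based bulk batching; not an Elasticsearch class. */
class CountingBatcher implements AutoCloseable {
    private final int bulkActions;              // flush after this many docs (3000 above)
    private final Semaphore inFlight;           // caps concurrent bulk requests (4 above)
    private final Consumer<List<String>> sink;  // hypothetical: would send one bulk request
    private final ExecutorService executor;
    private List<String> buffer = new ArrayList<>();

    CountingBatcher(int bulkActions, int maxConcurrent, Consumer<List<String>> sink) {
        this.bulkActions = bulkActions;
        this.inFlight = new Semaphore(maxConcurrent);
        this.sink = sink;
        this.executor = Executors.newFixedThreadPool(maxConcurrent);
    }

    /** Buffers one document; flushes when the count threshold is reached. */
    synchronized void add(String jsonDoc) throws InterruptedException {
        buffer.add(jsonDoc);
        if (buffer.size() >= bulkActions) {
            flush();
        }
    }

    /** Hands the current buffer to a worker; no time-based or size-based trigger exists. */
    private void flush() throws InterruptedException {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        inFlight.acquire(); // blocks the feeder while maxConcurrent batches are in flight
        executor.execute(() -> {
            try {
                sink.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Sends the remaining partial batch and waits for in-flight batches to finish. */
    @Override
    public synchronized void close() throws InterruptedException {
        flush();
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Constructed as `new CountingBatcher(3000, 4, sink)`, feeding 7000 documents and then calling `close()` hands the sink two full batches of 3000 and one final batch of 1000. Blocking the feeder on the semaphore is what keeps the feed rate sustainable: the producer slows down instead of piling up requests on the node's bulk queue.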