@ofavre
Created May 30, 2011 09:15
Testing performance for constant score queries in ElasticSearch 0.16.1

Purpose

I need to rank results according to some custom criteria.
Before working on my exact use case, I tested several approaches, namely:

  • Using built-in queries
  • Using a custom script in MVEL (the fastest scripting facility, according to the ES documentation)
  • Using a native script plugin written in Java

Here are the results of the comparative performance test.
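The plugin source for the third approach is not part of these notes. As a rough illustration of the idea only, a native script is compiled Java code that is built once per query from the request's `params` and then called once per matching document, with no script parsing or interpretation involved. The sketch below uses a hypothetical `NativeScoreScript` interface and `newConstantScript` factory as stand-ins; they are not Elasticsearch 0.16.1's actual native-script classes.

```java
import java.util.HashMap;
import java.util.Map;

public class NativeScriptSketch {
    // Hypothetical stand-in for a native score-script contract.
    // This is NOT the real Elasticsearch plugin API, only the shape of the idea.
    public interface NativeScoreScript {
        float score(int docId);
    }

    // Hypothetical factory: called once per query with the request's "params".
    public static NativeScoreScript newConstantScript(Map<String, Object> params) {
        Object value = params == null ? null : params.get("constant");
        // Assumption: fall back to a score of 1 when no "constant" param is given.
        final float constant = value instanceof Number ? ((Number) value).floatValue() : 1f;
        // The per-document call is a plain compiled method returning the constant.
        return docId -> constant;
    }

    public static void main(String[] args) {
        // Mirrors the test query's "params":{"constant":1000}.
        Map<String, Object> params = new HashMap<>();
        params.put("constant", 1000);
        NativeScoreScript script = newConstantScript(params);
        System.out.println(script.score(42));
    }
}
```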

Configuration

The hardware configuration

4 cores with Hyper-Threading (seen as 8 logical CPUs)

cat /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping : 5
cpu MHz : 2260.734
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 4521.92
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

6 GB of RAM

The software configuration

ElasticSearch: v0.16.1
Index size: 53430000 docs, 7.6 GB on disk
Queries were run with pretty printing enabled (this adds little overhead)
Mapping:

curl -XGET 'localhost:9200/_mapping?pretty=on'
{
  "md5hash_spl" : {
    "md5" : {
      "properties" : {
        "md5" : {
          "omit_term_freq_and_positions" : true,
          "include_in_all" : false,
          "omit_norms" : true,
          "store" : "yes",
          "analyzer" : "md5_hashsplitter",
          "type" : "string"
        }
      },
      "_all" : {
        "enabled" : false
      }
    }
  }
}

The configuration:

# Cluster Settings
cluster:
  name: es-test-hashsplitting

network:
  publish_host: localhost

# Gateway Settings
gateway:
  recover_after_nodes: 1
  recover_after_time: 5s
  expected_nodes: 1

index:
  number_of_shards: 1
  number_of_replicas: 0

  analysis:
    analyzer:
      md5_hashsplitter:
        type: custom
        tokenizer: md5_hashsplitter_tokenizer
    tokenizer:
      md5_hashsplitter_tokenizer:
        type: hash_splitter
        chunk_length: 4
        prefixes: ABCDEFGH

You can disregard the special analyzer, as it is not used by any of the test queries.
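The analyzer does explain the shape of the indexed terms, though. The `hash_splitter` tokenizer comes from a separate plugin whose code is not shown here; assuming it works the way its settings suggest (cut the 32-hex-digit MD5 string into `chunk_length`-character chunks and prefix the i-th chunk with the i-th letter of `prefixes`), this minimal sketch reproduces the tokens for one value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class HashSplitSketch {
    // Hypothetical re-implementation of the assumed tokenizer behaviour:
    // split a hex hash into fixed-length chunks, prefixing chunk i with prefixes[i].
    public static List<String> split(String hash, int chunkLength, String prefixes) {
        List<String> tokens = new ArrayList<>();
        int chunk = 0;
        for (int i = 0; i < hash.length(); i += chunkLength, chunk++) {
            if (chunk >= prefixes.length()) {
                throw new IllegalArgumentException("more chunks than prefix letters");
            }
            String part = hash.substring(i, Math.min(i + chunkLength, hash.length()));
            // char + String is string concatenation, e.g. 'A' + "d41d" -> "Ad41d".
            tokens.add(prefixes.charAt(chunk) + part);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // MD5 of the empty string, used only as sample input.
        String md5 = "d41d8cd98f00b204e9800998ecf8427e";
        System.out.println(split(md5, 4, "ABCDEFGH"));
    }
}
```

With `chunk_length: 4` and 8 prefix letters, a 32-character MD5 yields exactly 8 tokens of 5 characters, here `Ad41d` through `H427e`.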

Each document consists of a single field holding 8 values of 5 characters each (one of the 8 prefix letters followed by 4 hexadecimal digits), for up to 8 × 16^4 = 524,288 distinct terms, uniformly distributed.
By the central limit theorem, the number of documents sharing a given term follows an approximately normal distribution, which matches what I observed in practice.
Again, this should not affect the tests.
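To put numbers on the collision claim, assume that at each prefix position every one of the 16^4 suffixes is equally likely and that documents are independent. The count K_t of documents containing a given term t is then binomial, and with N this large the normal approximation applies:

```latex
\begin{aligned}
K_t &\sim \operatorname{Binomial}(N,\,p), \quad N = 53\,430\,000, \quad p = \tfrac{1}{16^4} = \tfrac{1}{65\,536} \\
\mathbb{E}[K_t] &= Np \approx 815.3 \\
\sigma_{K_t} &= \sqrt{Np(1-p)} \approx 28.6 \\
K_t &\approx \mathcal{N}\!\left(815.3,\ 28.6^2\right)
\end{aligned}
```

Under these assumptions a typical term matches about 815 documents, and roughly 95% of terms should fall within 815 ± 57.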

Tests

Reference:
{"query":{"match_all":{}}}
took 1775 ms

Built-in constant_score query:
{"query":{"constant_score":{"filter":{"match_all":{}},"boost":1000}}}
took 1780 ms
Performance drop: less than 0.3% worse

Simple MVEL script:
{"query":{"custom_score":{"query":{"match_all":{}},"lang":"mvel","script":1000}}}
took 110253 ms
Performance drop: about 62 times slower (over 6100% worse)

Native Java plugin script simply returning a constant:
{"query":{"custom_score":{"query":{"match_all":{}},"params":{"constant":1000},"lang":"native","script":"constant"}}}
took 3230 ms
Performance drop: about 82% worse
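The relative slowdowns can be recomputed from the raw timings. This small sketch takes the match_all run as the reference and reports, for each test, how much longer it took as a percentage and as a multiple of the reference time; the class and method names are illustrative only.

```java
public class SlowdownSketch {
    // Extra time relative to the reference, as a percentage.
    public static double percentWorse(long referenceMs, long measuredMs) {
        return 100.0 * (measuredMs - referenceMs) / referenceMs;
    }

    public static void main(String[] args) {
        long reference = 1775; // {"query":{"match_all":{}}}
        String[] names = {"constant_score", "mvel custom_score", "native custom_score"};
        long[] timings = {1780, 110253, 3230};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-20s %8d ms  %8.1f%% worse  %5.2fx%n",
                    names[i], timings[i], percentWorse(reference, timings[i]),
                    (double) timings[i] / reference);
        }
    }
}
```

For the three timings above this gives roughly 0.3%, 6111% and 82% worse, i.e. about 1.00x, 62.1x and 1.82x the reference time.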

Information after these tests

curl -XGET 'localhost:9200/_cluster/nodes/stats?pretty=on'
{
  "cluster_name" : "es-test-hashsplitting",
  "nodes" : {
    "c_E1SbYpQ6iJIIsZJPBGZQ" : {
      "name" : "Tempus",
      "indices" : {
        "size" : "7.5gb",
        "size_in_bytes" : 8095676001,
        "docs" : {
          "num_docs" : 53430000
        },
        "cache" : {
          "field_evictions" : 0,
          "field_size" : "0b",
          "field_size_in_bytes" : 0,
          "filter_count" : 17,
          "filter_evictions" : 0,
          "filter_mem_evictions" : 0,
          "filter_size" : "6.3mb",
          "filter_size_in_bytes" : 6679216
        },
        "merges" : {
          "current" : 0,
          "total" : 0,
          "total_time" : "0s",
          "total_time_in_millis" : 0
        }
      },
      "os" : {
        "timestamp" : 1306745999897,
        "uptime" : "1 hour, 24 minutes and 40 seconds",
        "uptime_in_millis" : 5080000,
        "load_average" : [ 0.22, 0.22, 0.3 ],
        "cpu" : {
          "sys" : 0,
          "user" : 0,
          "idle" : 98
        },
        "mem" : {
          "free" : "3.6gb",
          "free_in_bytes" : 3881447424,
          "used" : "2.2gb",
          "used_in_bytes" : 2384564224,
          "free_percent" : 72,
          "used_percent" : 27,
          "actual_free" : "4.2gb",
          "actual_free_in_bytes" : 4516585472,
          "actual_used" : "1.6gb",
          "actual_used_in_bytes" : 1749426176
        },
        "swap" : {
          "used" : "0b",
          "used_in_bytes" : 0,
          "free" : "3.8gb",
          "free_in_bytes" : 4094681088
        }
      },
      "process" : {
        "timestamp" : 1306745999898,
        "cpu" : {
          "percent" : 0,
          "sys" : "3 seconds and 800 milliseconds",
          "sys_in_millis" : 3800,
          "user" : "6 minutes, 29 seconds and 540 milliseconds",
          "user_in_millis" : 389540,
          "total" : "-1 milliseconds",
          "total_in_millis" : -1
        },
        "mem" : {
          "resident" : "222.6mb",
          "resident_in_bytes" : 233443328,
          "share" : "10.7mb",
          "share_in_bytes" : 11251712,
          "total_virtual" : "1.4gb",
          "total_virtual_in_bytes" : 1510789120
        },
        "fd" : {
          "total" : 229
        }
      },
      "jvm" : {
        "timestamp" : 1306745999899,
        "uptime" : "26 minutes, 4 seconds and 228 milliseconds",
        "uptime_in_millis" : 1564228,
        "mem" : {
          "heap_used" : "102.8mb",
          "heap_used_in_bytes" : 107807944,
          "heap_committed" : "265.5mb",
          "heap_committed_in_bytes" : 278462464,
          "non_heap_used" : "33.4mb",
          "non_heap_used_in_bytes" : 35027136,
          "non_heap_committed" : "37mb",
          "non_heap_committed_in_bytes" : 38797312
        },
        "threads" : {
          "count" : 40,
          "peak_count" : 41
        },
        "gc" : {
          "collection_count" : 11536,
          "collection_time" : "19 seconds and 508 milliseconds",
          "collection_time_in_millis" : 19508,
          "collectors" : {
            "ParNew" : {
              "collection_count" : 11534,
              "collection_time" : "19 seconds and 508 milliseconds",
              "collection_time_in_millis" : 19508
            },
            "ConcurrentMarkSweep" : {
              "collection_count" : 2,
              "collection_time" : "0 milliseconds",
              "collection_time_in_millis" : 0
            }
          }
        }
      },
      "network" : {
        "tcp" : {
          "active_opens" : 606,
          "passive_opens" : 100,
          "curr_estab" : 39,
          "in_segs" : 19219,
          "out_segs" : 16906,
          "retrans_segs" : 0,
          "estab_resets" : 41,
          "attempt_fails" : 4,
          "in_errs" : 0,
          "out_rsts" : 84
        }
      },
      "transport" : {
        "rx_count" : 0,
        "rx_size" : "0b",
        "rx_size_in_bytes" : 0,
        "tx_count" : 0,
        "tx_size" : "0b",
        "tx_size_in_bytes" : 0
      }
    }
  }
}
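A plausible reading of the filter cache figures above, offered as an interpretation rather than something the stats state: if the constant_score run cached its match_all filter as a bitset holding one bit per document, that bitset alone would account for almost all of the reported filter cache size.

```latex
\begin{aligned}
\text{bitset size} &= \frac{53\,430\,000\ \text{docs} \times 1\ \text{bit}}{8\ \text{bits/byte}} = 6\,678\,750\ \text{bytes} \\
6\,679\,216 - 6\,678\,750 &= 466\ \text{bytes left for the remaining cache entries}
\end{aligned}
```

The stats do not break the cache down per filter, so this remains an inference from the totals.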
curl -XGET 'localhost:9200/_cluster/nodes?pretty=on'
{
  "cluster_name" : "es-test-hashsplitting",
  "nodes" : {
    "c_E1SbYpQ6iJIIsZJPBGZQ" : {
      "name" : "Tempus",
      "transport_address" : "inet[localhost/127.0.0.1:9300]",
      "attributes" : {
      },
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "os" : {
        "refresh_interval" : 5000,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2260,
          "total_cores" : 8,
          "total_sockets" : 8,
          "cores_per_socket" : 16,
          "cache_size" : "8kb",
          "cache_size_in_bytes" : 8192
        },
        "mem" : {
          "total" : "5.8gb",
          "total_in_bytes" : 6266011648
        },
        "swap" : {
          "total" : "3.8gb",
          "total_in_bytes" : 4094681088
        }
      },
      "process" : {
        "refresh_interval" : 5000,
        "id" : 4785
      },
      "jvm" : {
        "pid" : 4785,
        "version" : "1.6.0_24",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "19.1-b02",
        "vm_vendor" : "Sun Microsystems Inc.",
        "start_time" : 1306744435671,
        "mem" : {
          "heap_init" : "256mb",
          "heap_init_in_bytes" : 268435456,
          "heap_max" : "1011.2mb",
          "heap_max_in_bytes" : 1060372480,
          "non_heap_init" : "23.1mb",
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max" : "130mb",
          "non_heap_max_in_bytes" : 136314880
        }
      },
      "network" : {
        "refresh_interval" : 5000,
        "primary_interface" : {
          "address" : "192.168.1.142",
          "name" : "eth0",
          "mac_address" : "00:25:64:BE:0E:59"
        }
      },
      "transport" : {
        "bound_address" : "inet[/0:0:0:0:0:0:0:0:9300]",
        "publish_address" : "inet[localhost/127.0.0.1:9300]"
      }
    }
  }
}