Ran on my MacBook Air against half a million docs. Single node, 5 primary shards, 0 replicas. The node was restarted between runs to make sure all caches were cleared, etc.
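For reference, the "5 primary shards, 0 replicas" setup boils down to a small settings document PUT to the speedtest index before loading (a sketch — the actual mappings for the dataset aren't shown here):

```python
import json

# Settings for the "speedtest" index used in the benchmark:
# 5 primary shards, no replicas (single node, so replicas would be unassigned anyway).
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
    }
}

# Create the index before indexing docs, e.g.:
#   curl -XPUT "http://localhost:9200/speedtest" -d '<this JSON>'
print(json.dumps(index_settings))
```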
$ python loadtester.py --es "http://localhost:9200/speedtest/_search" -i ../data/stoicism.txt -o test1.txt --ns 10000 --nt 3 --nf 10
0 26004 1.36110687256
1000 5561 0.0182199478149
2000 10516 0.0134048461914
3000 42137 0.0833399295807
4000 34922 0.0168430805206
5000 5408 0.00911998748779
6000 45315 0.0210130214691
7000 42732 0.0193800926208
8000 5393 0.0104150772095
9000 8031 0.015035867691
$ python analyser.py test1.txt
180868227 results in 10000 searches (mean 18086)
0.02s mean query time, 1.36s max, 0.01s min
50% of qtimes <= 0.01s
90% of qtimes <= 0.02s
99% of qtimes <= 0.05s
99.9% of qtimes <= 0.16s
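The percentile lines above can be reproduced from a list of query times with a few lines of Python (a sketch of the idea — analyser.py's actual implementation may differ):

```python
def percentile(qtimes, pct):
    """Return the query time at or below which pct% of samples fall."""
    ordered = sorted(qtimes)
    # Index of the last sample inside the requested percentile.
    idx = max(0, int(len(ordered) * pct / 100.0) - 1)
    return ordered[idx]

# Hypothetical sample of per-query times in seconds.
qtimes = [0.012, 0.009, 0.015, 0.011, 0.034, 0.010, 0.013, 0.018, 0.050, 0.008]
print("%.2fs mean query time, %.2fs max, %.2fs min"
      % (sum(qtimes) / len(qtimes), max(qtimes), min(qtimes)))
print("50%% of qtimes <= %.2fs" % percentile(qtimes, 50))
print("90%% of qtimes <= %.2fs" % percentile(qtimes, 90))
```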
- Replaces and/or/not with bool. Equivalent query, but bool is optimized to handle bitset-based filters (such as range/term).
- Replaces numeric_range with range. First off, numeric_range is deprecated in 0.90.8 (replaced with a fielddata execution mode in the range filter). Secondly, it operates on fielddata instead of Lucene-based range filtering, so it's comparing apples to oranges in this benchmark. Also... I find it tends to be slower.
- Replaces multiple should clauses in the query with a single match + multiple terms. A single match with multiple terms translates into multiple Lucene terms OR'd together. You don't need an extra bool to wrap them.
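To make the three rewrites concrete, here's a before/after sketch of the kind of query involved, written as Python dicts (field names and terms are invented for illustration; these are not the actual benchmark queries):

```python
# Before: an "and" filter wrapping the deprecated numeric_range filter,
# plus a bool of should clauses, one per term.
before = {
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [
                        {"match": {"body": "virtue"}},   # hypothetical field/terms
                        {"match": {"body": "reason"}},
                        {"match": {"body": "nature"}},
                    ]
                }
            },
            "filter": {
                "and": [
                    {"numeric_range": {"year": {"gte": 100, "lte": 200}}},
                ]
            },
        }
    }
}

# After: a bool filter (so bitset-based filters like range/term can be
# combined efficiently), a plain range filter, and a single match whose
# multiple terms get OR'd together at the Lucene level.
after = {
    "query": {
        "filtered": {
            "query": {"match": {"body": "virtue reason nature"}},
            "filter": {
                "bool": {
                    "must": [
                        {"range": {"year": {"gte": 100, "lte": 200}}},
                    ]
                }
            },
        }
    }
}
```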
$ python loadtester.py --es "http://localhost:9200/speedtest/_search" -i ../data/stoicism.txt -o test2.txt --ns 10000 --nt 3 --nf 10
0 17234 0.469129085541
1000 10241 0.0103521347046
2000 18599 0.0117888450623
3000 9496 0.00943398475647
4000 7503 0.00943303108215
5000 47209 0.0126769542694
6000 50272 0.0118138790131
7000 6506 0.0116741657257
8000 43656 0.0117161273956
9000 44132 0.012815952301
$ python analyser.py test2.txt
196173610 results in 10000 searches (mean 19617)
0.01s mean query time, 0.47s max, 0.01s min
50% of qtimes <= 0.01s
90% of qtimes <= 0.01s
99% of qtimes <= 0.02s
99.9% of qtimes <= 0.05s