To load the data into ES you need to start it with more heap; 4G should be enough: $ ES_HEAP_SIZE=4g ./bin/elasticsearch -f
To load the dataset for testing run: $ ./aggsbug.load.sh
To reproduce the bug run: $ ./aggsbug.test.sh
The test script calls the same aggregation twice, stores the output, and later prints the diff between the outputs.
If you see no diff, or only a diff on the took field, that is normal behavior.
If you see a list of diffs, mainly on the doc_count fields, the aggregation is returning different numbers for the two calls.
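The rule above (a diff only on took is normal, diffs on doc_count indicate the bug) can be sketched in Python as a check on two saved responses. This is only an illustration and not what aggsbug.test.sh does: the function names and the "tags" aggregation in the sample data are hypothetical, and the two responses are assumed to be already parsed from JSON into dicts.

```python
def diff_responses(first, second, path=""):
    """Return (path, first_value, second_value) for every leaf value that differs."""
    diffs = []
    if isinstance(first, dict) and isinstance(second, dict):
        for key in sorted(set(first) | set(second)):
            child = f"{path}.{key}" if path else key
            diffs.extend(diff_responses(first.get(key), second.get(key), child))
    elif isinstance(first, list) and isinstance(second, list):
        for i in range(max(len(first), len(second))):
            a = first[i] if i < len(first) else None
            b = second[i] if i < len(second) else None
            diffs.extend(diff_responses(a, b, f"{path}[{i}]"))
    elif first != second:
        diffs.append((path, first, second))
    return diffs


def classify(first, second):
    """Ignore the top-level `took` field; any other difference counts as inconsistent."""
    significant = [d for d in diff_responses(first, second) if d[0] != "took"]
    return ("inconsistent" if significant else "consistent"), significant


# Hypothetical sample responses; "tags" is a made-up aggregation name.
run_a = {"took": 12, "aggregations": {"tags": {"buckets": [{"key": "a", "doc_count": 10}]}}}
run_b = {"took": 9, "aggregations": {"tags": {"buckets": [{"key": "a", "doc_count": 8}]}}}

status, diffs = classify(run_a, run_b)
print(status, diffs)
# inconsistent [('aggregations.tags.buckets[0].doc_count', 10, 8)]
```

Comparing parsed JSON rather than raw text avoids false positives from key ordering or whitespace, at the cost of needing the two outputs saved as valid JSON.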
It is advised to run the test multiple times, since the two calls sometimes return the same numbers.
When you run this test on 1.0.0.Beta2 (the first version with aggregations, iirc), it always returns the same values. Since version 1.0.0.RC1 the inconsistencies have started appearing.