@ijokarumawak
Created April 8, 2021 04:14
How Elasticsearch calculates an average when there are multiple shards
# Create an index with 2 primary shards.
PUT avg-avg
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
# Add some documents.
POST avg-avg/_bulk
{"index": {"_id": "1"}}
{"v": 100}
{"index": {"_id": "2"}}
{"v": 100}
{"index": {"_id": "3"}}
{"v": 100}
{"index": {"_id": "a"}}
{"v": 100}
{"index": {"_id": "b"}}
{"v": 1}
{"index": {"_id": "c"}}
{"v": 100}
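# Optional step, not in the original: force a refresh so the new documents are
# visible to the searches below without waiting for the periodic refresh.
POST avg-avg/_refresh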
# With "explain": true, each hit includes a _shard field showing which shard the document resides on.
# Doc 'b' resides on shard 1. The rest are on shard 0.
GET avg-avg/_search
{
  "explain": true
}
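# Another way to inspect the layout, as a sketch: _cat/shards lists each shard with its document count.
# Assuming the layout noted above, shard 0 should report 5 docs and shard 1 should report 1.
GET _cat/shards/avg-avg?v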
# If ES computes the average over all documents, the result should be:
#   (500 + 1) / 6 = 83.5
# If it instead computes an average per shard and then averages those averages, the result would be:
#   (100 + 1) / 2 = 50.5
GET avg-avg/_search
{
  "size": 0,
  "aggs": {
    "avg": {
      "avg": {
        "field": "v"
      }
    }
  }
}
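# For avg, each shard reports a sum and a count, and the coordinating node divides the
# combined totals, so the request above should return 83.5 rather than 50.5.
# The requests below are a sketch for checking this, not part of the original test.
# They assume the shard layout noted earlier (doc 'b' alone on shard 1).
#
# Restrict the search to one shard at a time to see each shard-level average.
GET avg-avg/_search?preference=_shards:0
{
  "size": 0,
  "aggs": {
    "avg": {
      "avg": {
        "field": "v"
      }
    }
  }
}
GET avg-avg/_search?preference=_shards:1
{
  "size": 0,
  "aggs": {
    "avg": {
      "avg": {
        "field": "v"
      }
    }
  }
}
# The stats aggregation returns count, sum, and avg together across all shards,
# so sum / count can be compared directly with the reported avg.
GET avg-avg/_search
{
  "size": 0,
  "aggs": {
    "stats": {
      "stats": {
        "field": "v"
      }
    }
  }
}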