Last active
August 19, 2016 14:15
Use case where I need to sort by a scripted_metric aggregation in ElasticSearch
{
  "filtered": {
    "filter": {
      "geo_distance": {
        "distance": 50,
        "unit": "km",
        "city.coordinates": {
          "lat": 48.856614,
          "lon": 2.3522219
        }
      }
    }
  },
  "aggs": {
    "groupByCity": {
      "terms": {
        "field": "city.id"
      },
      "aggs": {
        "getCityDistance": {
          "scripted_metric": {
            "init_script": "_agg['distance'] = []",
            "map_script": "if (_agg['distance'].size() == 0) { _agg['distance'].add(doc['city.coordinates'].arcDistanceInKm(48.856614, 2.3522219)); }",
            "reduce_script": "return _aggs[0]['distance'].value"
          }
        }
      }
    }
  }
}
@colings86 Thanks for the reply. You are right about params; I do use them, but I simplified my query and left them out of this example.
The problem with this query is that it performs a lot of useless operations (correct me if I'm wrong):
Since the classifieds are grouped per city, I only need to calculate "arcDistanceInKm" for the first classified in each bucket. If I have 1000 classifieds in the closest neighboring city, your script will calculate "arcDistanceInKm" 1000 times with the same coordinates and then average the results, which is useless since a city always has the same coordinates. With scripted_metric, I can avoid these calculations (that's the point of "if (_agg['distance'].size() == 0)" in my code).
Instead of using the scripted_metric aggregation here, you should be able to achieve what you want by using the script option on the avg aggregation, which would let you sort the way you want on the terms agg (though note the comments made here). This has the added advantage of the distance not being skewed if the first record that's collected has an erroneous value. Also, it's a good idea to use params with your scripts so ES doesn't have to recompile the script every time you run it (scripts are cached, keyed on their contents).
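
For reference, a rough sketch of what that suggestion could look like as a request body. This is not taken from the thread: it assumes Elasticsearch 2.x script syntax ("inline" plus "params" inside a "script" object; on 1.x the "script" string and "params" would sit side by side in the avg body), it assumes the filter is wrapped in a top-level "query", and the names "avgCityDistance", "lat" and "lon" are made up for illustration. The terms "order" points at the sub-aggregation by name.

```json
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": 50,
          "unit": "km",
          "city.coordinates": {
            "lat": 48.856614,
            "lon": 2.3522219
          }
        }
      }
    }
  },
  "aggs": {
    "groupByCity": {
      "terms": {
        "field": "city.id",
        "order": { "avgCityDistance": "asc" }
      },
      "aggs": {
        "avgCityDistance": {
          "avg": {
            "script": {
              "inline": "doc['city.coordinates'].arcDistanceInKm(lat, lon)",
              "params": { "lat": 48.856614, "lon": 2.3522219 }
            }
          }
        }
      }
    }
  }
}
```

Because the coordinates are passed as params, the script text stays identical between requests, so the cached compiled script can be reused when the search origin changes.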