@raphaelMalie
Last active August 19, 2016 14:15
Use case where I need to sort by a scripted_metric aggregation in Elasticsearch
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": 50,
          "unit": "km",
          "city.coordinates": {
            "lat": 48.856614,
            "lon": 2.3522219
          }
        }
      }
    }
  },
  "aggs": {
    "groupByCity": {
      "terms": {
        "field": "city.id"
      },
      "aggs": {
        "getCityDistance": {
          "scripted_metric": {
            "init_script": "_agg['distance'] = []",
            "map_script": "if (_agg['distance'].size() == 0) { _agg['distance'].add(doc['city.coordinates'].arcDistanceInKm(48.856614, 2.3522219)); }",
            "reduce_script": "return _aggs[0]['distance'][0]"
          }
        }
      }
    }
  }
}
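As far as I know, a terms aggregation can only be ordered by a numeric metrics sub-aggregation (avg, min, max and the like), and scripted_metric is not one of them. One possible workaround is to reorder the returned buckets in the application instead. The Python sketch below assumes the search response has already been parsed into a dict; a scripted_metric result shows up under its aggregation name as `{"value": ...}`. The bucket keys, counts and distances in `sample` are made up, and `sort_buckets_by_distance` is a hypothetical helper. Keep in mind that this only reorders the buckets Elasticsearch returned: the terms agg still selects its top `size` buckets by doc count, so a nearby city with few classifieds can be missing entirely.

```python
from typing import Any


def sort_buckets_by_distance(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the groupByCity buckets ordered by their getCityDistance value.

    Buckets whose scripted_metric value is missing (None) are placed last.
    """
    buckets = response["aggregations"]["groupByCity"]["buckets"]

    def distance_key(bucket: dict[str, Any]) -> tuple[bool, float]:
        value = bucket["getCityDistance"]["value"]
        # (True, ...) sorts after (False, ...), so missing values go last.
        return (value is None, value if value is not None else 0.0)

    return sorted(buckets, key=distance_key)


# Hypothetical response fragment: keys, counts and distances are invented.
sample = {
    "aggregations": {
        "groupByCity": {
            "buckets": [
                {"key": 12, "doc_count": 1000, "getCityDistance": {"value": 18.4}},
                {"key": 7, "doc_count": 250, "getCityDistance": {"value": 3.1}},
                {"key": 31, "doc_count": 40, "getCityDistance": {"value": None}},
            ]
        }
    }
}

for bucket in sort_buckets_by_distance(sample):
    print(bucket["key"], bucket["getCityDistance"]["value"])
```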
@colings86

colings86 commented Aug 19, 2016

Instead of using the scripted_metric aggregation here, you should be able to achieve what you want by using the script option on the avg aggregation, which would let you sort the way you want on the terms agg (though note the comments made here). This has the added advantage that the distance is not skewed if the first record that's collected has an erroneous value. Also, it's a good idea to use params with your scripts so ES doesn't have to recompile the script every time you run it (scripts are cached, keyed on their contents).

"average_city_distance": {
          "avg": {
            "script": {
              "inline": "doc['point'].arcDistanceInKm(ref_lat, ref_lon)",
              "params": {
                "ref_lat": 51.512923,
                "ref_lon": -0.132785
              }
            }
          }
        }
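To show where that avg sub-aggregation would sit, here is a sketch that builds the whole request body as a Python dict: the avg agg goes under the terms agg, and the terms agg is ordered on it through `order`. It assumes the ES 2.x script syntax used above (`inline` plus `params`), reuses the `city.coordinates` field and the Paris coordinates from the original query, and wraps the filter in a `query` clause; `build_search_body` and `DISTANCE_SCRIPT` are hypothetical names. Because the reference point travels in `params`, the script source string is identical for every search, which is what lets the cached compiled script be reused.

```python
import json
from typing import Any

# The script source stays constant; only params change between searches.
DISTANCE_SCRIPT = "doc['city.coordinates'].arcDistanceInKm(ref_lat, ref_lon)"


def build_search_body(lat: float, lon: float, radius_km: int = 50) -> dict[str, Any]:
    """Build a filtered search whose city buckets are ordered by average distance."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "geo_distance": {
                        "distance": radius_km,
                        "unit": "km",
                        "city.coordinates": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            "groupByCity": {
                "terms": {
                    "field": "city.id",
                    # Order buckets by the avg sub-aggregation defined below.
                    "order": {"average_city_distance": "asc"},
                },
                "aggs": {
                    "average_city_distance": {
                        "avg": {
                            "script": {
                                "inline": DISTANCE_SCRIPT,
                                "params": {"ref_lat": lat, "ref_lon": lon},
                            }
                        }
                    }
                },
            }
        },
    }


paris = build_search_body(48.856614, 2.3522219)
print(json.dumps(paris, indent=2))
```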

@raphaelMalie

raphaelMalie commented Aug 19, 2016

@colings86 Thanks for the reply. You are right about params: I'm using them, but I simplified my query and left them out of this example.

The problem with this query is that it will perform a lot of useless operations (correct me if I'm wrong).
Since the classifieds are grouped per city, I only need to calculate "arcDistanceInKm" for the first classified in each bucket. If I have 1000 classifieds in the closest neighboring city, your script will calculate "arcDistanceInKm" 1000 times with the same coordinates and then average the results, which is pointless since the city always has the same coordinates. With scripted_metric, I can avoid these calculations (that's the point of "if (_agg['distance'].size() == 0)" in my code).
