
@ncolomer
Last active August 29, 2015 14:13
Elasticsearch gauss decay function issue

Edit: there is no issue!

The score was not parsed correctly by the bash script. A double can be returned in the JSON response in scientific notation, such as 4.06E-05. This happens for small values such as 0.0000406, not because plain notation lacks precision: Java's default float/double formatting, which the JVM-based Elasticsearch relies on when writing scores, switches to scientific notation for magnitudes below 10^-3.
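To see the difference, here is a small sketch that runs the comma-separator sed pattern from the RunMe script and an equivalent extended-regex form of the fixed pattern on a hand-written, abbreviated search response (not real Elasticsearch output). The function names old_parse and new_parse are made up for illustration.

```shell
# Hand-written, abbreviated response body: only the fields the sed patterns touch.
SAMPLE='{"hits":{"total":1,"max_score":4.06E-05,"hits":[{"_index":"test","_type":"test","_id":"1","_score":4.06E-05}]}}'

# Pattern from the RunMe script: digits, any single char, digits.
# It stops at the "E", so the exponent is silently dropped.
old_parse() {
  sed 's|.*"_score":\([0-9]*\).\([0-9]*\).*|\1,\2|'
}

# Fixed idea written as an ERE (-E is accepted by both GNU and BSD sed):
# keep everything after the last "_score": up to the next , or }.
new_parse() {
  sed -E 's|.*"_score":([^,}]+).*|\1|'
}

echo "$SAMPLE" | old_parse   # prints 4,06 (the E-05 is lost)
echo "$SAMPLE" | new_parse   # prints 4.06E-05
```

Note that "max_score" does not match "_score" with a leading quote, so both patterns pick the hit's own score.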

The following script parses Elasticsearch results correctly. I also added its raw output to this Gist as the results.ok.tsv file.

ES_HOST=localhost

curl -XDELETE "http://$ES_HOST:9200/test"
curl -XPUT "http://$ES_HOST:9200/test" -d '{"settings":{"number_of_shards":1,"number_of_replicas":0},"mappings":{"test":{"properties":{"date":{"type":"date"}}}}}'

for scale in {3,6,9}; do
  for offset in {0,3,6}; do
    echo "scale=$scale, offset=$offset"
    for day in {0..60}; do
      curl -s -XPOST "http://$ES_HOST:9200/test/test/1?refresh=true" -d '{"date":"'"$(date -u -v-${day}d '+%Y-%m-%dT%H:%M:%S.000Z')"'"}' > /dev/null # -u: the trailing Z means UTC
      RESULT=$(curl -s -XGET "http://$ES_HOST:9200/test/_search" -d '{"_source":false,"query":{"function_score":{"query":{"match_all":{}},"functions":[{"gauss":{"date":{"origin":"now","scale":"'"$scale"'d","offset":"'"$offset"'d","decay":0.5}}}],"boost_mode":"replace"}}}')
      echo "$RESULT" | sed -E 's|.*"_score":([^,}]+).*|\1|' # <-- problem was here! Keep everything up to the next , or } (e.g. 4.06E-05)
    done
  done
done
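Both scripts compute the document date with date -v, a BSD/macOS option that GNU coreutils date does not have. A minimal portable sketch, assuming either a BSD or a GNU date is installed, tries one and falls back to the other; days_ago is a hypothetical helper name, not part of the original scripts.

```shell
# Print the UTC timestamp of "now minus N days" in the format used by the scripts.
# Hypothetical helper: tries BSD date first (macOS), then GNU date (Linux).
days_ago() {
  fmt='+%Y-%m-%dT%H:%M:%S.000Z'
  # BSD/macOS: -v adjusts the date; errors are silenced so GNU date can take over
  date -u -v-"$1"d "$fmt" 2>/dev/null && return
  # GNU coreutils: -d accepts relative dates such as "3 days ago"
  date -u -d "$1 days ago" "$fmt"
}

days_ago 1
days_ago 3
```

In the loop, $(date -v-${day}d '+%Y-%m-%dT%H:%M:%S.000Z') would then become $(days_ago "$day").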

How to reproduce

Procedure

  • We (re)create a test index with a simple mapping that declares the date field:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "test": {
      "properties": {
        "date": {"type": "date"}
      }
    }
  }
}
  • For each combination of scale and offset, we index one doc per day n (from 0 to 60) whose date equals now minus n days. All docs share the same id, so each one replaces the previous one (the index always contains exactly one doc):
{
  "date": "$date"
}
  • For each doc, we run a function_score query with a gauss decay function relative to now:
{
  "_source": false,
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [{
        "gauss": {
          "date": {
            "origin": "now",
            "scale": "${scale}d",
            "offset": "${offset}d",
            "decay": 0.5
          }
        }
      }],
      "boost_mode": "replace"
    }
  }
}
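To check the plotted scores, the gauss decay formula from the Elasticsearch function_score documentation can be evaluated directly. With x = max(0, |date - origin| - offset) and σ² = -scale² / (2 ln(decay)), the score is exp(-x² / (2σ²)), which simplifies to decay^((x / scale)²). The awk-based helper below is a hypothetical sketch of that formula, not Elasticsearch code; distance, scale and offset must use the same unit (days here).

```shell
# Expected gauss decay score for a document DISTANCE away from the origin.
# Hypothetical helper: gauss_score DISTANCE SCALE OFFSET DECAY
gauss_score() {
  LC_ALL=C awk -v d="$1" -v scale="$2" -v offset="$3" -v decay="$4" 'BEGIN {
    x = (d < 0 ? -d : d) - offset   # distance beyond the offset
    if (x < 0) x = 0                # no decay inside the offset
    # exp(-x^2 / (2 sigma^2)) with sigma^2 = -scale^2 / (2 ln(decay))
    printf "%g\n", exp(log(decay) * (x / scale) ^ 2)
  }'
}

gauss_score 3 3 0 0.5    # 0.5: at distance offset + scale the score equals decay
gauss_score 6 3 0 0.5    # 0.0625
gauss_score 12 3 0 0.5   # 1.52588e-05 (awk's %g also switches to exponent notation)
```

With scale=3 and offset=0, scores already drop below 10^-3 after about 10 days, which is where the scientific notation shows up in the responses.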

RunMe Shell script

Beware: this script prints a comma (,) instead of a dot (.) as the decimal separator.

ES_HOST=localhost

curl -XDELETE "http://$ES_HOST:9200/test"
curl -XPUT "http://$ES_HOST:9200/test" -d '{"settings":{"number_of_shards":1,"number_of_replicas":0},"mappings":{"test":{"properties":{"date":{"type":"date"}}}}}'

for scale in {3,6,9}; do
  for offset in {0,3,6}; do
    echo "scale=$scale, offset=$offset"
    for day in {0..60}; do
      curl -s -XPOST "http://$ES_HOST:9200/test/test/1?refresh=true" -d '{"date":"'"$(date -v-${day}d '+%Y-%m-%dT%H:%M:%S.000Z')"'"}' > /dev/null
      RESULT=$(curl -s -XGET "http://$ES_HOST:9200/test/_search" -d '{"_source":false,"query":{"function_score":{"query":{"match_all":{}},"functions":[{"gauss":{"date":{"origin":"now","scale":"'"$scale"'d","offset":"'"$offset"'d","decay":0.5}}}],"boost_mode":"replace"}}}')
      echo "$RESULT" | sed 's|.*"_score":\([0-9]*\).\([0-9]*\).*|\1,\2|' # Change sep here
    done
  done
done
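The pattern above splits the number on its decimal point, so it also drops the exponent of values such as 4.06E-05 (which become 4,06). If comma decimals are still wanted for Excel, one option is to extract the whole number first and let awk rewrite it in plain decimal notation before swapping the separator. A minimal sketch; to_plain_comma is a hypothetical helper name.

```shell
# Read one number per line (plain or scientific notation, e.g. 4.06E-05) and
# print it in plain decimal notation with a comma as decimal separator.
# LC_ALL=C makes awk read and write numbers with a dot, which tr then swaps.
to_plain_comma() {
  LC_ALL=C awk '{ printf "%.10f\n", $1 + 0 }' | tr '.' ','
}

echo 4.06E-05 | to_plain_comma   # 0,0000406000
echo 0.5 | to_plain_comma        # 0,5000000000

# In the loop, the sed line could become:
# echo "$RESULT" | sed -E 's|.*"_score":([^,}]+).*|\1|' | to_plain_comma
```

Ten decimals keep scores down to about 10^-10 readable; raise the precision if smaller scores matter.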

Sample graphs of the results are in this Gist's results-20-days.png and results-60-days.png files, and the raw results are in the results.tsv file (you can copy-paste or import it into Excel).
