@markbirbeck
Created January 26, 2012 19:20
[ElasticSearch] Insert 6 records and query for a single record with facets: the facet count is wrong when the number of records is greater than the number of shards
# Create an index:
#
curl -XDELETE 'http://127.0.0.1:9200/articles'
curl -XPUT 'http://127.0.0.1:9200/articles'
# Define the article mapping, with actions as a nested property of each article:
#
curl -XPUT 'http://127.0.0.1:9200/articles/article/_mapping' -d '
{
  "article": {
    "properties": {
      "title": {
        "store": true,
        "type": "string"
      },
      "actions": {
        "type": "nested",
        "properties": {
          "action": {
            "store": true,
            "type": "string"
          },
          "updated": {
            "store": true,
            "type": "integer"
          },
          "date": {
            "store": true,
            "format": "dateOptionalTime",
            "type": "date"
          }
        }
      }
    }
  }
}
'
# Insert some articles:
#
for ((i=1; i<7; i++))
do
  curl -XPUT 'http://127.0.0.1:9200/articles/article/'$i -d '
{
  "title": "Article '$i'",
  "actions": [
    {"date": "2010-11-18", "updated": 1, "action": "updated"}
  ]
}
'
done
# Ensure the index is up-to-date:
#
curl -XPOST 'http://127.0.0.1:9200/articles/_refresh'
# Now search for the article titled "Article 1" whose actions took place
# on the 18th of November, 2010. Create a facet over those actions:
#
curl -XGET 'http://127.0.0.1:9200/articles/_search?pretty=true' -d '
{
  "query": {
    "bool": {
      "must": [
        { "query_string": {"query": "title:\"Article 1\""} },
        {
          "nested": {
            "path": "actions",
            "score_mode": "avg",
            "_scope": "actions",
            "query": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "actions.date": {
                        "from": "2010-11-18",
                        "to": "2010-11-18"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "facets": {
    "updated_facet": {
      "date_histogram": {
        "interval": "day",
        "key_field": "actions.date",
        "value_field": "actions.updated"
      },
      "scope": "actions"
    }
  },
  "from": 0,
  "size": 0
}'
@markbirbeck

A successful result would be for the count in `updated_facet` to be 1, since only one record matches the search criteria:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.5652475,
    "hits" : [ ]
  },
  "facets" : {
    "updated_facet" : {
      "_type" : "date_histogram",
      "entries" : [ {
        "time" : 1290038400000,
        "count" : 1,
        "min" : 1.0,
        "max" : 1.0,
        "total" : 1.0,
        "total_count" : 1,
        "mean" : 1.0
      } ]
    }
  }
}
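
To check this from a script rather than by eye, the facet count can be pulled out of the pretty-printed response. This is only a minimal sketch: it assumes the `pretty=true` layout shown above, and `facet_count` and `query.json` are hypothetical names that are not part of the Gist.

```shell
# Hypothetical helper: print the facet "count" value(s) from a
# pretty-printed search response read on stdin. It relies on the
# `"count" : N,` line layout shown above and is not a general JSON parser.
# Lines such as "total_count" are skipped because the pattern is anchored
# to the leading whitespace.
facet_count() {
  sed -n 's/^ *"count" : \([0-9][0-9]*\),.*$/\1/p'
}

# Example use against the local node (query.json is an assumed file
# holding the search body from the Gist):
#   curl -s -XGET 'http://127.0.0.1:9200/articles/_search?pretty=true' \
#        -d @query.json | facet_count
```

With the response above, the helper should print the single count from the one histogram entry.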

However, with 5 shards the query returns a count of 2 instead of 1. With 6 shards the Gist above gives the correct result, but if we change it to insert 7 records instead of 6, the count is 2 again (note that the number of hits remains 1, since only one article matches):

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 2.0843315,
    "hits" : [ ]
  },
  "facets" : {
    "updated_facet" : {
      "_type" : "date_histogram",
      "entries" : [ {
        "time" : 1290038400000,
        "count" : 2,
        "min" : 1.0,
        "max" : 1.0,
        "total" : 2.0,
        "total_count" : 2,
        "mean" : 1.0
      } ]
    }
  }
}
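
The Gist creates the index without stating a shard count, so it picks up whatever the node is configured with. To compare shard counts directly, the index could be recreated with an explicit `number_of_shards` setting. A sketch under that assumption follows; `ES_URL`, `settings_body` and `recreate_index` are hypothetical names introduced for illustration.

```shell
# Assumed default; override ES_URL to point at a different node.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the index-creation body for a given shard count.
settings_body() {
  printf '{"settings": {"number_of_shards": %d}}' "$1"
}

# Drop and recreate the articles index with the requested number of shards.
recreate_index() {
  curl -XDELETE "$ES_URL/articles"
  curl -XPUT "$ES_URL/articles" -d "$(settings_body "$1")"
}

# Usage:
#   recreate_index 5
# then re-run the mapping, insert, refresh and search steps from the Gist.
```

Running the rest of the Gist after `recreate_index 5` and again after `recreate_index 6` should make it easier to see whether the facet count changes with the shard count alone.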
