Skip to content

Instantly share code, notes, and snippets.

@dbu
Last active November 5, 2018 17:02
Show Gist options
  • Save dbu/766c83ad0c977129d4ff21ad5a982aeb to your computer and use it in GitHub Desktop.
Save dbu/766c83ad0c977129d4ff21ad5a982aeb to your computer and use it in GitHub Desktop.
Elasticsearch: Find all documents that have invalid entries in an array

We had a problem where invalid data got stored in elasticsearch. An array of objects had some objects placed in it that are missing a mandatory field. After fixing the mistake, we wanted to update all offending entires. For this, we need to get the IDs of affected items.

The "obvious" query would be _exists_:general_information AND !(_exists_:general_information.value). But as soon as there is any array element with a value, the second condition will consider the value existing. If there are any valid entries in the array, the query will not work as expected.

The solution we found was to use an ES script that loops over the elements in the source document and returns 1 if it finds one that has no data. To our positive surprise, running this on an index with over 1M entries only took a couple of seconds. Definitely not something for a routine query, but an acceptable time for a one-off query to fix a problem.

{
"id": "ABCD",
"general_information": [
{
"key": "foo",
"value": "bar"
},
{
"key": "invalid"
}
]
}
{
"stored_fields": ["id"],
"query": {
"bool": {
"must": {
"script": {
"script": {
"inline": "if (!params._source.containsKey('general_information')) {return 0;} for(def info : params._source.general_information) { if (!info.containsKey('value')) {return 1;} }",
"lang": "painless"
}
}
}
}
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment