Elasticsearch: min_children and max_children with nested datatypes for Elastic 6 and 7+.

Elasticsearch: Min and Max Children Query with Nested Datatypes

Elasticsearch does not support min_children and max_children for nested datatypes (see: issue #10043). If you need this behavior, this gist provides a workaround. Please note that I haven't benchmarked it against join queries.

Alternative solution

To support min_children and max_children for your nested query, you can use a function_score query. To make this more concrete, assume you have an index called Person with the following mapping:

  • first_name (text)
  • email (text)
  • children (nested)
    • first_name (text)
    • last_name (text)
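
For reference, the mapping above could be written as the following index body. This is a minimal sketch that assumes the typeless mapping syntax of Elasticsearch 7+ (on Elasticsearch 6 the properties sit under a mapping type name); `PERSON_INDEX_BODY` is a hypothetical name used only for illustration.

```python
# Hypothetical index body for the example; field names and types follow the list above.
PERSON_INDEX_BODY = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text"},
            "email": {"type": "text"},
            # "nested" indexes each child as its own hidden document,
            # which is what allows the queries in this gist to count them.
            "children": {
                "type": "nested",
                "properties": {
                    "first_name": {"type": "text"},
                    "last_name": {"type": "text"},
                },
            },
        }
    }
}
```

Index names must be lowercase, so the body would be sent to something like `PUT /person`.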

Cases

To effectively support min_children and max_children, there are multiple queries you need to consider:

  • Find all persons who don't have children.
  • Find all persons who have at least n children:
    • requires: n > 0
    • if n is 0, you're asking for any Person, whether or not they have children.
  • Find all persons who have between n and m children (inclusive):
    • requires: n > 0
    • if n is 0, you're asking for all persons who have at most m children.
  • Find all persons who have at most n children.

Depending on the scenario, the request will look different.
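
The choice between these cases can be sketched as a small dispatcher. This is only an illustration: `pick_case` and the labels it returns are hypothetical names, and `None` is assumed to mean "no bound".

```python
from typing import Optional


def pick_case(min_children: Optional[int], max_children: Optional[int]) -> str:
    """Return a label for which of the cases listed above applies."""
    lo = min_children or 0
    if lo < 0 or (max_children is not None and max_children < lo):
        raise ValueError("expected 0 <= min_children <= max_children")
    if max_children is None:
        # No upper bound: either no constraint at all, or a plain minimum.
        return "any" if lo == 0 else "at_least"
    if max_children == 0:
        return "no_children"
    # Upper bound present: with a lower bound of 0 it is the "at most" case.
    return "at_most" if lo == 0 else "between"
```

For example, `pick_case(2, 5)` selects the "between" request and `pick_case(None, 0)` the "no children" request.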

Notes about function_score:

  • function_score supports min_score. It filters out any document where the score is lower than the min_score.
  • function_score has a max_boost. This doesn't filter the documents returned; it caps the score computed by the functions at a specific value before it is combined with the query score. For instance: if the functions compute 500 and the max_boost is 50, 50 is used.
  • If you don't want this function_score to pollute the overall score of the document, apply a boost of 0.
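
To see why min_score can act as a child counter, here is a simplified plain-Python model of the arithmetic. It assumes every nested document contributes exactly the boost to the parent's score; the function names are hypothetical and nothing here talks to Elasticsearch.

```python
def nested_sum_score(child_count: int, boost: float = 10.0) -> float:
    """Parent score when each matching child scores `boost` and scores are summed."""
    return child_count * boost


def kept_by_min_score(score: float, min_score: float) -> bool:
    """min_score drops documents whose score is lower than min_score."""
    return score >= min_score


# With a boost of 10 and a min_score of 20, only parents with 2+ children remain.
counts = [0, 1, 2, 3]
kept = [c for c in counts if kept_by_min_score(nested_sum_score(c), 20)]
```

In this model `kept` ends up as `[2, 3]`, which is the behavior the queries in this gist rely on.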

Cases

Find all persons who have no children

Explanation: this is the easiest query, you only have to verify that there are no nested documents. It runs as a plain must_not clause with no score arithmetic, so it should be cheaper than the function_score approach.

{
    "query": {
        "bool": {
            "must_not": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    }
                }
            }    
        }
    }
}
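
If you build requests in code, the same body can be produced by a small helper. This is a sketch with hypothetical names (`no_children_query`, `path`, `field`); it assumes, as the query above does, that every child has the field being checked.

```python
def no_children_query(path: str = "children", field: str = "first_name") -> dict:
    """Build the 'no children' request body shown above."""
    # A child is detected through the existence of one of its fields, so a
    # child that lacks this field would not be seen by the query.
    has_child = {
        "nested": {
            "path": path,
            "query": {"exists": {"field": f"{path}.{field}"}},
        }
    }
    return {"query": {"bool": {"must_not": has_child}}}
```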

Find all persons who have a minimum of n children

Explanation: each matching nested document is given a score of 10 (the boost), and the nested query sums these scores (score_mode is sum). The function_score then filters out any document whose score is lower than min_score.

Example: find all persons who have a minimum of 2 children. The boost applied here is 10 (you can use any number you want), so the min_score is 20 (2 * 10).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
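
The only values that change with n are the boost and the min_score, so the Elastic 7+ request can be generated. The helper below is a sketch: `min_children_query` and its parameters are hypothetical names, and the body it returns mirrors the Elastic 7+ query above.

```python
def min_children_query(n: int, boost: int = 10,
                       path: str = "children", field: str = "first_name") -> dict:
    """Build the Elastic 7+ request for 'at least n children'."""
    if n < 1:
        raise ValueError("n must be > 0; with n == 0 no filtering is needed")
    # Each child scores `boost`; the nested query sums those scores.
    count_children = {
        "nested": {
            "path": path,
            "boost": boost,
            "score_mode": "sum",
            "query": {"constant_score": {
                "boost": 1,
                "filter": {"exists": {"field": f"{path}.{field}"}},
            }},
        }
    }
    return {"query": {"function_score": {
        "min_score": n * boost,  # n children * boost per child
        "boost": 1,
        "score_mode": "multiply",
        "boost_mode": "replace",
        "query": count_children,
    }}}
```

`min_children_query(2)` yields a min_score of 20, matching the example above.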

Find all persons who have between n and m children (n > 0).

Explanation: same as above, except that here a script alters the score. The script checks that the sum of all boosts does not exceed m * boost. If it does, the script returns 0, which guarantees the document is excluded (0 < min_score).

Example: find all persons who have a minimum of 2 children and a maximum of 5 children (inclusive). With a boost of 10, the min_score is 20 (2 * 10) and the script's upper limit is 50 (5 * 10).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "functions": [
                {
                    "filter": {
                        "match_all": {
                            "boost": 1
                        }
                    },
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
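
Both limits derive from n, m and the boost, so this request can be generated as well. The sketch below uses hypothetical names (`between_children_query`, `is_kept`) and makes one change to the script: the limit is passed through the script's `params` instead of being hard-coded in the source. `is_kept` is a plain-Python mirror of the scoring, useful only for checking the arithmetic.

```python
def between_children_query(n: int, m: int, boost: int = 10,
                           path: str = "children", field: str = "first_name") -> dict:
    """Build the Elastic 7+ request for 'between n and m children' (inclusive)."""
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= m")
    count_children = {
        "nested": {
            "path": path,
            "boost": boost,
            "score_mode": "sum",
            "query": {"constant_score": {
                "boost": 1,
                "filter": {"exists": {"field": f"{path}.{field}"}},
            }},
        }
    }
    cap_script = {
        "lang": "painless",
        # Scores above the limit are reset to 0 so that min_score removes them.
        "source": "if (_score > params.max_score) { return 0; } return _score;",
        "params": {"max_score": m * boost},
    }
    return {"query": {"function_score": {
        "min_score": n * boost,
        "boost": 1,
        "score_mode": "multiply",
        "boost_mode": "replace",
        "functions": [{"script_score": {"script": cap_script}}],
        "query": count_children,
    }}}


def is_kept(child_count: int, n: int, m: int, boost: int = 10) -> bool:
    """Plain-Python mirror of the scoring above, for checking the arithmetic."""
    score = child_count * boost
    if score > m * boost:
        score = 0
    return score >= n * boost
```

With n = 2 and m = 5, `[c for c in range(8) if is_kept(c, 2, 5)]` gives `[2, 3, 4, 5]`.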

Find all persons who have at most n children (n > 0).

Explanation: this request asks for persons who have no children as well as persons who have between 1 and n children. Expressing this directly with Elasticsearch can be tricky, so taking the negation makes it easier: exclude all persons who have at least n + 1 children. The min_score is therefore (n + 1) * boost.

Example: find all persons who have at most 2 children. With a boost of 10, the min_score is 30 (3 * 10).

Before Elastic 7:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "query": {
                        "nested": {
                            "path": "children",
                            "query": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            },
                            "boost": 10,
                            "score_mode": "sum"
                        }
                    }
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "score_mode": "multiply",
                    "boost_mode": "replace",
                    "query": {
                        "nested": {
                            "path": "children",
                            "boost": 10,
                            "score_mode": "sum",
                            "query": {
                                "constant_score": {
                                    "boost": 1,
                                    "filter": {
                                        "exists": {
                                            "field": "children.first_name"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
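
The negated form can be generated in the same way. In this sketch `max_children_query` is a hypothetical helper that wraps the "at least n + 1 children" function_score in a must_not clause, mirroring the Elastic 7+ query above.

```python
def max_children_query(n: int, boost: int = 10,
                       path: str = "children", field: str = "first_name") -> dict:
    """Build the Elastic 7+ request for 'at most n children'."""
    if n < 1:
        raise ValueError("n must be > 0; use the 'no children' query for n == 0")
    count_children = {
        "nested": {
            "path": path,
            "boost": boost,
            "score_mode": "sum",
            "query": {"constant_score": {
                "boost": 1,
                "filter": {"exists": {"field": f"{path}.{field}"}},
            }},
        }
    }
    too_many_children = {
        "function_score": {
            "min_score": (n + 1) * boost,  # matches parents with n + 1 or more children
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": count_children,
        }
    }
    # Parents with no children never match the nested query, so they are kept.
    return {"query": {"bool": {"must_not": too_many_children}}}
```

`max_children_query(2)` produces a min_score of 30, as in the example above.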

Hope this helps.
