@andrewgross
Last active August 29, 2015 14:03

I am curious whether there are any plans to update or modify the JSON query API in ES 2.0+.

While I find the API to be very powerful, constructing a valid request is confusing and requires special-casing a lot of rules. Below are my thoughts on what I see as the current issues, along with some suggestions for correcting them. I don't intend this as a rant, just to provoke discussion. It is written purely from the point of view of constructing queries (not parsing them), and only covers the JSON query DSL for searching (not percolation or aggregations).

It is currently hard to construct small parts of a JSON query without knowing all of the elements involved. Looking at a simple query and a filtered query:

Simple Query:

{
    "query": {
        "match_all": {}
    }
}

Filtered Query:

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "term": {
              "foo": "bar"
            }
          }
        ]
      },
      "query": {
        "match_all": {}
      }
    }
  }
}

We see the syntax tree change so that the initial 'query' becomes nested and the root of the tree changes. Once we add a scoring function, it morphs even further.

Scored Query:

{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "term": {
                  "foo": "bar"
                }
              }
            ]
          },
          "query": {
            "match_all": {}
          }
        }
      },
      "script_score": {
        "script": "result = 0.0 + 1.0;"
      },
      "boost_mode": "replace"
    }
  }
}

This follows some of the same rules (the previous query is nested inside a new scope). However, not all of the changes are placed together: we have both a 'script_score' block and a 'boost_mode' key. This means that when I want to add scoring to my query, I need to know my scoring block as well as the rest of the query tree so that I can place 'boost_mode' correctly.
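To make the re-rooting concrete, here is a minimal Python sketch that builds the three bodies above by wrapping. The helper names `with_filter` and `with_script_score` are hypothetical, made up for this illustration, and are not part of any Elasticsearch client.

```python
import json

def with_filter(body, filter_clause):
    """Wrap the current query in a 'filtered' query; the old query moves one level down."""
    return {"query": {"filtered": {"filter": filter_clause, "query": body["query"]}}}

def with_script_score(body, script, boost_mode="replace"):
    """Wrap the current query in 'function_score'; note the two sibling keys it adds."""
    return {
        "query": {
            "function_score": {
                "query": body["query"],
                "script_score": {"script": script},
                "boost_mode": boost_mode,
            }
        }
    }

simple = {"query": {"match_all": {}}}
filtered = with_filter(simple, {"and": [{"term": {"foo": "bar"}}]})
scored = with_script_score(filtered, "result = 0.0 + 1.0;")

print(json.dumps(scored, indent=2))
```

Each helper has to replace the root of the tree rather than add a key to it, which is why any code holding a reference to the old location of 'match_all' ends up pointing at the wrong place.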

A simpler example: in a simple scored query, if I want to modify my 'match_all' block, the path to it is "query" -> "function_score" -> "query"

Once I add filtering to the query, the path changes, so inserting at the old location produces a broken query. The path becomes "query" -> "function_score" -> "query" -> "filtered" -> "query"
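The path problem can be reproduced in a few lines of Python. This is only a sketch: `replace_at` is a hypothetical helper, the `title` field is invented for the example, and "broken" here shows up as the filter being silently dropped rather than as a parse error.

```python
import copy

SCORED_PATH = ("query", "function_score", "query")
FILTERED_PATH = SCORED_PATH + ("filtered", "query")

def replace_at(body, path, new_value):
    """Return a copy of body with the node at path replaced by new_value."""
    result = copy.deepcopy(body)
    parent = result
    for key in path[:-1]:
        parent = parent[key]
    parent[path[-1]] = new_value
    return result

score = {"script_score": {"script": "result = 0.0 + 1.0;"}, "boost_mode": "replace"}
term_filter = {"and": [{"term": {"foo": "bar"}}]}

scored = {"query": {"function_score": dict(score, query={"match_all": {}})}}
scored_filtered = {
    "query": {
        "function_score": dict(
            score,
            query={"filtered": {"filter": term_filter, "query": {"match_all": {}}}},
        )
    }
}

new_query = {"match": {"title": "elasticsearch"}}  # hypothetical field name

ok = replace_at(scored, SCORED_PATH, new_query)
broken = replace_at(scored_filtered, SCORED_PATH, new_query)   # overwrites the 'filtered' wrapper
fixed = replace_at(scored_filtered, FILTERED_PATH, new_query)  # needs the longer path
```

The caller has to know whether a filter is present before it can pick the right path, which is exactly the coupling described above.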

It would be much simpler if I could define my scoring block and drop it into a query at a static path without worrying about what else is in the query. This case is a simple illustration, but the JSON query DSL contains many such instances, especially around cases like 'scoring', where using a single scoring block vs. multiple scoring functions radically changes the structure of the scoring section.
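As a rough illustration of the single vs. multiple case, the sketch below builds the scoring keys for 'function_score' both ways. It assumes the 1.x layout, where a single function sits directly in the block and several functions move under a "functions" array; the helper name `scoring_block` and the second script are invented for this example.

```python
def scoring_block(functions, boost_mode="replace"):
    """Build the scoring keys of a 'function_score' body from a list of functions."""
    if len(functions) == 1:
        # Single function: its key (e.g. 'script_score') is placed directly in the block.
        block = dict(functions[0])
    else:
        # Several functions: the same keys move one level down, into a 'functions' array.
        block = {"functions": [dict(f) for f in functions]}
    block["boost_mode"] = boost_mode
    return block

first = {"script_score": {"script": "result = 0.0 + 1.0;"}}
second = {"script_score": {"script": "result = 2.0;"}}  # hypothetical second script

single = scoring_block([first])
multiple = scoring_block([first, second])
```

Code that reads or edits 'script_score' has to branch on which of the two shapes it was handed.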

I understand that this was designed iteratively, and that the syntax will not be perfect for both parsing and construction. Now that the JSON query DSL seems to have a stable set of elements, it would be useful to restructure it so that queries can be written in a simple manner. A few considerations:

  1. When adding an element such as scoring, have it only modify elements below it in the tree (aside from its initial insertion point)
  2. Keep the root of the tree static and have the existence of a top level key modify behavior, instead of needing to change the nesting of elements.
  3. Somehow stop nesting the term "query" all over the place, definitely the most confusing thing for new users in my experience. =D

Here is a proposed top level DSL example. It's incomplete and probably missing some things but useful as an illustration:

{
  "filter": {
    "and": [
      {
        "term": {
          "foo": "bar"
        }
      }
    ]
  },
  "query": {
    "match_all": {}
  },
  "scoring": {
    "script_score": {
      "script": "result = 0.0 + 1.0;"
    },
    "boost_mode": "replace"
  },
  "sort": [
    {
      "foo": {
        "order": "desc",
        "mode": "avg"
      }
    }
  ]
}
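One way to experiment with a flat layout like this without any server changes is a small client-side translation step. The sketch below is only an assumption about how the proposed keys might map onto the nested DSL shown earlier; `compile_flat` is a hypothetical name, and the mapping covers just the four keys in the example.

```python
def compile_flat(flat):
    """Translate the proposed flat body into the nested 1.x-style DSL shown earlier."""
    query = flat.get("query", {"match_all": {}})
    if "filter" in flat:
        query = {"filtered": {"filter": flat["filter"], "query": query}}
    if "scoring" in flat:
        scored = {"query": query}
        scored.update(flat["scoring"])  # 'script_score' and 'boost_mode' travel together
        query = {"function_score": scored}
    body = {"query": query}
    if "sort" in flat:
        body["sort"] = flat["sort"]
    return body

flat = {
    "filter": {"and": [{"term": {"foo": "bar"}}]},
    "query": {"match_all": {}},
    "scoring": {
        "script_score": {"script": "result = 0.0 + 1.0;"},
        "boost_mode": "replace",
    },
    "sort": [{"foo": {"order": "desc"}}],
}

nested = compile_flat(flat)
```

With this arrangement every key in the flat body lives at a fixed path, and adding or removing "filter" or "scoring" never moves the others; all the nesting rules are confined to one function.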

Thanks for reading over this. I was unable to find a roadmap for prospective features, so if there are already plans to work on this, feel free to disregard my comments.

Thanks, Andrew
