
@spaghetticode
Last active October 18, 2018 08:40
Query Interface

Search

I'm going to discuss the following part of your request:

d) search: search entries
        inputs:
          -entry set identifier
          -three word lists:
            -words that should match
            -words that must match
            -words that must NOT match

Empty query

Let's review the concept of index and type first. Relational databases have databases and tables, while Elasticsearch has indices and types. Roughly, you can consider indices equivalent to databases and types equivalent to tables.

For inspection purposes, let's start with some of the simplest queries possible. This query lists all the records in the "docs" index with type "doc":

GET /docs/doc/_search

If you omit the type (or even the index) you will get more general results.

This will show all the records inside the docs index, no matter what type they are:

GET /docs/_search

This will show all the records inside all indices:

GET /_search

You can also limit the search to a group of different indices, if you ever need to. This will search all the records in the docs and pictures indices:

GET /docs,pictures/_search
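The four request paths above follow one pattern: an optional comma-separated list of indices, an optional type, then _search. A minimal sketch of that pattern in Python, assuming a hypothetical helper named search_path (it is not part of any Elasticsearch client):

```python
def search_path(indices=None, doc_type=None):
    """Build the path of a _search request (hypothetical helper).

    indices: one index name, a list of names, or None to search all indices.
    doc_type: optional type name; it is only used when indices are given.
    """
    if isinstance(indices, str):
        indices = [indices]
    parts = []
    if indices:
        # Several indices are joined with commas, as in /docs,pictures/_search
        parts.append(",".join(indices))
        if doc_type:
            parts.append(doc_type)
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path("docs", "doc"))          # /docs/doc/_search
print(search_path("docs"))                 # /docs/_search
print(search_path())                       # /_search
print(search_path(["docs", "pictures"]))   # /docs,pictures/_search
```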

-entry set identifier

Look up a record by its id:

GET /docs/doc/1

The same lookup can be written as an explicit query using the search API. It can be written in many other ways; the point here is to demonstrate that the search API is powerful, versatile and, yes, maybe a little complicated for the inexperienced user:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "_id": 1
        }
      }
    }
  }
}

Elasticsearch has filters and regular queries. The main difference is that filters don't calculate document relevance, which makes them faster; they are also cached and can be combined with regular queries.

You should use filters to restrict the scope of your search and then use the slower match query to extract the most relevant results.

The following example keeps only the records whose title field is not blank (exists):

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": { 
        "exists": {
          "field": "title"
        } 
      }
    }
  }
}

The beginning of the output is:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "docs",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "title": "Transformers",
               "body": "a movie about big robots saving the planet",
               "keywords": "scifi, action, robots",
               "location": {
                  "lon": 9.206543,
                  "lat": 45.490946
               }
            }
         },

When every hit in the result set has _score = 1, no relevance was calculated, which usually means you're using a filter (as is the case here).
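The _score check can also be done in code. The sketch below assumes the response has already been parsed into a Python dict shaped like the output above; every_score_is_one is a hypothetical helper name, and the check is only a heuristic, since it looks at the scores and not at how the request was built.

```python
def every_score_is_one(response):
    """Return True when there are hits and each of them has _score == 1."""
    hits = response["hits"]["hits"]
    return bool(hits) and all(hit["_score"] == 1 for hit in hits)


# Trimmed-down responses, keeping only the fields the helper reads.
filter_response = {"hits": {"total": 3, "max_score": 1,
                            "hits": [{"_id": "1", "_score": 1}]}}
scored_response = {"hits": {"total": 1, "max_score": 0.013555458,
                            "hits": [{"_id": "2", "_score": 0.013555458}]}}

print(every_score_is_one(filter_response))  # True
print(every_score_is_one(scored_response))  # False
```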

-three word lists:

-words that should match

This query requires that at least one of the terms twitter, facebook, finding is present. I'm assuming you want to search in the same field (title) here:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "title": "twitter"}},
            { "term": { "title": "facebook"}},
            { "term": { "title": "finding"}}
          ]
        }
      }
    }
  }
}

The above query is a filter. The same conditions can be written as a regular query, which calculates relevance:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "title": "twitter"}},
        { "term": { "title": "facebook"}},
        { "term": { "title": "finding"}}
      ]
    }
  }
}

The output now shows a real value for _score:

   "hits": {
      "total": 1,
      "max_score": 0.013555458,
      "hits": [
         {
            "_index": "docs",
            "_type": "doc",
            "_id": "2",
            "_score": 0.013555458,

We can write the same query using match. Keep in mind that match is the keyword for full-text search: it analyzes the string into separate words, by default matches records containing any of them, and still calculates relevance:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": { 
        "match": { "title": "twitter facebook finding" } 
      } 
    }
  }
}

You can consider should equivalent to OR in boolean logic.
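If the word list comes from user input, the should block can be generated rather than typed by hand. A minimal sketch in Python that builds the two request bodies shown above as plain dicts; should_terms and should_match are hypothetical helper names, and sending the body to Elasticsearch is left out:

```python
import json


def should_terms(field, words):
    """One term clause per word inside bool/should (OR semantics)."""
    return {"query": {"bool": {
        "should": [{"term": {field: word}} for word in words]
    }}}


def should_match(field, words):
    """A single match clause over the words joined by spaces."""
    return {"query": {"bool": {
        "should": {"match": {field: " ".join(words)}}
    }}}


words = ["twitter", "facebook", "finding"]
print(json.dumps(should_terms("title", words), indent=2))
print(json.dumps(should_match("title", words), indent=2))
```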

-words that must match

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "title": "twitter"}},
        { "term": { "title": "facebook"}},
        { "term": { "title": "finding"}}
      ]
    }
  }
}

Here we get no results, because all three terms must be found in the same record in order to have a match. must maps to AND in boolean logic.
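The difference between should and must can be reproduced without Elasticsearch. The sketch below is a simplified model, not how Elasticsearch works internally: it assumes the three titles that appear in this document's results and uses lowercasing plus whitespace splitting as a stand-in for the real analyzer.

```python
# Titles taken from the results shown in this document.
TITLES = ["Transformers", "Finding Nemo", "Robin Hood"]


def tokens(title):
    """Very rough stand-in for analysis: lowercase and split on spaces."""
    return set(title.lower().split())


def should_hits(words):
    """OR: a title matches when it contains at least one of the words."""
    return [t for t in TITLES if any(w in tokens(t) for w in words)]


def must_hits(words):
    """AND: a title matches only when it contains every word."""
    return [t for t in TITLES if all(w in tokens(t) for w in words)]


words = ["twitter", "facebook", "finding"]
print(should_hits(words))  # ['Finding Nemo']
print(must_hits(words))    # []
```

Only "Finding Nemo" contains one of the words, so should finds it, while no title contains all three, so must finds nothing.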

-words that must NOT match

Pretty self-explanatory:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {
            "terms": {
              "body": ["robots", "facebook", "twitter"]
            }
          }
        }
      }
    }
  }
}

must_not maps to NOT in boolean logic.

The query above yields 2 results (Robin Hood and Finding Nemo). Again, this is a filter query. To show how to combine a filter with a match query, let's restrict the results to the records whose title matches "nemo":

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {
            "terms": {
              "body": ["robots", "facebook", "twitter"]
            }
          }
        }
      },
      "query": {
        "match": { "title": "nemo" }
      }
    }
  }
}

And now we have relevance with _score:

{
   "took": 0,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.19178301,
      "hits": [
         {
            "_index": "docs",
            "_type": "doc",
            "_id": "2",
            "_score": 0.19178301,
            "_source": {
               "title": "Finding Nemo",
               "body": "A beautyful cartoon about a little fish named Nemo",
               "keywords": "kids, cartoons, pixar",
               "location": {
                  "lon": 10.206543,
                  "lat": 46.490946
               }
            }
         }
      ]
   }
}
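The filter-plus-query combination used in this section has a fixed shape: a filtered block with a filter and an optional query. A small sketch that assembles it as a Python dict, using the filtered syntax exactly as it appears in this document; the filtered function is a hypothetical helper, not a client library call:

```python
def filtered(filter_clause, query_clause=None):
    """Wrap a filter, and optionally a scoring query, in a filtered query."""
    body = {"filter": filter_clause}
    if query_clause is not None:
        # The query part is what produces a real _score.
        body["query"] = query_clause
    return {"query": {"filtered": body}}


exclude_words = {"bool": {"must_not": {
    "terms": {"body": ["robots", "facebook", "twitter"]}
}}}

filter_only = filtered(exclude_words)
filter_and_match = filtered(exclude_words, {"match": {"title": "nemo"}})
print(filter_and_match)
```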

Looking at the examples, it should be clear that should, must and must_not blocks must be enclosed inside a bool attribute.

Of course you can combine should, must and must_not blocks in order to build the desired query:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must_not": {
        "terms": {
          "body": ["robots", "facebook", "twitter"]
        }
      },
      "must": {
        "match": {"title": "robin"}
      }
    }
  }
}
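To tie this back to the three word lists in the request, here is a minimal sketch that turns them into a single bool query body. It assumes all three lists target the same field and uses one match clause per word; build_search and its parameters are hypothetical names chosen for illustration.

```python
def build_search(field, should=(), must=(), must_not=()):
    """Build a bool query body from three word lists (hypothetical helper)."""
    bool_clause = {}
    for name, words in (("should", should), ("must", must),
                        ("must_not", must_not)):
        if words:
            bool_clause[name] = [{"match": {field: word}} for word in words]
    if not bool_clause:
        # No words at all: fall back to matching every record.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": bool_clause}}


body = build_search("title",
                    should=["nemo"],
                    must=["finding"],
                    must_not=["robots"])
print(body)
```

One thing to keep in mind with this design: when a must block is present, the should words no longer have to match and only influence the score. If at least one should word is required, adding minimum_should_match to the bool block is one way to enforce it.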