@spaghetticode
Last active October 18, 2018 08:39
Queries continued - fuzzy search

Fuzzy query

Basic fuzzy queries:

GET /docs/doc/_search
{
  "query": {
    "fuzzy": {
      "body": "robot"
    }
  }
}

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "robot",
        "fuzziness": "auto"
      }
    }
  }
}

You can use either the fuzzy query or a match query with an explicit fuzziness attribute. Be warned that the fuzzy query is not analyzed: the search term is compared to the indexed terms as is.

fuzziness is the maximum edit distance allowed between the word in the query and a matched word. Your best option is usually AUTO, which picks the value according to the word's length. Explicit values are 0, 1 or 2. You can't go higher than 2.

This query yields only the "Transformers" document in the results. If you set fuzziness to 2 it will yield "Robin Hood" as well: robot has an edit distance of 2 from robin.
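To make the numbers above concrete, here is a small Python sketch. It is not Elasticsearch code: edit_distance is a plain Levenshtein implementation (as far as I know Elasticsearch also counts a swap of two adjacent characters as a single edit by default, which this sketch does not model), and auto_fuzziness is a hypothetical helper that assumes the default AUTO thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two).

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed for a term, assuming the default AUTO thresholds."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


distance = edit_distance("robot", "robin")
allowed = auto_fuzziness("robot")
print(distance, allowed)  # 2 edits needed, AUTO allows only 1 for a 5-letter term
```

This is why "Robin Hood" only shows up once fuzziness is set to 2 explicitly.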

-fuzzy-match words(phrases) against any of title,body,keywords

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields":  [ "title", "body", "keywords" ],
      "query":     "leetle nemmo fisch",
      "fuzziness": "auto",
      "operator": "and"
    }
  }
}

"fuzziness": "auto" is a sensible default value. The default for operator is or. With "operator": "and" every word in the query has to match, so the results are closer to a phrase match, which is what you need. You cannot use match_phrase with fuzzy queries, so you have to fall back to this strategy.

You can use prefix_length to set how many initial characters must match exactly (they will not be fuzzied), which makes the query less expensive:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields":  [ "title*", "body", "keywords" ],
      "query":     "litle nemmo fisch",
      "fuzziness": "auto",
      "operator": "and",
      "prefix_length": 3
    }
  }
}

If you increase prefix_length to 4, the example query no longer matches anything (litl does not match litt, fisc does not match fish, and so on).
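The prefix rule can be checked by hand with a few lines of Python. This is only an illustration of the comparison described above, not how Elasticsearch implements it, and prefix_matches is a hypothetical helper name:

```python
def prefix_matches(query_term: str, indexed_term: str, prefix_length: int) -> bool:
    """True when the first prefix_length characters are identical."""
    return query_term[:prefix_length] == indexed_term[:prefix_length]


# Misspelled query words paired with the indexed words they should reach.
pairs = [("litle", "little"), ("nemmo", "nemo"), ("fisch", "fish")]

for prefix_length in (3, 4):
    results = [prefix_matches(q, t, prefix_length) for q, t in pairs]
    print(prefix_length, results)
```

With a prefix of 3 all three words pass the check. With 4 none of them does, and since the query uses "operator": "and", a single failing word is enough to drop the document.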

-for each matching NOT word:

-discard containing entry

Let's get all the documents whose title doesn't contain "nemo" and whose body doesn't contain "fish", using the misspelled forms "nemmo" and "fisch" in the query:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": { 
            "title": {
              "query": "nemmo",
              "fuzziness": "auto"
            }
          }
        },
        {
          "match": { 
            "body": {
              "query": "fisch",
              "fuzziness": "auto"
            }
          }
        }
      ]
    }
  }
}

This works as a filter: must_not clauses only exclude documents and do not contribute to the score. You can add as many must_not clauses as you want, as long as each one keeps the following format:

{
  "match": { 
    "title": {
      "query": "nemmo",
      "fuzziness": "auto"
    }
  }
}

It's the usual match query, but with an explicit fuzziness attribute to make it fuzzy.
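If the list of words to exclude comes from your application, the request body can be generated instead of written by hand. A minimal Python sketch, assuming the same field names as above (fuzzy_match and exclude_all are hypothetical helper names, not part of any Elasticsearch client):

```python
import json


def fuzzy_match(field: str, term: str, fuzziness: str = "auto") -> dict:
    """One match clause in the format shown above."""
    return {"match": {field: {"query": term, "fuzziness": fuzziness}}}


def exclude_all(terms_by_field: dict) -> dict:
    """Request body that discards documents matching any of the given terms."""
    clauses = [fuzzy_match(field, term) for field, term in terms_by_field.items()]
    return {"query": {"bool": {"must_not": clauses}}}


body = exclude_all({"title": "nemmo", "body": "fisch"})
print(json.dumps(body, indent=2))
```

The printed JSON is the same body as the must_not request above and can be sent to the _search endpoint with whatever HTTP client you use.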

-for each matching SHOULD word:

-depending on which field matched, use customizable weights

This query gives a higher boost to documents where the match is in the body field:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": { 
            "keywords": {
              "query": "action",
              "fuzziness": "auto"
            }
          }
        },
        {
          "match": { 
            "body": {
              "query": "litle",
              "fuzziness": "auto",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}
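To keep the weights customizable, the boost values can live in a separate mapping and be applied while the should clauses are built. This is a rough Python sketch under that assumption. weighted_should and the weights mapping are names made up for the example:

```python
def weighted_should(terms_by_field: dict, weights: dict) -> dict:
    """Build a bool/should body, adding a boost only to fields listed in weights."""
    clauses = []
    for field, term in terms_by_field.items():
        params = {"query": term, "fuzziness": "auto"}
        if field in weights:
            params["boost"] = weights[field]  # fields without a weight keep the default boost
        clauses.append({"match": {field: params}})
    return {"query": {"bool": {"should": clauses}}}


body = weighted_should(
    {"keywords": "action", "body": "litle"},
    weights={"body": 5},
)
```

Changing the ranking then means editing only the weights mapping, for example {"title": 10, "body": 2}, without touching the query structure.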

-for each matching MUST word:

-no-op

It's not clear to me what "no-op" means here. Could you please elaborate?

Anyway, the basic must query is this:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "fuzzy": { "title": "nemmo"} },
        { "fuzzy": { "body": "fisch"} }
      ]
    }
  }
}

Remember that fuzzy is not analyzed. It may work for you if you don't need analysis, but if you do, you should use match + fuzziness as in the examples above.

-only include entries that are within distance to specified location

Here I am enhancing the should fuzzy query with a geo_distance filter:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "match": { 
                "keywords": {
                  "query": "action",
                  "fuzziness": "auto"
                }
              }
            },
            {
              "match": { 
                "body": {
                  "query": "litle",
                  "fuzziness": "auto"
                }
              }
            }
          ]
        }        
      },
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": {
            "lat": 45,
            "lon": 10
          }
        }        
      }
    }
  }
}

Without the geo_distance filter the query returned 2 results. Now it picks only the one within the expected distance.
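One caveat: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On those versions the same request can be expressed with a bool query and its filter clause. Here is a sketch of the equivalent body built in Python, assuming the same location field and coordinates as above. The original should query is wrapped in must so that at least one should clause still has to match, as it did inside filtered:

```python
import json

# The same fuzzy should clauses used in the filtered query above.
should_clauses = [
    {"match": {"keywords": {"query": "action", "fuzziness": "auto"}}},
    {"match": {"body": {"query": "litle", "fuzziness": "auto"}}},
]

# The same geo_distance filter, unchanged.
geo_filter = {
    "geo_distance": {
        "distance": "100km",
        "location": {"lat": 45, "lon": 10},
    }
}

body = {
    "query": {
        "bool": {
            "must": {"bool": {"should": should_clauses}},  # scored part
            "filter": geo_filter,                          # not scored, only restricts results
        }
    }
}

print(json.dumps(body, indent=2))
```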
