
@spaghetticode
Last active October 18, 2018 08:39
Outputs

outputs (results+metadata):

-weight-sorted list of matches:

{(ID, weight/relevance, distance)}
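On the client side, that output can be held as a list of (ID, score, distance) tuples built from the response hits. The following is a minimal sketch. It assumes the standard response layout: `hits.hits`, each hit with `_id` and `_score`, and, when the query sorts on `_geo_distance`, the distance stored as the first value of the hit's `sort` array. The function name `to_matches` is hypothetical. Also note that when you sort on a field, `_score` is null unless `track_scores` is enabled.

```python
from typing import Any, Optional

Match = tuple[str, Optional[float], Optional[float]]  # (ID, score, distance)


def to_matches(response: dict[str, Any]) -> list[Match]:
    """Convert a parsed search response into (ID, score, distance) tuples.

    Assumes a _geo_distance sort, whose value ends up in hit["sort"][0];
    hits without a sort array get distance None.
    """
    matches: list[Match] = []
    for hit in response["hits"]["hits"]:
        sort_values = hit.get("sort") or [None]
        matches.append((hit["_id"], hit.get("_score"), sort_values[0]))
    # Weight-sorted: highest score first; hits without a score keep their
    # original (e.g. distance) order at the end, since sort() is stable.
    matches.sort(key=lambda m: (m[1] is None, -(m[1] or 0.0)))
    return matches


# Hypothetical response fragment, for illustration only.
sample = {"hits": {"hits": [
    {"_id": "2", "_score": 0.5, "sort": [12.3]},
    {"_id": "1", "_score": 1.2, "sort": [40.0]},
    {"_id": "3", "_score": None},
]}}
print(to_matches(sample))
```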

Sorted by ID:

GET /docs/doc/_search 
{
  "query": {
    "match_all" : {}
  },
  "sort": [
    {"_uid": {"order": "asc" }}
  ]
}

We sort on _uid because, for sorting purposes, the default _id field is available through the _uid field (which stores type#id).
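Because _uid values are strings of the form type#id in the Elasticsearch versions this gist targets, sorting on _uid is lexicographic rather than numeric. With numeric IDs, "10" therefore comes before "2". Here is a small Python illustration with hypothetical IDs:

```python
# Hypothetical _uid values for documents of type "doc" with IDs 2, 10 and 1.
uids = ["doc#2", "doc#10", "doc#1"]

# String (lexicographic) order, which is what a sort on _uid produces.
lexicographic = sorted(uids)
print(lexicographic)  # ['doc#1', 'doc#10', 'doc#2']

# Numeric order of the ID part, if that is what you actually want.
numeric = sorted(uids, key=lambda uid: int(uid.split("#", 1)[1]))
print(numeric)  # ['doc#1', 'doc#2', 'doc#10']
```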

Regarding distance, there was already an example in the previous gist:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "120km",
          "location": {
            "lat": 46,
            "lon": 10
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat":  46,
          "lon": 10
        },
        "order":         "asc",
        "unit":          "km",
        "distance_type": "plane"
      }
    }
  ]
}
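To see what the filter and the sort compute together, here is a client-side approximation. It keeps the documents within 120 km of (46, 10) and orders them by distance, using the haversine great-circle formula with a mean Earth radius of 6371 km. This is an illustration, not Elasticsearch's implementation. "distance_type": "plane" uses a faster approximation that is less accurate over long distances and near the poles, so values can differ slightly. The document IDs and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_and_sorted(docs, origin, max_km):
    """Keep docs within max_km of origin, nearest first (like filter + asc sort)."""
    lat0, lon0 = origin
    hits = []
    for doc_id, (lat, lon) in docs.items():
        d = haversine_km(lat0, lon0, lat, lon)
        if d <= max_km:
            hits.append((doc_id, d))
    hits.sort(key=lambda h: h[1])
    return hits


# Hypothetical documents: "c" is about 222 km away and is filtered out.
docs = {"c": (48.0, 10.0), "b": (46.5, 10.0), "a": (46.0, 10.0)}
hits = within_and_sorted(docs, origin=(46.0, 10.0), max_km=120)
print(hits)
```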

The default sort order is by relevance (the "_score" field, when present). We have already seen how to increase the weight of a field for scoring purposes: use the "boost" attribute. The default boost is 1 (no boosting); values > 1 increase the relevance of the field, while values < 1 decrease it.

This query with multiple match clauses returns all three documents, but the low boost (0.1, i.e. less than 1) on "transformers" pushes that document to the last position:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "robin"}},
        { "match": { "title": "finding"}},
        {
          "match": {
            "title": {
              "query": "transformers",
              "boost": 0.1
            }
          }
        }
      ]
    }
  }
}

On the other hand, a boost greater than 1 (here 5) makes "transformers" the first result:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "robin"}},
        { "match": { "title": "finding"}},
        {
          "match": {
            "title": {
              "query": "transformers",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}
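The following simplified scoring model shows why the boost reorders the results. A bool query adds up the scores of the should clauses a document matches, and a clause's boost multiplies its contribution. The sketch assumes a base score of 1.0 for every matching clause and ignores the real similarity function and the normalization/coordination factors. The document IDs are hypothetical.

```python
# Hypothetical: each document matches exactly one should clause (its title word),
# and every clause contributes the same base score before boosting.
BASE_SCORE = 1.0
DOC_TERMS = {"robin-doc": "robin", "finding-doc": "finding", "transformers-doc": "transformers"}


def rank(boosts: dict[str, float]) -> list[str]:
    """Return document IDs ordered by boosted score, highest first."""
    scores = {
        doc_id: BASE_SCORE * boosts.get(term, 1.0)  # default boost is 1
        for doc_id, term in DOC_TERMS.items()
    }
    # sorted() is stable, so equally scored documents keep their insertion order.
    return sorted(scores, key=scores.get, reverse=True)


low = rank({"transformers": 0.1})
high = rank({"transformers": 5})
print(low)   # transformers-doc last
print(high)  # transformers-doc first
```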

-word proposals

-correction: spelling correction based on data set (no dictionary)

Use a term suggester. The URL is the same as for a regular search, but we add the search_type=count parameter: we don't care about the search hits, and leaving them out makes the output cleaner.

GET /docs/doc/_search?search_type=count
{
  "suggest" : {
    "maybe-you-mean-from-title" : {
      "text" : "fund",
      "term" : {
        "field" : "title.en"
      }
    },
    "maybe-you-mean-from-body" : {
      "text" : "liitle",
      "term" : {
        "field" : "body"
      }
    }  
  }
}

The key "maybe-you-mean-from-title" is an arbitrary name for that suggester. As in the example, you can define several suggesters in one request. If the text is the same for all the suggesters, you can specify it once at the top level of "suggest":

GET /docs/doc/_search?search_type=count
{
  "suggest" : {
    "text" : "fund",
    "maybe-you-mean-from-title" : {
      "term" : {
        "field" : "title.en"
      }
    },
    "maybe-you-mean-from-body" : {
      "term" : {
        "field" : "body"
      }
    }  
  }
}
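The top-level "text" saves you from repeating the same text in each suggester. The following hypothetical Python helper expands the short form back into the long one and makes the equivalence explicit. It assumes that none of the suggesters sets its own "text".

```python
import copy
from typing import Any


def expand_global_text(suggest: dict[str, Any]) -> dict[str, Any]:
    """Copy a top-level "text" into every suggester (hypothetical helper).

    Assumes no suggester defines its own "text"; the input is not modified.
    """
    expanded = copy.deepcopy(suggest)
    text = expanded.pop("text", None)
    if text is not None:
        for spec in expanded.values():
            spec["text"] = text
    return expanded


short_form = {
    "text": "fund",
    "maybe-you-mean-from-title": {"term": {"field": "title.en"}},
    "maybe-you-mean-from-body": {"term": {"field": "body"}},
}
long_form = expand_global_text(short_form)
print(long_form)
```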

The relevant part of the results for the first request (the one with a separate text per suggester) is:

  "suggest": {
      "maybe-you-mean-from-title": [
         {
            "text": "fund",
            "offset": 0,
            "length": 4,
            "options": [
               {
                  "text": "find",
                  "score": 0.75,
                  "freq": 1
               }
            ]
         }
      ],
      "maybe-you-mean-from-body": [
         {
            "text": "liitle",
            "offset": 0,
            "length": 6,
            "options": [
               {
                  "text": "little",
                  "score": 0.8333333,
                  "freq": 1
               }
            ]
         }
      ]
   }

As you can see, "fund" returned "find" as a suggestion, while "liitle" returned "little".
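The scores above are consistent with 1 - edit_distance / length of the input. "fund" becomes "find" with one substitution out of four letters, giving 0.75. "liitle" becomes "little" with one substitution out of six, giving about 0.833. The following minimal sketch of a data-set-based corrector works along those lines. It assumes plain Levenshtein distance, a hypothetical vocabulary of indexed terms with their frequencies, and at most 2 edits (the term suggester's default max_edits). It is not the suggester's actual implementation, and it ignores options such as prefix_length.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(text: str, vocabulary: dict[str, int], max_edits: int = 2) -> list[dict]:
    """Suggest indexed terms close to `text`, best score first, then by frequency."""
    options = []
    for term, freq in vocabulary.items():
        d = levenshtein(text, term)
        if 0 < d <= max_edits:  # skip the input itself and distant terms
            options.append({"text": term, "score": 1 - d / len(text), "freq": freq})
    options.sort(key=lambda o: (-o["score"], -o["freq"]))
    return options


# Hypothetical terms taken from the indexed documents (no external dictionary).
vocabulary = {"find": 1, "little": 1, "fish": 1, "robin": 1}
print(suggest("fund", vocabulary))
print(suggest("liitle", vocabulary))
```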

-aggregation: find words which match e.g. 33-67% of result set

Use minimum_should_match, which sets how many of the query words a document must contain in order to match:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "fish facebook twitter google",
        "minimum_should_match": "32%"
      }
    }
  }
}

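The query has four words, and Elasticsearch rounds a percentage down to a whole number of words: 32% of 4 is 1.28, so just 1 word is required, and the documents containing fish match. Any value below 50% still requires only 1 word. From 50% upwards, at least 2 words are required, and you get no match because only fish matches.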
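Here is a small Python sketch of that rounding rule for positive percentages: the required count is the percentage of the query terms, rounded down. It is checked against a hypothetical document that contains "fish" and none of the other query words. The function names are illustrative, and the sketch assumes the standard analyzer splits the query into four terms.

```python
def required_terms(num_terms: int, percent: int) -> int:
    """Terms required by a positive percentage minimum_should_match (rounded down)."""
    return num_terms * percent // 100  # integer arithmetic avoids float rounding


def matches(doc_words: set[str], query_words: list[str], percent: int) -> bool:
    """Does a document with these words satisfy the match query?"""
    found = sum(1 for w in query_words if w in doc_words)
    # A query made only of optional terms still needs at least one of them.
    return found >= max(1, required_terms(len(query_words), percent))


query_words = "fish facebook twitter google".split()
doc_words = {"fish"}  # hypothetical document body terms
for pct in (32, 33, 49, 50, 67):
    print(pct, required_terms(len(query_words), pct), matches(doc_words, query_words, pct))
```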
