spaghetticode/05.outputs.md

## 05.outputs.md

      
    outputs (results+metadata):

-weight-sorted list of matches:

{(ID, weight/relevance, distance)}

By id:
GET /docs/doc/_search 
{
  "query": {
    "match_all" : {}
  },
  "sort": [
    {"_uid": {"order": "asc" }}
  ]
}
The sort is declared on _uid because, for sorting purposes, the default _id field is mapped to _uid.
Regarding distance, there was already an example in the previous gist:
GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "120km",
          "location": {
            "lat": 46,
            "lon": 10
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat":  46,
          "lon": 10
        },
        "order":         "asc",
        "unit":          "km",
        "distance_type": "plane"
      }
    }
  ]
}
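To get from a response like this to the {(ID, weight/relevance, distance)} list asked for above, the hits can be flattened on the client side. The following is a minimal sketch, not part of the original gist: `to_match_tuples` is a hypothetical helper and `sample_response` is hand-written with made-up values. It assumes the usual response shape, where each hit carries `_id`, `_score` (null when sorting on something other than the score) and a `sort` array holding the computed distance.

```python
from typing import Any, Dict, List, Optional, Tuple

MatchTuple = Tuple[str, Optional[float], Optional[float]]


def to_match_tuples(response: Dict[str, Any]) -> List[MatchTuple]:
    """Flatten a search response into (id, score, distance) tuples.

    The distance is read from the first sort value, which is where a
    _geo_distance sort places it; it is None if the hit has no sort values.
    """
    tuples = []
    for hit in response["hits"]["hits"]:
        sort_values = hit.get("sort") or []
        distance = sort_values[0] if sort_values else None
        tuples.append((hit["_id"], hit.get("_score"), distance))
    return tuples


# Hand-written sample (hypothetical values, not real Elasticsearch output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "2", "_score": None, "sort": [35.2]},
            {"_id": "1", "_score": None, "sort": [98.7]},
        ]
    }
}

matches = to_match_tuples(sample_response)
print(matches)
```

The order of the hits is preserved, so the tuples stay sorted by distance.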
The default sorting is by relevance (the "_score" field, when present). We have already seen how to change the weight of a field for scoring purposes: use the "boost" attribute. The default value for boost is 1 (no boosting); values greater than 1 increase the field relevance, while values between 0 and 1 decrease it.
This query with multiple match clauses returns all 3 documents, but the boost below 1 (0.1) on "transformers" ranks that document last:
GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "robin"}},
        { "match": { "title": "finding"}},
        { "match": {
          "title": {
            "query": "transformers", 
            "boost": 0.1
          }
        }
        }
      ]
    }
  }
}
On the other hand, a boost greater than 1 (here 5) makes "transformers" the first result:
GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "robin"}},
        { "match": { "title": "finding"}},
        { "match": {
          "title": {
            "query": "transformers", 
            "boost": 5
          }
        }
        }
      ]
    }
  }
}
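The two queries above differ only in the boost value, so the request body can be generated instead of written by hand. This is a sketch under assumptions: `boosted_should_query` is a hypothetical helper, not an Elasticsearch client function, and it only builds the JSON body; sending it is left out.

```python
import json
from typing import Any, Dict


def boosted_should_query(field: str, term_boosts: Dict[str, float]) -> Dict[str, Any]:
    """Build a bool/should query with one match clause per term.

    Terms with boost 1 use the short match form; any other boost uses
    the expanded form with "query" and "boost" keys.
    """
    clauses = []
    for term, boost in term_boosts.items():
        if boost == 1:
            clauses.append({"match": {field: term}})
        else:
            clauses.append({"match": {field: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"should": clauses}}}


# Same terms as in the examples above; change 5 to 0.1 to demote instead.
body = boosted_should_query("title", {"robin": 1, "finding": 1, "transformers": 5})
print(json.dumps(body, indent=2))
```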
-word proposals

-correction: spelling correction based on data set (no dictionary)

You should use a term suggester. The URL is the same as the search URL, with the search_type=count parameter added: we don't need the search hits here, so the output is cleaner.
GET /docs/doc/_search?search_type=count
{
  "suggest" : {
    "maybe-you-mean-from-title" : {
      "text" : "fund",
      "term" : {
        "field" : "title.en"
      }
    },
    "maybe-you-mean-from-body" : {
      "text" : "liitle",
      "term" : {
        "field" : "body"
      }
    }  
  }
}
The key "maybe-you-mean-from-title" is an arbitrary name for that suggester. You can define multiple suggesters, as in the example. If the text is the same for all the suggesters, you can move it to the top level of "suggest":
GET /docs/doc/_search?search_type=count
{
  "suggest" : {
    "text" : "fund",
    "maybe-you-mean-from-title" : {
      "term" : {
        "field" : "title.en"
      }
    },
    "maybe-you-mean-from-body" : {
      "term" : {
        "field" : "body"
      }
    }  
  }
}
The relevant part of the results is:
  "suggest": {
      "maybe-you-mean-from-title": [
         {
            "text": "fund",
            "offset": 0,
            "length": 4,
            "options": [
               {
                  "text": "find",
                  "score": 0.75,
                  "freq": 1
               }
            ]
         }
      ],
      "maybe-you-mean-from-body": [
         {
            "text": "liitle",
            "offset": 0,
            "length": 6,
            "options": [
               {
                  "text": "little",
                  "score": 0.8333333,
                  "freq": 1
               }
            ]
         }
      ]
   }
As you can see, fund returned find as a suggestion, while liitle returned little.
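On the client side, the "suggest" section can be reduced to one correction per input text. A minimal sketch, assuming the response shape shown above: `best_suggestions` is a hypothetical helper, and the `suggest` dict is copied from the result in this gist.

```python
from typing import Any, Dict, List, Tuple


def best_suggestions(suggest: Dict[str, List[Dict[str, Any]]]) -> Dict[Tuple[str, str], str]:
    """Map (suggester name, original text) to the highest-scoring option.

    Entries with an empty "options" list are skipped, since the suggester
    had nothing to propose for them.
    """
    best = {}
    for name, entries in suggest.items():
        for entry in entries:
            options = entry.get("options", [])
            if options:
                top = max(options, key=lambda option: option["score"])
                best[(name, entry["text"])] = top["text"]
    return best


# The "suggest" section from the response above.
suggest = {
    "maybe-you-mean-from-title": [
        {"text": "fund", "offset": 0, "length": 4,
         "options": [{"text": "find", "score": 0.75, "freq": 1}]}
    ],
    "maybe-you-mean-from-body": [
        {"text": "liitle", "offset": 0, "length": 6,
         "options": [{"text": "little", "score": 0.8333333, "freq": 1}]}
    ],
}

corrections = best_suggestions(suggest)
print(corrections)
```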
-aggregation: find words which match e.g. 33-67% of result set

You should use minimum_should_match.
GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "fish facebook twitter google",
        "minimum_should_match": "32%"
      }
    }
  }
}
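The percentage is converted to a number of terms and rounded down: 32% of 4 terms is 1.28, so 1 matching term is enough. If you increase the minimum_should_match value to 50% or more you will get no match, because more than 1 word would be required to match, but only fish does.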
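The threshold arithmetic can be checked by hand. This sketch assumes the rule described in the Elasticsearch documentation for minimum_should_match, namely that a positive percentage is multiplied by the number of optional clauses and rounded down. `required_matches` is a hypothetical helper for illustration; it does not cover negative percentages or the combined forms.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Number of terms that must match for a positive percentage value.

    Assumes the computed value is rounded down, as documented for
    minimum_should_match.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (num_terms * percent) // 100


terms = "fish facebook twitter google".split()
thresholds = {p: required_matches(len(terms), p) for p in (32, 49, 50, 75)}
print(thresholds)  # {32: 1, 49: 1, 50: 2, 75: 3}
```

With 4 query terms, any value from 25% to 49% requires one matching term, and 50% is the first value that requires two.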