@spaghetticode
Last active October 18, 2018 08:40
More queries

-OR (better), if feasible: whole phrases instead of words, for context awareness

Let's see some possible solutions. This one looks for the three words in the body, but requires that at least 2 of them are present:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "small fish cartoon",
        "minimum_should_match": 2
      }
    }
  }
}
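
The same constraint can be written more explicitly as a bool query with should clauses; this sketch is equivalent in spirit to the match query above:

GET /docs/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "body": "small" }},
        { "match": { "body": "fish" }},
        { "match": { "body": "cartoon" }}
      ],
      "minimum_should_match": 2
    }
  }
}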

You can also use percentages with minimum_should_match. Elasticsearch converts the percentage into a number of terms (here 66% of the 3 query terms) and rounds the result down:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "small fish facebook",
        "minimum_should_match": "66%"
      }
    }
  }
}

If you increase the percentage to 67%, the required term count goes up (0.67 × 3 = 2.01, rounded down to 2) and no document will match.
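
You can verify this yourself: this variant of the previous query should return no hits on the sample data:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "small fish facebook",
        "minimum_should_match": "67%"
      }
    }
  }
}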

match_phrase

When you want to find words in an exact order you can use phrase matching. Vanilla phrase matching will only find documents with the exact word order. This example yields no results, as our "nemo" document contains "little fish named Nemo":

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "little fish nemo",
        "type":  "phrase"
      }
    }
  }
}
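
To see why, you can inspect the token positions with the _analyze API (the response lists each token together with its position; the exact format varies by version). "named" sits between "fish" and "nemo", so the exact phrase "little fish nemo" cannot line up:

GET /_analyze?analyzer=standard&text=little fish named nemo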

You may not always want 100% identical wording. Adding the slop attribute allows results like "fish little nemo", "little nemo fish" and "little fish xxx nemo" to be matched as well. The slop value represents how far apart terms are allowed to be while still considering the document a match: higher values are more tolerant. Now we get a result:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "little fish nemo",
        "type":  "phrase",
        "slop": 1
      }
    }
  }
}

This query can also be written this way:

GET /docs/doc/_search
{
  "query": {
    "match_phrase": {
      "body": {
        "query": "little fish nemo",
        "slop": 1
      }
    }
  }
}

Multiple fields queries

Use the multi_match query. This is a simple example:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "body", "descriptions"],
      "query": "nemo facebook twitter"
    }
  }
}
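
multi_match also accepts per-field boosts with the caret syntax; in this sketch, matches on title count twice as much as the other fields when scoring:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title^2", "body", "descriptions"],
      "query": "nemo facebook twitter"
    }
  }
}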

Query words are ORed by default; if you want to AND them, add the operator attribute. Since the query words are now ANDed, no result will be returned:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "body", "descriptions"],
      "query": "nemo facebook twitter",
      "operator": "and"
    }
  }
}
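
If and is too strict, minimum_should_match works here too; this sketch requires any 2 of the 3 words instead of all of them:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "body", "descriptions"],
      "query": "nemo facebook twitter",
      "minimum_should_match": 2
    }
  }
}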

You can use wildcards to pick fields. Let's consider the following example, which doesn't use wildcards and yields no result, as the title field is not analyzed with the english analyzer:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "body"],
      "query": "find"
    }
  }
}

This query, on the other hand, uses a wildcard to include the title.en field as well, so the query yields the usual "Finding Nemo" result:

GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "fields": ["title*", "body"],
      "query": "find"
    }
  }
}

Nice to know: the "_all" field

_all is a special field that gets populated at index time for each inserted record. It concatenates all the data contained in the other fields into the single "_all" attribute. By default its text is analyzed with the standard analyzer. It can be used in queries as well, as a quick & dirty substitute for multi-field queries:

GET /docs/doc/_search
{
  "query": {
    "match": {
      "_all": "finding"
    }
  }
}

This field can be disabled to save disk space/RAM, or customized for special needs.
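
Disabling it happens at mapping creation time; a sketch with the 1.x mapping syntax, assuming you are creating the index from scratch:

PUT /docs
{
  "mappings": {
    "doc": {
      "_all": { "enabled": false }
    }
  }
}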

-result limit (start,count), if applicable

GET docs/doc/_search
{
  "from": 1,
  "size": 2
}

size is the count, from is the start offset (zero-based, so from: 1 skips the first result)

Here's a more descriptive query. The must_not filter matches all the currently indexed documents; we then pick 2 of them, leaving out the first:

GET /docs/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {"term" : {"title": "facebook"}}
        }
      }
    }
  },
  "size": 2,
  "from": 1
}

-location + max. distance

First, let's add some geo data to the existing records. We're going to use partial updates to avoid retyping all documents:

POST /docs/doc/1/_update
{
  "doc": {
    "location": {
      "lat": 45.490946,
      "lon": 9.206543
    }
  }
}

POST /docs/doc/2/_update
{
  "doc": {
    "location": {
      "lat": 46.490946,
      "lon": 10.206543
    }
  }
}

POST /docs/doc/3/_update
{
  "doc": {
    "location": {
      "lat": 76.490946,
      "lon": 30.206543
    }
  }
}
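
Note: for the geo queries below to work, the location field must be mapped as a geo_point before the coordinates are indexed; a sketch with the 1.x mapping API:

PUT /docs/_mapping/doc
{
  "properties": {
    "location": { "type": "geo_point" }
  }
}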

Let's look for things within 100km distance:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": {
            "lat": 45,
            "lon": 10
          }
        }
      }
    }
  }
}

This is a filter, but unlike most other filters it is not cached by default. Why? Because "location" is very likely to change with each request, making caching worthless. You can still enable this kind of caching if the "location" coordinates are consistent across your queries.
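
In 1.x-era Elasticsearch that opt-in is the _cache flag on the filter; a sketch, only worthwhile if the coordinates really do repeat across requests:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": {
            "lat": 45,
            "lon": 10
          },
          "_cache": true
        }
      }
    }
  }
}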

You can see result distances from given coordinates using sort:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "120km",
          "location": {
            "lat": 46,
            "lon": 10
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat":  46,
          "lon": 10
        },
        "order":         "asc",
        "unit":          "km",
        "distance_type": "plane"
      }
    }
  ]
}

You can use one geo point to select results and another one for sorting purposes. order and unit should be self-explanatory. distance_type is the algorithm used for the calculations: plane is the fastest but quite inaccurate (though acceptable for short distances), sloppy_arc is the default, and arc is the slowest but most accurate.

-Can 'slop >= 1' be used without impact on matching speed?

Of course slop (a proximity query) will impact speed.
Some benchmarks show that slop roughly doubles the time of a match_phrase query… but it may still be fast enough for you (doubling 1 ms is no problem; doubling 1 second is likely an unacceptable delay).

Look at this example:

GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "body": "little"}},
            { "term": { "body": "fish"}},
            { "term": { "body": "nemo"}}
          ]
        }
      }
    }
  },
  "rescore": {
    "window_size": 10, 
    "query": {         
      "rescore_query": {
        "match_phrase": {
"body": {
            "query": "little nemo fish",
            "slop": 10
          }
        }
      }
    }
  }  
}

The first part of the query filters results that include all the words "nemo", "fish", "little" in the body, without considering their position. term is about 10 times faster than a match_phrase query, so it's very convenient for filtering out results.

The second part of the query recalculates the score (and hence the ordering) of the results, picking only the first 10. They are scored according to how well they match the phrase "little nemo fish", while allowing many similar phrases to stay in, given the high slop value.
Whenever speed/resources are a constraint, try to use filters to restrict the resultset first, and then apply the expensive stuff.


Is it reasonable to filter locations in default mode, but sort in fast 'plane' mode, as shown in the example?

In the example I wanted to show how to change the precision level; it was not meant as an optimization. But it could be one. In my opinion the speed improvement would be unnoticeable when a faster mode is used at sort time, which operates on an already-selected resultset, while it could be more useful at query time, where you still have to consider all the records.
I'd suggest going with the default/higher precision everywhere and then, if queries get slow or you want to improve performance, switching to a lower precision mode.
