Let's see some possible solutions. This one will look for the three words in the body, but requires that at least 2 of them be present:
GET /docs/doc/_search
{
"query": {
"match": {
"body": {
"query": "small fish cartoon",
"minimum_should_match": 2
}
}
}
}
You can also use percentages with minimum_should_match. The following example will require 2 words out of 3 to match:
GET /docs/doc/_search
{
"query": {
"match": {
"body": {
"query": "small fish facebook",
"minimum_should_match": "66%"
}
}
}
}
If you increase the percentage to 67%, no document will match.
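For reference, the non-matching query would look like this (identical to the previous one, with only the threshold raised):

```json
GET /docs/doc/_search
{
  "query": {
    "match": {
      "body": {
        "query": "small fish facebook",
        "minimum_should_match": "67%"
      }
    }
  }
}
```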
When you want to find words in an exact order you can use phrase matching. Vanilla phrase matching will only find records with the exact word order. This example yields no results, as our "nemo" document contains "little fish named Nemo":
GET /docs/doc/_search
{
"query": {
"match": {
"body": {
"query": "little fish nemo",
"type": "phrase"
}
}
}
}
You may not want 100% identical wording. Adding the slop attribute will allow "fish little nemo", "little nemo fish" and "little fish xxx nemo" to be matched as well. The slop value represents how far apart terms are allowed to be while still considering the document a match. Higher values are more tolerant.
Now we get a result:
GET /docs/doc/_search
{
"query": {
"match": {
"body": {
"query": "little fish nemo",
"type": "phrase",
"slop": 1
}
}
}
}
This query can also be written this way:
GET /docs/doc/_search
{
"query": {
"match_phrase": {
"body": {
"query": "little fish nemo",
"slop": 1
}
}
}
}
To search in more than one field at once, use the multi_match query. This is a simple example:
GET /docs/doc/_search
{
"query": {
"multi_match": {
"fields": ["title", "body", "descriptions"],
"query": "nemo facebook twitter"
}
}
}
Query words are ORed by default; if you want to AND them you can add the operator attribute. Since the query words are now ANDed, no result will be returned:
GET /docs/doc/_search
{
"query": {
"multi_match": {
"fields": ["title", "body", "descriptions"],
"query": "nemo facebook twitter",
"operator": "and"
}
}
}
You can use wildcards to pick fields. Let's consider the following example, which doesn't use wildcards and yields no result, as the title field is not analyzed with the english analyzer:
GET /docs/doc/_search
{
"query": {
"multi_match": {
"fields": ["title", "body"],
"query": "find"
}
}
}
This query, on the other hand, uses a wildcard to include the title.en field as well, so the query yields the usual "Finding Nemo" result:
GET /docs/doc/_search
{
"query": {
"multi_match": {
"fields": ["title*", "body"],
"query": "find"
}
}
}
_all is a special field that gets populated at index time for each inserted record. It concatenates all the data contained in the other fields into the single "_all" attribute. By default the text is analyzed with the standard analyzer. It can be used in queries as well, as a quick & dirty substitute for multi-field queries:
GET /docs/doc/_search
{
"query": {
"match": {
"_all": "finding"
}
}
}
This field can be disabled to save disk space/RAM, or customized for special needs.
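As a sketch (reusing this article's docs index and doc type, with the mapping syntax of the Elasticsearch 1.x line the examples are based on), disabling _all at index-creation time would look like this:

```json
PUT /docs
{
  "mappings": {
    "doc": {
      "_all": { "enabled": false }
    }
  }
}
```

Note that _all can only be disabled when the mapping is created, not on an existing index.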
GET docs/doc/_search
{
"from": 1,
"size": 2
}
size is the number of results to return, from is the starting offset.
Here's a more descriptive query. The must_not filter will match all the currently indexed documents; we then pick 2 of them, leaving out the first:
GET /docs/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must_not": {"term" : {"title": "facebook"}}
}
}
}
},
"size": 2,
"from": 1
}
First, let's add some geo data to the existing records. We're going to use partial updates to avoid retyping all documents:
POST /docs/doc/1/_update
{
"doc": {
"location": {
"lat": 45.490946,
"lon": 9.206543
}
}
}
POST /docs/doc/2/_update
{
"doc": {
"location": {
"lat": 46.490946,
"lon": 10.206543
}
}
}
POST /docs/doc/3/_update
{
"doc": {
"location": {
"lat": 76.490946,
"lon": 30.206543
}
}
}
Let's look for things within 100km distance:
GET /docs/doc/_search
{
"query": {
"filtered": {
"filter": {
"geo_distance": {
"distance": "100km",
"location": {
"lat": 45,
"lon": 10
}
}
}
}
}
}
This is a filter, but unlike other filters it is not cached. Why? Because "location" is very likely to change with each request, making caching worthless. You can still enable this kind of caching if the "location" coordinates are consistent across your queries.
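As a sketch of the latter case (the _cache flag on filters belongs to the 1.x-era API used throughout this article), opting in would look like this:

```json
GET /docs/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": {
            "lat": 45,
            "lon": 10
          },
          "_cache": true
        }
      }
    }
  }
}
```

This only pays off when the same coordinates are reused across many requests; otherwise the cache just fills up with single-use entries.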
You can see the result distances from the given coordinates by using sort:
GET /docs/doc/_search
{
"query": {
"filtered": {
"filter": {
"geo_distance": {
"distance": "120km",
"location": {
"lat": 46,
"lon": 10
}
}
}
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 46,
"lon": 10
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}
]
}
You can use one geopoint to select results and another one for sorting purposes. order and unit should be self-explanatory. distance_type is the algorithm used for the calculations: plane is the fastest but quite inaccurate (though it's fine for short distances), sloppy_arc is the default, and arc is the slowest but most accurate.
Is it reasonable to filter locations in default mode, but sort in fast 'plane' mode, as shown in the example?
In the example I wanted to show how to change the precision level; it was not meant as some kind of optimization. But it could be one. In my opinion the speed improvement from using a faster mode at sort time would be unnoticeable, since sorting operates on an already-selected result set; it could be more useful at query time, where all the records still have to be examined.
I'd suggest going with the default/higher precision everywhere and then, if queries get slow or you want to improve performance, switching to a lower-precision mode.