martinapugliese/ref_es_queries.md

## ref_es_queries.md

      
Collection of sample Elasticsearch queries

Use the Python client elasticsearch.
Connect to cluster (the client)

from elasticsearch import Elasticsearch

es_client = Elasticsearch()               # local cluster
es_client = Elasticsearch([cluster_url])  # remote; cluster_url is a string holding the cluster address

Prototype query

Build a body dictionary for the query.
body = {
    "from": 10,             # offset: skip the first 10 docs
    "size": 100,            # return 100 docs (default = 10)
    "fields": ["f_name1"],  # return only the wanted fields
    "query": {              # the query goes here
    },
    "sort": {               # sort order
        "time_field": {
            "order": "desc"
        }
    }
}

NOTE: To return only some fields, use fields for fields that are explicitly marked as stored in the mapping, and _source otherwise.
How to query

A prototype search on a type in an index is run as:
r = es_client.search(index='my_index',
                     doc_type='my_type',
                     body=body)

The result r is again a dictionary, whose keys depend on the type of query run.
Hits are sorted by relevance by default. In a terms aggregation, buckets are sorted by number of documents.

The number of matching documents is in r['hits']['total'].
The actual documents are in r['hits']['hits'].
If fields is used, a field value is in r['hits']['hits'][0]['fields']['f_name1'][0].
For an aggregation, the buckets are in r['aggregations']['agg_name']['buckets'].
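
The access paths above can be wrapped in a small helper. This is only a sketch: summarize_response and the sample dictionary are hypothetical, and the sample is hand-written in the response shape described in this document (where r['hits']['total'] is a plain number), so no cluster is needed to run it.

```python
def summarize_response(r, agg_name=None):
    """Pull the commonly used parts out of a search response dictionary."""
    summary = {
        "total": r["hits"]["total"],
        # _source is absent when only `fields` was requested, hence .get()
        "sources": [hit.get("_source", {}) for hit in r["hits"]["hits"]],
    }
    if agg_name is not None:
        buckets = r["aggregations"][agg_name]["buckets"]
        summary["counts"] = {b["key"]: b["doc_count"] for b in buckets}
    return summary


# Hand-written response (hypothetical data), shaped like a real one.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_source": {"f_name1": "a"}},
            {"_source": {"f_name1": "b"}},
        ],
    },
    "aggregations": {
        "agg_name": {
            "buckets": [
                {"key": "a", "doc_count": 1},
                {"key": "b", "doc_count": 1},
            ]
        }
    },
}

summary = summarize_response(sample, agg_name="agg_name")
```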

Query body samples

term query


body = {
    "query": {
        "term": {
            "my_field_name": "chosen_field_value"
        }
    }
}

If the field 'my_field_name' is itself a dictionary, one of its subfields can be queried as 'my_field_name.subfield'.
range query


body = {
    "query": {
        "range": {
            "my_time_field": {
                "gte": start_date,
                "lt": final_date
            }
        }
    }
}

start_date and final_date are datetime/date objects.
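
A minimal sketch of building the same range body from date objects. The helper name range_body is hypothetical, and converting the dates to ISO 8601 strings explicitly is an assumption made here so that the body is plain JSON whatever serializer is in use; it presumes the field is mapped as a date that accepts ISO 8601 strings.

```python
from datetime import date, datetime


def range_body(field, start, end):
    """Build a range query body selecting start <= field < end."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.isoformat(),  # inclusive lower bound
                    "lt": end.isoformat(),     # exclusive upper bound
                }
            }
        }
    }


# Works with both date and datetime objects.
body = range_body("my_time_field", date(2015, 1, 1), date(2015, 2, 1))
body_dt = range_body("my_time_field",
                     datetime(2015, 1, 1, 12, 30),
                     datetime(2015, 1, 2))
```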
bool query for an AND


a AND b
body = {
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "field1": "value1"
                    }
                },
                {
                    "term": {
                        "field2": "value2"
                    }
                }
            ]
        }
    }
}

(NOT a) AND b
body = {
    "query": {
        "bool": {
            "must_not": [
                # clauses for condition a (to exclude) go here
            ],
            "must": {
                "term": {
                    "field_name1": field_value  # condition b (required)
                }
            }
        }
    }
}
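
The two bool patterns above can be generated from one helper. This is a sketch under assumptions: bool_body is a hypothetical name, every condition is taken to be a term query, and empty clauses are left out of the body.

```python
def bool_body(must=None, must_not=None):
    """Build a bool query from {field: value} dicts of term conditions."""
    def term_clauses(conditions):
        return [{"term": {field: value}} for field, value in conditions.items()]

    clauses = {}
    if must:
        clauses["must"] = term_clauses(must)          # all of these must match
    if must_not:
        clauses["must_not"] = term_clauses(must_not)  # none of these may match
    return {"query": {"bool": clauses}}


# a AND b
and_body = bool_body(must={"field1": "value1", "field2": "value2"})
# (NOT a) AND b
not_body = bool_body(must={"field1": "value1"}, must_not={"field2": "value2"})
```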

aggregation (GROUP BY)


Set size in the aggregation large enough that the returned sum_other_doc_count is 0, meaning no buckets were left out of the result.
On one field:
body = {
    "size": 0,
    "aggs": {
        "my_agg_name": {
            "terms": {
                "size": 100,
                "field": "field_name1"
            }
        }
    }
}

On more fields (double GROUP BY):
body = {
    "size": 0,
    "aggs": {
        "agg_field1": {
            "terms": {
                "size": 100,
                "field": "field1"
            },
            "aggs": {
                "subagg_field2": {
                    "terms": {
                        "size": 100,
                        "field": "field2"
                    }
                }
            }
        }
    }
}
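
The result of a double GROUP BY is nested: each outer bucket carries its own inner buckets. A minimal sketch of flattening it into rows follows; flatten_buckets and the sample response are hypothetical, and the aggregation names match the body above.

```python
def flatten_buckets(r, outer="agg_field1", inner="subagg_field2"):
    """Turn nested terms buckets into (outer_key, inner_key, count) rows."""
    rows = []
    for outer_bucket in r["aggregations"][outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append((outer_bucket["key"],
                         inner_bucket["key"],
                         inner_bucket["doc_count"]))
    return rows


# Hand-written response fragment (hypothetical data).
sample = {
    "aggregations": {
        "agg_field1": {
            "buckets": [
                {"key": "x", "doc_count": 3,
                 "subagg_field2": {"buckets": [
                     {"key": "p", "doc_count": 2},
                     {"key": "q", "doc_count": 1},
                 ]}},
                {"key": "y", "doc_count": 1,
                 "subagg_field2": {"buckets": [
                     {"key": "p", "doc_count": 1},
                 ]}},
            ]
        }
    }
}

rows = flatten_buckets(sample)
```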

Pseudo-random sampling


The seed string, when changed, will give differently sampled (scored) documents. If no seed is specified, the current time is used as seed.
body = {
    "fields": ["field1", "field2"],
    "query": {
        "function_score": {
            "query": {
                "term": {  # assumed query type; any query can go here
                    "my_field": "my_value"
                }
            },
            "random_score": {
                "seed": "the seed"
            }
        }
    }
}

Custom query in selected analysed fields (and boosting them)


body = {
    "query": {
        "simple_query_string": {
            "fields": ["field1^3", "field2"],
            "flags": "ALL",
            "default_operator": "AND",
            "analyzer": "snowball",
            "query": "my custom query"
        }
    }
}

The ^3 suffix means that matches in field1 are boosted by a factor of 3.
SCROLL query


TODO use docs
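
Until this section is filled in, here is a rough sketch of how scrolling is commonly done with the Python client: a first search call with a scroll parameter, then repeated scroll calls with the returned _scroll_id until a page comes back empty. The helper scroll_hits and the FakeClient stand-in are hypothetical; FakeClient only exists so the sketch runs without a cluster, and with a real cluster es_client would be passed instead. The client also ships elasticsearch.helpers.scan, which wraps the same loop.

```python
def scroll_hits(client, index, body, scroll="2m", size=1000):
    """Yield every hit of a query, page by page, using the scroll API."""
    r = client.search(index=index, body=body, scroll=scroll, size=size)
    while r["hits"]["hits"]:
        for hit in r["hits"]["hits"]:
            yield hit
        # Ask for the next page, keeping the scroll context alive.
        r = client.scroll(scroll_id=r["_scroll_id"], scroll=scroll)


class FakeClient:
    """Hypothetical stand-in for Elasticsearch, serving canned pages."""

    def __init__(self, pages):
        self._pages = list(pages)

    def _next_page(self):
        hits = self._pages.pop(0) if self._pages else []
        return {"_scroll_id": "fake-id", "hits": {"hits": hits}}

    def search(self, index, body, scroll, size):
        return self._next_page()

    def scroll(self, scroll_id, scroll):
        return self._next_page()


fake = FakeClient([[{"_id": 1}, {"_id": 2}], [{"_id": 3}]])
docs = list(scroll_hits(fake, "my_index", {"query": {"match_all": {}}}))
```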