Sample Elasticsearch queries in Python, for reference.

Use the Python client elasticsearch.

Connect to cluster (the client)

from elasticsearch import Elasticsearch

es_client = Elasticsearch()                  # local
es_client = Elasticsearch(["<cluster_url>"])  # remote

Prototype query

Build a body dictionary for the query.

body = {
    "from": 10,             # offset: skip the first 10 docs
    "size": 100,            # return 100 docs (default = 10)
    "fields": ["f_name1"],  # return only the wanted fields
    "query": {              # the query (see the samples below)
    },
    "sort": {               # sort order
        "time_field": {
            "order": "desc"
        }
    }
}

NOTE: To return only some fields, use fields for fields that are explicitly marked as stored in the mapping, and _source otherwise.
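The parts of a body can be assembled by a small helper. This is only a sketch: build_body and its parameters are hypothetical names, not part of the elasticsearch client, and it filters on _source rather than on fields.

```python
def build_body(query, offset=0, size=10, sort_field=None, order="desc", source=None):
    """Assemble a search body from its parts (hypothetical helper)."""
    body = {"from": offset, "size": size, "query": query}
    if sort_field is not None:
        # Sort on one field instead of the default relevance ordering.
        body["sort"] = {sort_field: {"order": order}}
    if source is not None:
        # Restrict each returned document to the listed _source fields.
        body["_source"] = list(source)
    return body


body = build_body(
    {"match_all": {}},
    offset=10,
    size=100,
    sort_field="time_field",
    source=["f_name1"],
)
```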

How to query

A prototype search on a type in an index is run as

r = es_client.search(index='my_index', doc_type='my_type', body=body)

The result r is a dictionary again, whose keys will depend on the type of query run.

Hits are sorted by relevance by default. In a terms aggregation, buckets are sorted by number of documents, largest first.

  • number of documents is in r['hits']['total']
  • actual documents are in r['hits']['hits']
  • if fields is used, r['hits']['hits'][0]['fields']['f_name1'][0]
  • for an aggregation r['aggregations']['agg_name']['buckets']
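Reading the hits out of a result can be sketched as follows. The response r below is written by hand with illustrative values, and iter_sources is a hypothetical helper name.

```python
# Hand-written response with the shape listed above (illustrative values only).
r = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_score": 1.3, "_source": {"f_name1": "a", "other": 1}},
            {"_id": "2", "_score": 0.7, "_source": {"f_name1": "b", "other": 2}},
        ],
    }
}


def iter_sources(response):
    """Yield (id, _source) for every hit in a search response."""
    for hit in response["hits"]["hits"]:
        # "_source" may be missing, e.g. when only "fields" were requested.
        yield hit["_id"], hit.get("_source", {})


docs = dict(iter_sources(r))
n_docs = r["hits"]["total"]
```

On Elasticsearch 7 and later, r['hits']['total'] is itself a dictionary and the count is under its 'value' key.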

Query body samples

term query

body = {
    "query": {
        "term": {
            "my_field_name": "chosen_field_value"
        }
    }
}

If the field 'my_field_name' is itself a dictionary, one of its subfields can be queried as 'my_field_name.subfield'.
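A minimal sketch of the dotted subfield form, with a hypothetical helper term_body that joins the keys of the path:

```python
def term_body(value, *path):
    """Term query on a field given as a path of keys (hypothetical helper)."""
    # ("my_field_name", "subfield") becomes "my_field_name.subfield".
    return {"query": {"term": {".".join(path): value}}}


body = term_body("chosen_field_value", "my_field_name", "subfield")
```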

range query

body = {
    "query": {
        "range": {
            "my_time_field": {
                "gte": start_date,
                "lt": final_date
            }
        }
    }
}

start_date and final_date are datetime/date objects.
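To inspect what such a body looks like as JSON, the sketch below writes the dates as ISO 8601 strings. The helpers range_body and to_json are hypothetical names, and the conversion is done explicitly here instead of relying on the client's own serializer.

```python
import json
from datetime import date, datetime


def range_body(field, start, end):
    """Range query matching start <= field < end (hypothetical helper)."""
    return {"query": {"range": {field: {"gte": start, "lt": end}}}}


def to_json(body):
    """Dump a body to JSON, writing date/datetime values as ISO 8601 strings."""
    def default(value):
        if isinstance(value, (date, datetime)):
            return value.isoformat()
        raise TypeError(f"cannot serialize {value!r}")
    return json.dumps(body, default=default)


body = range_body("my_time_field", date(2020, 1, 1), datetime(2020, 2, 1, 12, 30))
payload = to_json(body)
```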

bool query for an AND

a AND b

body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"field1": "value1"}},
                {"term": {"field2": "value2"}}
            ]
        }
    }
}

not a AND b

body = {
    "query": {
        "bool": {
            "must_not": [
                # clause(s) to exclude (a) go here, e.g. a term query
            ],
            "must": {
                "term": {"field_name1": field_value}
            }
        }
    }
}

aggregation (GROUP BY)

Set size in the terms aggregation high enough that the returned sum_other_doc_count is 0, i.e. no buckets were left out.

On one field:

body = {
    "size": 0,  # return no hits, only the aggregation
    "aggs": {
        "my_agg_name": {
            "terms": {
                "size": 100,
                "field": "field_name1"
            }
        }
    }
}

On more fields (double GROUP BY):

body = {
    "size": 0,
    "aggs": {
        "agg_field1": {
            "terms": {
                "size": 100,
                "field": "field1"
            },
            "aggs": {
                "subagg_field2": {
                    "terms": {
                        "size": 100,
                        "field": "field2"
                    }
                }
            }
        }
    }
}

Pseudo-random sampling

Changing the seed string gives a different random scoring, and hence a different sample of documents. If no seed is specified, the current time is used as the seed.

body = {
    "fields": ["field1", "field2"],
    "query": {
        "function_score": {
            "query": {
                "term": {"my_field": "my_value"}
            },
            "random_score": {
                "seed": "the seed"
            }
        }
    }
}

Custom query in selected analysed fields (and boosting them)

body = {
    "query": {
        "simple_query_string": {
            "fields": ["field1^3", "field2"],
            "flags": "ALL",
            "default_operator": "AND",
            "analyzer": "snowball",
            "query": "my custom query"
        }
    }
}

The ^3 suffix boosts field1 by a factor of 3.
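The boosted field names can be generated from a mapping of field to boost. This is a sketch: boosted_fields and simple_query_body are hypothetical helper names, and only some of the options shown above are set.

```python
def boosted_fields(boosts):
    """Turn {"field": boost} into ["field^boost", ...]; a boost of 1 is left implicit."""
    return [name if boost == 1 else f"{name}^{boost}" for name, boost in boosts.items()]


def simple_query_body(text, boosts, operator="AND"):
    """simple_query_string body over the given fields (hypothetical helper)."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": boosted_fields(boosts),
                "default_operator": operator,
            }
        }
    }


body = simple_query_body("my custom query", {"field1": 3, "field2": 1})
```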

SCROLL query

TODO use docs

