@tokhi
Last active January 6, 2017 00:44
Elastic Search in simple words

Elasticsearch

Elasticsearch is a near-real-time search engine: a change to an index is propagated to the whole cluster within about a second.

An Elasticsearch cluster is a collection of one or more nodes that together hold all the data. The default cluster name is elasticsearch.

A node is a single server that is part of a cluster. Nodes participate in searching and indexing.

An index is a collection of documents, equivalent to a database in a relational system. Index names must be lowercase.

type = table (a type represents a class of documents)

mapping = schema of a table

document = row in a table

An index is divided into multiple pieces; each piece is called a shard.

  • useful when an index contains more data than the hard drive of a node can store (e.g., 1 TB of data on a 500 GB hard disk)

A shard is a fully functional and independent index.

A shard can be stored on any node in the cluster.

The default number of primary shards is 5, and each primary shard has one replica.

Sharding lets operations be distributed and parallelized across shards, which increases performance.

Shards improve scalability

A replica is a copy of a shard.

A replica never resides on the same node as its primary shard (e.g., if a given node fails, the replica is still available).
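
The shard and replica rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `shard_for` and `place_shards` are hypothetical helpers, the hash is a stand-in (not the one Elasticsearch uses), and the round-robin placement is a simplification of the real allocator. It only shows that a document id maps to one shard and that a replica lands on a different node than its primary.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # the default mentioned in the notes


def shard_for(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard by hashing the document id (stand-in hash, an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def place_shards(nodes, num_shards=NUM_PRIMARY_SHARDS):
    """Put each primary on one node and its replica on the next node (simplified)."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep a replica off its primary's node")
    layout = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        layout[shard] = {"primary": primary, "replica": replica}
    return layout


# Hypothetical three-node cluster
layout = place_shards(["node-a", "node-b", "node-c"])
for shard, copies in layout.items():
    print(shard, copies)
print("document 1001 lives on shard", shard_for("1001"))
```

If a node fails, every shard still has a copy on another node, because no primary shares a node with its replica.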

How it works

  • A search request hits a node
  • The node broadcasts the request to one copy (primary or replica) of every shard in the index
  • Each shard performs the query
  • Each shard returns its results
  • The results are merged, sorted, and returned to the client
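
The steps above follow a scatter-gather pattern, which can be sketched as below. This is a toy model, not how Elasticsearch scores documents: `search_shard` and `search` are hypothetical names, and the score is simply how often the term occurs in the name.

```python
import heapq


def search_shard(shard_docs, term, size):
    """One shard scores its own documents and returns its local top hits."""
    hits = []
    for doc in shard_docs:
        score = doc["name"].lower().split().count(term.lower())  # toy score
        if score > 0:
            hits.append((score, doc["id"]))
    return heapq.nlargest(size, hits)


def search(shards, term, size=10):
    """The receiving node queries every shard, then merges and sorts the results."""
    per_shard = [search_shard(docs, term, size) for docs in shards]
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]


# Two shards holding the sample products from these notes
shards = [
    [{"id": "1001", "name": "rails framework from beginner to professional"}],
    [{"id": "1002", "name": "Why elasticsearch is Awesome"},
     {"id": "1003", "name": "Dark chocolate"}],
]
print(search(shards, "chocolate"))  # ['1003']
```

Each shard only needs to return its own top `size` hits, since the final top `size` can only come from those.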

CRUD

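# create an empty index
PUT /ecommerce
{}
# delete the index
DELETE /ecommerce
# list all indices
GET /_cat/indices?v
# creates the index again, this time with a mapping for the product type (the "table")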
PUT /ecommerce
{
    "mappings": {
        "product": {
            "properties": {
                "name": {
                    "type": "string"
                },
                "price": {
                    "type": "double"
                },
                "description": {
                    "type": "string"
                },
                "status": {
                    "type": "string"
                },
                "quantity": {
                    "type": "integer"
                },
                "categories": {
                    "type": "nested",
                    "properties": {
                        "name": {
                            "type": "string"
                        }
                    }
                },
                "tags": {
                    "type": "string"
                }
            }
        }
    }
}
# insert a record
PUT /ecommerce/product/1001
{
    "name": "rails framework from beginner to professional",
    "price": 30.00,
    "description": "Learn rails framework in just few hours",
    "status": "active",
    "quantity": 1,
    "categories": [
        {"name": "software"}
    ],
    "tags": ["rails framework", "ror1", "ruby", "programming"]
}
# replace the whole document (a PUT to the same id overwrites it; here the price changes to 40.00)
PUT /ecommerce/product/1001
{
    "name": "rails framework from beginner to professional",
    "price": 40.00,
    "description": "Learn rails framework in just few hours",
    "status": "active",
    "quantity": 1,
    "categories": [
        {"name": "software"}
    ],
    "tags": ["rails framework", "ror1", "ruby", "programming"]
}
# partial update: only the price field changes
POST /ecommerce/product/1001/_update
{
    "doc": {
        "price": 50.00
    }
}
# delete the document
DELETE /ecommerce/product/1001
# insert a set of records using bulk (one action line, then one document line)
POST /ecommerce/product/_bulk
{"index":{"_id":"1002"}}
{"name":"Why elasticsearch is Awesome","price":50.00,"description":"A book about elasticsearch!","status":"active","quantity":10,"categories":[{"name":"Software"}],"tags":["elasticsearch","programming"]}
{"index":{"_id":"1003"}}
{"name":"Dark chocolate","price":4.00,"description":"Yummy dark chocolate.","status":"active","quantity":100,"categories":[{"name":"chocolate"}],"tags":["chocolate"]}
# executing different actions using bulk
POST /ecommerce/product/_bulk
{"delete" : {"_id": "1" }}
{"update" : {"_id": "1002" }}
{"doc" : {"quantity": 9 }}
# fetch document 1 to check the delete
GET /ecommerce/product/1
# the index and type can also be given per action
POST /_bulk
{"update" : {"_id": "1002", "_index":"ecommerce", "_type": "product"}}
{"doc" : {"quantity" : 8 }}
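
The bulk requests above use newline-delimited JSON: one action line, optionally followed by one source line, and the body must end with a newline. When building such a body outside the console, something like the sketch below may help. `bulk_body` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs as newline-delimited JSON.

    source is None for actions that carry no second line, such as delete.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = bulk_body([
    ({"delete": {"_id": "1"}}, None),
    ({"update": {"_id": "1002"}}, {"doc": {"quantity": 9}}),
])
print(body, end="")
```

Each JSON object stays on a single line, which is why the bulk examples above are not pretty-printed.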

Searching queries

# get a specific product
GET /ecommerce/product/1002
# search for all products
GET /ecommerce/product/_search?q=*
GET /ecommerce/product/_search?q=chocolate
# search the name attributes that contain 'Awesome'
GET /ecommerce/product/_search?q=name:Awesome
GET /ecommerce/product/_search?q=name:foobar
# name field should have both keywords
GET /ecommerce/product/_search?q=name:(chocolate AND dark)
GET /ecommerce/product/_search?q=name:(framework OR professional)
# contains at least one of them and the status should be active
GET /ecommerce/product/_search?q=(name:(framework OR professional) AND status:active)
# name field should contain 'framework' but not 'rails'
GET /ecommerce/product/_search?q=name:(+framework -rails)
# without parentheses only 'from' is tied to the name field; this matches documents with 'from' in the name or 'framework' in any field, so 'framework' is not guaranteed to be in all the results
GET /ecommerce/product/_search?q=name:from framework

# searching for a specific sentence (order matters)
GET /ecommerce/product/_search?q=name:"framework from"
# the search still works: the hyphen gets dropped
GET /ecommerce/product/_search?q=name:"framework - from"
# special characters are ignored in the search, as the analyzer shows
GET /_analyze?analyzer=standard&text=framework - from
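
What the analyzer does to that phrase can be approximated in a few lines of Python. This is only a rough stand-in, assuming lowercasing plus splitting on anything that is not a letter or digit; the real standard analyzer follows Unicode text segmentation rules and differs in details. `rough_standard_analyze` is a hypothetical name.

```python
import re


def rough_standard_analyze(text):
    """Lowercase the text and keep only runs of letters and digits as tokens."""
    return re.findall(r"[^\W_]+", text.lower())


print(rough_standard_analyze("framework - from"))  # ['framework', 'from']
print(rough_standard_analyze("framework from"))    # ['framework', 'from']
```

Both phrases produce the same tokens, which is why the two phrase searches above return the same results.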

Elasticsearch Aggregations

Sum, min, max and stats aggregations:

match_all:

# to sum the quantities of all products
GET /ecommerce/product/_search
{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggs": {
        "quantity_sum": {
            "sum": {
                "field": "quantity"
            }
        }
    }
}

To aggregate using match:

GET /ecommerce/product/_search
{
    "query": {
        "match": {
            "name": {
                "query": "chocolate"
            }
        }
    },
    "size": 0,
    "aggs": {
        "quantity_sum": {
            "sum": {
                "field": "quantity"
            }
        }
    }
}
# gets the average quantity of the matching documents
GET /ecommerce/product/_search
{
    "query": {
        "match": {
            "name": {
                "query": "chocolate"
            }
        }
    },
    "size": 0,
    "aggs": {
        "quantity_avg": {
            "avg": {
                "field": "quantity"
            }
        }
    }
}
# max aggregation (use "min" in place of "max" for the minimum)
GET /ecommerce/product/_search
{
    "query": {
        "match": {
            "name": {
                "query": "car"
            }
        }
    },
    "size": 0,
    "aggs": {
        "max_quantity": {
            "max": {
                "field": "quantity"
            }
        }
    }
}
# stats aggregation (count, min, max, avg, sum)
GET /ecommerce/product/_search
{
    "query": {
        "match": {
            "name": {
                "query": "car"
            }
        }
    },
    "size": 0,
    "aggs": {
        "quantity_stat": {
            "stats": {
                "field": "quantity"
            }
        }
    }
}

Bucket aggregation:

This gives the number of documents whose quantity falls in each range, 1 to 50 and 50 to 100. In every range, from is inclusive and to is exclusive.

GET /ecommerce/product/_search
{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggs": {
        "quantity_ranges": {
            "range": {
                "field": "quantity",
                "ranges": [
                    {
                        "from": 1,
                        "to": 50
                    },
                    {
                        "from": 50,
                        "to": 100
                    }
                ]
            }
        }
    }
}
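
The bucketing rule of the range aggregation can be sketched in Python as follows. The quantities are made-up sample values, and `range_buckets` is a hypothetical helper that only mirrors the from-inclusive, to-exclusive behaviour.

```python
def range_buckets(values, ranges):
    """Count the values that fall in each [from, to) range."""
    buckets = []
    for r in ranges:
        members = [v for v in values if r["from"] <= v < r["to"]]
        buckets.append({"from": r["from"], "to": r["to"], "doc_count": len(members)})
    return buckets


quantities = [1, 8, 50, 99, 100]  # hypothetical quantity values
ranges = [{"from": 1, "to": 50}, {"from": 50, "to": 100}]
for bucket in range_buckets(quantities, ranges):
    print(bucket)
```

A quantity of 50 lands in the second bucket only, and a quantity of 100 lands in neither.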

Nested aggregations:

Stats on the documents within each bucket can be computed using a sub-aggregation (the second aggregation operates on each of the ranges within the parent aggregation). There is no limit on how deeply aggregations can be nested, but each added level affects performance.

GET /ecommerce/product/_search
{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggs": {
        "quantity_ranges": {
            "range": {
                "field": "quantity",
                "ranges": [
                    {
                        "from": 1,
                        "to": 50
                    },
                    {
                        "from": 50,
                        "to": 100
                    }
                ]
            },
            "aggs": {
                "quantity_stats": {
                    "stats": {
                        "field": "quantity"
                    }
                }
            }
        }
    }
}
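
As a rough picture of what the sub-aggregation computes, the sketch below buckets some made-up quantities and then runs stats inside each bucket. `stats` and `range_with_stats` are hypothetical helpers, and the values returned for an empty bucket are an assumption rather than a statement about the exact Elasticsearch response.

```python
def stats(values):
    """count, min, max, avg and sum of a list of numbers."""
    if not values:
        # Assumed shape for an empty bucket
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def range_with_stats(values, ranges):
    """Parent aggregation: [from, to) buckets. Sub-aggregation: stats per bucket."""
    buckets = []
    for r in ranges:
        members = [v for v in values if r["from"] <= v < r["to"]]
        buckets.append({
            "from": r["from"],
            "to": r["to"],
            "doc_count": len(members),
            "quantity_stats": stats(members),
        })
    return buckets


quantities = [1, 8, 50, 99, 100]  # hypothetical quantity values
ranges = [{"from": 1, "to": 50}, {"from": 50, "to": 100}]
for bucket in range_with_stats(quantities, ranges):
    print(bucket)
```

The inner stats only ever see the documents of their own bucket, which is what "operates on each of the ranges" means.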