mohanmca/ElasticSearch

## ElasticSearch
Elasticsearch is schema-free, Lucene-based, massively scalable, and distributed.

Run the binary as many times as you want; each run starts another node (multinode).

$> bin/elasticsearch
$> curl -XGET "localhost:9200/?pretty"
$> curl "http://localhost:9200/_cluster/health?pretty=true"

PUT /index/type/id (roughly database/table/entity ID)
$> curl -XPUT localhost:9200/mysite/node/1 -d '{
 "nid": "1",
 "status": "1",
 "title": "Hello elasticsearch",
 "body": "First elasticsearch document"
}'

$> GET /mysite/node/1
$> GET /mysite/node/1?fields=title,body
$> GET /mysite/node/1/_source


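UPDATE = DELETE + PUT: a PUT to an existing ID replaces the whole document, so after this request the document contains only the status field.
$> PUT /mysite/node/1 -d
{
"status":"0"
}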
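
To make the replace semantics concrete, here is a small in-memory sketch in Python. It is not Elasticsearch client code: `FakeIndex` and its methods are hypothetical names that only mimic how a PUT to an existing ID swaps out the whole document.

```python
class FakeIndex:
    """Toy stand-in for an index: maps document IDs to whole documents."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, body):
        # A PUT stores the body as the complete document; nothing is merged.
        self.docs[doc_id] = dict(body)

    def get(self, doc_id):
        return self.docs.get(doc_id)


index = FakeIndex()
index.put("1", {"nid": "1", "status": "1", "title": "Hello elasticsearch"})
index.put("1", {"status": "0"})  # same ID: old document is dropped, new one stored

updated = index.get("1")
print(updated)  # the nid and title fields are gone
```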

$> DELETE /mysite/node/1

$> PUT /new_index -d '{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 2
  }
}'

$> GET /_search
$> GET /index1,index2/_search
$> GET /myapp_*/type,entity_*/_search

Pagination
$> GET /_search?size=10&from=20
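
A minimal sketch of how a page number could be turned into the `size` and `from` parameters. The helper names `page_params` and `search_url` are made up for illustration, and pages are assumed to be 1-based.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into `from` and `size` values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # `from` is the number of hits to skip before the first returned hit.
    return {"from": (page - 1) * page_size, "size": page_size}


def search_url(page, page_size=10):
    params = page_params(page, page_size)
    return f"/_search?size={params['size']}&from={params['from']}"


print(search_url(3))  # /_search?size=10&from=20
```

Page 3 with 10 hits per page skips the first 20 hits, which gives the same URL as the request above.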

Query
$> POST /_search -d
{
"query": {
"match": { "title": "awesome" }
}
}


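The + and - operators are parsed by query_string (a match query treats them as plain text):
$> POST /_search -d
{
"query": {
      "query_string" : {
          "default_field" : "title",
          "query" : "+awesome -poor",
          "boost" : 2.0
      }
  }
}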


Core types
* string
* number
* date
* boolean

Indexed fields can be exact terms (not analyzed) or full text (analyzed).
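
A rough sketch of the difference, in plain Python. The `analyze` function here is a deliberately crude stand-in (lowercase, split on whitespace); real analyzers do considerably more, and both function names are hypothetical.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def index_field(value, analyzed):
    # Analyzed fields are indexed as many terms; not-analyzed fields as one exact term.
    return set(analyze(value)) if analyzed else {value}


title = "Hello Elasticsearch"
full_text_terms = index_field(title, analyzed=True)
exact_terms = index_field(title, analyzed=False)

print("hello" in full_text_terms)            # True: single words match
print("hello" in exact_terms)                # False: only the whole value matches
print("Hello Elasticsearch" in exact_terms)  # True
```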

Query, Filter and Sort
$> POST /_search -d
{
"query": {
  "filtered": {
      "query": {
          "match": { "title": "awesome" }
        },
      "filter": {
        "term": { "type": "article" }
        }
      }
    },
  "sort": {"date":"desc"}
}
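
Hand-typed JSON bodies are easy to get wrong (a missing comma is enough), so one option is to build the body as a dict and serialize it. This is only a sketch: `filtered_search` is a hypothetical helper, and it just reproduces the query/filter/sort shape used in these notes.

```python
import json


def filtered_search(field, text, filter_field, filter_value, sort_field, order="desc"):
    """Build the body of a filtered search as a plain dict."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
        "sort": {sort_field: order},
    }


body = filtered_search("title", "awesome", "type", "article", "date")
payload = json.dumps(body)  # string that could be sent as the request body
print(payload)
```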

Shard & Index
Each index has a fixed number of shards (the index is partitioned based on some logic).
A shard is a single Lucene index managed by Elasticsearch.
Shards in turn have replicas (backups); all of them live on nodes, and nodes can be grouped into clusters.
Each shard can have zero or more replicas, and the replica count can be changed dynamically.
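
The partitioning logic is hash-then-modulo routing: shard = hash(routing value) % number of primary shards, where the routing value is the document ID unless told otherwise. The sketch below uses `zlib.crc32` purely as a stand-in hash (an assumption for illustration, not the hash Elasticsearch uses), and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document ID to a shard number with hash-then-modulo routing."""
    # crc32 is only a stand-in hash for this sketch, not what Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = ["1", "2", "3", "42"]
with_3_shards = {d: pick_shard(d, 3) for d in doc_ids}
with_4_shards = {d: pick_shard(d, 4) for d in doc_ids}
print(with_3_shards)
print(with_4_shards)
```

Because the shard number depends on the divisor, changing the number of primary shards can send the same ID to a different shard. That is why the primary shard count is fixed at index creation, while replicas can be added or removed freely.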

Further notes...
Zen Discovery is the mechanism nodes use to find each other; it supports IP multicast and unicast. Using one of these methods, a node checks for the presence of other nodes and joins them to form a cluster.

A new node can ask a master node directly for information about the other nodes in the cluster.

By default, ES creates each index with 5 primary shards and 1 replica of each, that is, 5 primary shards plus 5 reserve copies (replicas). If the cluster has at least 2 nodes and one of them fails, the cluster still contains the entire index, because the other node holds a copy of every shard that was on the failed node.

If each shard has 2 replicas, ES keeps the data intact even if 2 nodes fail (provided the cluster has more than 2 nodes), and so on.
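
The arithmetic behind this can be sketched as follows. It assumes the worst case, where the nodes fail at the same time and ES has no chance to rebuild replicas elsewhere; the function names are hypothetical.

```python
def copies_per_shard(replicas):
    # One primary plus its replicas.
    return replicas + 1


def tolerated_node_failures(replicas, nodes):
    """How many nodes can fail while at least one copy of every shard survives."""
    # Copies of the same shard are placed on different nodes, so the
    # number of usable copies is capped by the number of nodes.
    placed = min(copies_per_shard(replicas), nodes)
    return max(placed - 1, 0)


print(tolerated_node_failures(replicas=1, nodes=2))  # 1
print(tolerated_node_failures(replicas=2, nodes=3))  # 2
print(tolerated_node_failures(replicas=2, nodes=2))  # 1: the extra replica has nowhere to go
```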

An ES node can have one of the following types:
* Workhorse (indexer)
* Coordinator (serves as a master and does not store any data)
* Search load balancer (responsible for the ES REST interface)

By default, an ES node plays all three of the above roles.
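
As a sketch, the roles can be expressed with the `node.master` and `node.data` settings from `elasticsearch.yml`. The mapping of the role names in these notes to those two settings is my assumption, and `yml_lines` is a hypothetical helper that just renders the lines.

```python
# Role names follow the notes; the mapping to node.master / node.data
# is an assumption made for this sketch.
ROLE_SETTINGS = {
    "workhorse": {"node.master": False, "node.data": True},
    "coordinator": {"node.master": True, "node.data": False},
    "search load balancer": {"node.master": False, "node.data": False},
    "default": {"node.master": True, "node.data": True},
}


def yml_lines(role):
    """Render the settings for a role as elasticsearch.yml-style lines."""
    settings = ROLE_SETTINGS[role]
    return [f"{key}: {str(value).lower()}" for key, value in settings.items()]


print("\n".join(yml_lines("coordinator")))
```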

Optimum number of shards
Number of shards = Index size / Max shard size
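
A worked version of the formula, rounding up because a fractional shard is not possible. The sizes in the example (100 GB index, 30 GB per shard) are invented for illustration, and `number_of_shards` is a hypothetical helper.

```python
import math


def number_of_shards(index_size_gb, max_shard_size_gb):
    """Number of shards = index size / max shard size, rounded up."""
    if index_size_gb <= 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(index_size_gb / max_shard_size_gb)


print(number_of_shards(100, 30))  # 4
```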

ES supports versioning.
I can get a document, change it, and put it back while referencing the version ID I fetched. The write either succeeds or fails (if the document has been modified in the interim).
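
The get-change-put cycle can be sketched with a toy in-memory store. This only mimics the version check; `VersionedStore`, `VersionConflict`, and the `expected_version` parameter are hypothetical names, not the Elasticsearch API.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Toy in-memory store that mimics version-checked writes."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self.docs.get(doc_id, (0, None))[0]
        # Reject the write if someone else changed the document in the interim.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self.docs[doc_id] = (current_version + 1, dict(body))
        return current_version + 1


store = VersionedStore()
store.put("1", {"status": "1"})                             # stored as version 1
version, doc = store.get("1")
store.put("1", {"status": "0"}, expected_version=version)   # succeeds, now version 2

try:
    store.put("1", {"status": "9"}, expected_version=version)  # stale: still refers to version 1
    conflict = False
except VersionConflict:
    conflict = True
print(conflict)  # True
```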