@mohanmca
Last active July 11, 2016 08:42
ElasticSearch
Elasticsearch is schema-free, Lucene-based, massively scalable, and distributed.
Run the start command as many times as you want; each run starts another node (multi-node).
$> bin/elasticsearch
$> curl -XGET localhost:9200/?pretty
$> curl "http://localhost:9200/_cluster/health?pretty=true"
$> PUT /index/type/id (roughly database/table/entity ID)
$> curl -XPUT localhost:9200/mysite/node/1 -d '
{
"nid": "1",
"status": "1",
"title": "Hello elasticsearch",
"body": "First elasticsearch document"
}'
$> GET /mysite/node/1
$> GET /mysite/node/1?fields=title,body
$> GET /mysite/node/1/_source
UPDATE = DELETE + PUT (documents are immutable, so a PUT replaces the whole document)
$> PUT /mysite/node/1 -d
{
"status":"0"
}
DELETE /mysite/node/1
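The index, get, and delete requests above can also be built from a script. This is a minimal Python sketch, assuming a node at localhost:9200 as in the curl examples. The helper name `build_request` is hypothetical, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


doc = {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
}

index_req = build_request("PUT", "/mysite/node/1", doc)   # create or replace
get_req = build_request("GET", "/mysite/node/1")          # fetch the document
delete_req = build_request("DELETE", "/mysite/node/1")    # remove the document

# To run one against a live node: urllib.request.urlopen(index_req)
```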
PUT /new_index -d '{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
}
}'
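To see what these settings imply, the sketch below builds the same settings body and counts the shard copies the cluster would have to hold. The function names are hypothetical and only illustrate the arithmetic: each primary shard plus its replicas.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build the request body for creating an index with the given layout."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard exists once, plus once per replica."""
    return number_of_shards * (1 + number_of_replicas)


body = index_settings(3, 2)
print(json.dumps(body))
print(total_shard_copies(3, 2))  # 3 primaries + 6 replicas = 9
```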
$> GET /_search
$> GET /index1,index2/_search
$> GET /myapp_*/type,entity_*/_search
Pagination
$> GET /_search?size=10&from=20
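The `from` value is an offset, not a page number. A small sketch of the conversion, assuming 1-based page numbers (the helper `page_query` is hypothetical):

```python
from urllib.parse import urlencode


def page_query(page, size=10):
    """Turn a 1-based page number into the size/from query string."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return urlencode({"size": size, "from": (page - 1) * size})


print("/_search?" + page_query(3))  # /_search?size=10&from=20
```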
Query
$> POST /_search -d
{
"query": {
"match": { "title": "awesome" }
}
}
$> POST /_search -d
{
"query": {
"match" : {
"title" : {
"query" : "+awesome -poor",
"boost" : 2.0
}
}
}
}
Core types
* string
* number
* date
* boolean
Indexed fields can be exact terms (not analyzed) or full text (analyzed).
Query, Filter and Sort
$> POST /_search -d
{
"query": {
"filtered": {
"query": {
"match": { "title": "awesome" }
},
"filter": {
"term": { "type": "article" }
}
}
},
"sort": {"date":"desc"}
}
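Building the request body in code avoids hand-written JSON mistakes such as missing or trailing commas. A minimal sketch that mirrors the query, filter, and sort request above; the function name `filtered_search` and its parameters are hypothetical:

```python
import json


def filtered_search(text, doc_type, sort_field="date", order="desc"):
    """Full-text match on title, exact-term filter on type, sorted by one field."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                "filter": {"term": {"type": doc_type}},
            }
        },
        "sort": {sort_field: order},
    }


body = filtered_search("awesome", "article")
print(json.dumps(body, indent=2))  # valid JSON, ready to POST to /_search
```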
Shard & Index
Each index has a fixed number of primary shards (the index is partitioned by a routing rule).
A shard is a single Lucene index managed by Elasticsearch.
Shards in turn have replicas (backups); all of them are located on nodes, and nodes can be grouped into clusters.
Each shard can have zero or more replicas, and the replica count can be changed dynamically.
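One way to picture the partitioning is a hash of a routing value (by default the document ID) taken modulo the number of primary shards. The sketch below assumes that scheme and uses CRC32 purely as a stand-in hash; it is not the hash function Elasticsearch itself uses.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number in [0, number_of_primary_shards)."""
    # Stand-in hash for illustration only; not Elasticsearch's own hash function.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "->", pick_shard(doc_id, 3))
```

Because the shard number depends on the shard count, changing the number of primary shards would send existing IDs to different shards, which is why that number is fixed while the replica count is not.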
Further notes...
The Zen Discovery mechanism offers IP multicast and unicast methods. Using one of these methods, a node checks for the presence of other nodes and joins them to form a cluster.
A new node can ask a master node directly for information about the other nodes in the cluster.
By default, an ES index is configured with 5 shards and 1 replica each, meaning the index has 5 primary shards plus a reserve copy (replica) of each one. If the cluster has at least 2 nodes and one of them fails, the cluster still contains the entire index, because the second node holds copies of the shards from the first one.
If shards are configured with 2 replicas, ES keeps the data intact even if 2 nodes fail (provided the cluster has more than 2 nodes), and so on.
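The failure-tolerance claim can be checked with a toy model. The sketch below places every copy of a shard on a different node in round-robin order and then tries every combination of failed nodes. The placement rule is a simplifying assumption, not Elasticsearch's actual allocator, and all function names are hypothetical.

```python
from itertools import combinations


def allocate(number_of_shards, number_of_replicas, number_of_nodes):
    """Place each shard's copies (primary + replicas) on distinct nodes.

    Returns a dict mapping shard number -> set of node ids holding a copy.
    """
    copies = number_of_replicas + 1
    if copies > number_of_nodes:
        raise ValueError("need at least replicas + 1 nodes for distinct placement")
    return {
        shard: {(shard + c) % number_of_nodes for c in range(copies)}
        for shard in range(number_of_shards)
    }


def survives(allocation, failed_nodes):
    """True if every shard still has at least one copy on a live node."""
    failed = set(failed_nodes)
    return all(nodes - failed for nodes in allocation.values())


def tolerates(allocation, number_of_nodes, failures):
    """True if the index stays complete for every set of `failures` failed nodes."""
    return all(
        survives(allocation, failed)
        for failed in combinations(range(number_of_nodes), failures)
    )


two_nodes = allocate(5, 1, 2)        # defaults from the notes, on 2 nodes
print(tolerates(two_nodes, 2, 1))    # True: any single node may fail

four_nodes = allocate(5, 2, 4)       # 2 replicas on 4 nodes
print(tolerates(four_nodes, 4, 2))   # True: any two nodes may fail
print(tolerates(four_nodes, 4, 3))   # False: three failures can lose a shard
```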
An ES node can take the following roles:
* Workhorse (indexer)
* Coordinator (serves as a master and does not store any data)
* Search load balancer (responsible for the ES REST interface)
By default, an ES node plays all 3 of the above roles.
Optimum node.
Number of Shards = (Index Size / Max shard size)
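A worked instance of the formula, as a rough sketch: the sizes are made-up illustration values, and rounding up to a whole shard is an assumption added here, since a fractional shard count is not possible.

```python
import math


def shard_count(index_size_gb, max_shard_size_gb):
    """Number of shards = index size / max shard size, rounded up, at least 1."""
    if index_size_gb < 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / max_shard_size_gb))


# Hypothetical numbers: a 100 GB index with shards capped at 30 GB each.
print(shard_count(100, 30))  # 4
```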
ES supports versioning.
I can get a document, change it, and put it back while referencing the version ID I fetched. The write then either succeeds or fails, failing if the document was modified in the interim.
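The get, change, put-with-version cycle is a form of optimistic concurrency control. The sketch below imitates it with an in-memory store so it runs without a cluster. `VersionedStore` and `VersionConflict` are hypothetical stand-ins, not part of any Elasticsearch client.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the writer expected."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(document))
        return new_version


store = VersionedStore()
store.put("1", {"status": "1"})          # stored as version 1
version, doc = store.get("1")            # reader fetches version 1
doc = dict(doc, status="0")              # reader prepares a change
store.put("1", {"status": "2"})          # another writer gets in first: version 2

try:
    store.put("1", doc, expected_version=version)
    outcome = "indexed"
except VersionConflict:
    outcome = "conflict"

print(outcome)  # conflict
```

On a conflict the usual remedy is to fetch the document again, reapply the change, and retry with the new version.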