- RESTful web service on top of Apache Lucene
- Has many clients, which use either `TransportClient` or HTTP clients
- `TransportClient` is scheduled to be removed in Elasticsearch 8.0
- Has two kinds of query mechanisms
- Query string
- Query DSL
- An index (noun) can be thought of as a SQL database
- To index (verb) can be thought of as the act of inserting data into an index
HR has requested an employee directory for Megacorp that has to:
- Enable data to contain multi-value tags, numbers, and full text.
- Retrieve the full details of any employee.
- Allow structured search, such as finding employees over the age of 30.
- Allow simple full-text search and more-complex phrase searches.
- Return highlighted search snippets from the text in the matching documents.
- Enable management to build analytic dashboards over the data.
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
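As a rough illustration of what the request above looks like from a client, here is a minimal Python sketch that builds (but does not send) the same `PUT`. The base URL `http://localhost:9200` and the helper name `build_index_request` are assumptions made for this example, not something the notes specify.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
request = build_index_request("megacorp", "employee", 1, employee)
# To actually send it against a running node: urllib.request.urlopen(request)
```

The path encodes the index, the type, and the document ID, and the JSON body is the document itself.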
GET /megacorp/employee/1
GET /megacorp/employee/_search
GET /megacorp/employee/_search?q=last_name:Smith
GET /megacorp/employee/_search
{
"query": {
"match": {
"last_name": "Smith"
}
}
}
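The last two requests express roughly the same search, once as a query string and once as Query DSL. The sketch below builds both forms side by side; the helper names are hypothetical, and nothing is sent to a cluster.

```python
import json
from urllib.parse import urlencode


def query_string_search(index, doc_type, field, value):
    """Return the path of a query-string search (everything lives in the URL)."""
    return f"/{index}/{doc_type}/_search?" + urlencode({"q": f"{field}:{value}"})


def dsl_search(index, doc_type, field, value):
    """Return (path, JSON body) of the comparable Query DSL search."""
    path = f"/{index}/{doc_type}/_search"
    body = {"query": {"match": {field: value}}}
    return path, json.dumps(body)


lite_path = query_string_search("megacorp", "employee", "last_name", "Smith")
dsl_path, dsl_body = dsl_search("megacorp", "employee", "last_name", "Smith")
```

Note that `urlencode` percent-encodes the colon (`last_name%3ASmith`). The query string is handy for quick ad hoc searches, while the DSL body scales better to the more complex queries that follow.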
GET /megacorp/employee/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"last_name": "Smith"
}
}
],
"filter": {
"range": {
"age": {
"gt": 30
}
}
}
}
}
}
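The request above combines a scored `match` clause with a non-scoring `range` filter. A minimal sketch of a builder for that body follows; the function name and the `inclusive` flag are assumptions of this example. The flag only switches between the `gt` (strictly greater than) and `gte` (greater than or equal) range operators.

```python
def name_and_age_search(last_name, age, inclusive=False):
    """Build a bool query: match on last_name, filter on an age range."""
    # "gt" excludes the boundary value; "gte" includes it.
    operator = "gte" if inclusive else "gt"
    return {
        "query": {
            "bool": {
                "must": [{"match": {"last_name": last_name}}],
                "filter": {"range": {"age": {operator: age}}},
            }
        }
    }


over_30 = name_and_age_search("Smith", 30)
at_least_30 = name_and_age_search("Smith", 30, inclusive=True)
```

With "over the age of 30" read strictly, `gt` is the operator that excludes employees who are exactly 30.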
GET /megacorp/employee/_search
{
"query": {
"match": {
"about": "rock climbing"
}
}
}
GET /megacorp/employee/_search
{
"query": {
"match_phrase": {
"about": "rock climbing"
}
}
}
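The difference between `match` and `match_phrase` can be illustrated with a toy in-memory model. This is a deliberately simplified sketch and not how Elasticsearch or Lucene implement these queries: it ignores relevance scoring, and the `tokenize` function is only a stand-in for a real analyzer. The second sample text is hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_match(text, query):
    """True if the text contains at least one of the query terms."""
    terms = set(tokenize(text))
    return any(term in terms for term in tokenize(query))


def toy_match_phrase(text, query):
    """True if the query terms appear consecutively and in the same order."""
    tokens, phrase = tokenize(text), tokenize(query)
    n = len(phrase)
    return n > 0 and any(
        tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)
    )


docs = [
    "I love to go rock climbing",     # the "about" field indexed earlier
    "I like to collect rock albums",  # hypothetical second employee
]
match_hits = [d for d in docs if toy_match(d, "rock climbing")]
phrase_hits = [d for d in docs if toy_match_phrase(d, "rock climbing")]
```

In this model `match_hits` contains both texts because each mentions "rock", while `phrase_hits` contains only the first, where "rock" is immediately followed by "climbing".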
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": {
"field": "interests.keyword"
}
}
}
}
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": {
"field": "interests.keyword"
},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
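To make the nested aggregation more concrete, the sketch below computes the same kind of result in plain Python: one bucket per interest, each with a document count and an average age. The sample employees other than John Smith are hypothetical, and breaking ties by key is a choice of this sketch.

```python
from collections import defaultdict


def interests_with_avg_age(employees):
    """Group employees by interest and compute the average age per group."""
    ages_by_interest = defaultdict(list)
    for employee in employees:
        for interest in employee["interests"]:
            ages_by_interest[interest].append(employee["age"])

    buckets = [
        {"key": key, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
        for key, ages in ages_by_interest.items()
    ]
    # Largest buckets first; ties are broken by key in this sketch.
    buckets.sort(key=lambda bucket: (-bucket["doc_count"], bucket["key"]))
    return buckets


employees = [
    {"age": 25, "interests": ["sports", "music"]},  # John Smith from above
    {"age": 32, "interests": ["music"]},            # hypothetical
    {"age": 35, "interests": ["forestry"]},         # hypothetical
]
buckets = interests_with_avg_age(employees)
```

An employee with several interests is counted once in each matching bucket, which is why "music" ends up with two documents here.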
Most databases (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, etc.) benefit most from vertical scaling, and scaling them horizontally can mean tweaking the client application at least a little to make it work as expected. Elasticsearch, by contrast, is built from the ground up to be scalable and highly available, which means your application doesn't need to handle any of the cumbersome tasks that traditional databases require to scale up/down (vertically) or out/in (horizontally).
An Elasticsearch node can play several roles at once, and a cluster consists of one or more nodes that share the same `cluster.name` property.
A master node is in charge of managing cluster-wide operations, such as creating or deleting an index, or adding or removing a node from the cluster.
A master node is not in charge of document-level changes or searches, which means that having a single master node will not necessarily become a bottleneck.
Users can talk to any node in the cluster, including the master. Every node knows where each document lives, so it can forward a request directly to the nodes that hold the data. Whichever node receives the request takes on the burden of gathering the responses from the node or nodes holding the data and returning the final response to the client.
Retrieving the cluster health is as easy as querying data.
GET /_cluster/health
Among the fields in the response below, the most interesting is `status`:
{
"cluster_name": "elasticsearch",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 8,
"active_shards": 8,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 5,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 61.53846153846154
}
| Status | Description |
|---|---|
| green | All primary and replica shards are active |
| yellow | All primary shards are active, but not all replica shards are active |
| red | Not all primary shards are active |
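The table can be read as a small decision rule. The sketch below encodes it as a Python function; the function and parameter names are hypothetical, and the real health API reports the status directly rather than asking you to derive it.

```python
def cluster_status(primaries_active, primaries_total,
                   replicas_active, replicas_total):
    """Derive the health colour from shard counts, following the table."""
    if primaries_active < primaries_total:
        return "red"     # at least one primary shard is not active
    if replicas_active < replicas_total:
        return "yellow"  # all primaries active, some replicas are not
    return "green"       # every primary and replica shard is active


# Assumption: the 5 unassigned shards in the sample response are all
# replicas, which is what its "yellow" status implies.
sample = cluster_status(primaries_active=8, primaries_total=8,
                        replicas_active=0, replicas_total=5)

# The sample's percentage is active shards over all shards: 8 of 13.
active_percent = 100 * 8 / (8 + 5)
```

Primaries are checked first because a missing primary means some data is unavailable, whereas a missing replica only reduces redundancy.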
An index is nothing more than a logical namespace that points to one or more physical shards.
A shard is a low-level worker unit that holds just a slice of all the data in the index. It is also a single instance of Apache Lucene, meaning it is a complete search engine in its own right.
The number of primary shards in an index is fixed at the time the index is created, but the number of replica shards can be changed at any time.
To create an index without relying on Elasticsearch's default configuration (five primary shards), you can use the request below:
PUT /blogs
{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
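A quick way to sanity-check these settings is to count how many shards they produce in total. `number_of_replicas` is the number of replica copies per primary shard, so the arithmetic can be sketched as follows (the helper names are hypothetical):

```python
def index_settings(primary_shards, replicas_per_primary):
    """Build the settings body used when creating an index."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shards(primary_shards, replicas_per_primary):
    """Each primary has `replicas_per_primary` copies in addition to itself."""
    return primary_shards * (1 + replicas_per_primary)


blogs_settings = index_settings(3, 1)
blogs_total = total_shards(3, 1)  # 3 primaries + 3 replicas
```

For the `blogs` index this gives 3 primary shards plus 3 replica shards, 6 shards in total.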
To start a second node on the same machine, edit `{elasticsearch_home}/config/elasticsearch.yml` and add the property `node.max_local_storage_nodes: INT_GREATER_THAN_1`, as recommended in the documentation.