danielpsf/ElasticSearch_definitive_guide_notes.md

## ElasticSearch_definitive_guide_notes.md

      
Notes on Elasticsearch: The Definitive Guide

Chapter 1. You know, for search


RESTful web service on top of Apache Lucene
Has many clients, which use either the TransportClient or HTTP

TransportClient is scheduled to be removed in Elasticsearch 8.0


Has two kinds of query mechanisms:

Query string
Query DSL


An index (noun) is roughly analogous to a database in SQL terms
To index (verb) is the act of storing a document in an index, roughly like an SQL INSERT

Example of commands to build an index, index data and then query it

HR has requested an employee directory for Megacorp that has to:

Enable data to contain multi value tags, numbers, and full text.
Retrieve the full details of any employee.
Allow structured search, such as finding employees over the age of 30.
Allow simple full-text search and more-complex phrase searches.
Return highlighted search snippets from the text in the matching documents.
Enable management to build analytic dashboards over the data.

Add an employee

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

Fetch employee by id

GET /megacorp/employee/1

Fetch all employees

GET /megacorp/employee/_search

Fetch all employees filtered by last name, using a query string

GET /megacorp/employee/_search?q=last_name:Smith

Fetch all employees filtered by last name, using the query DSL

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}
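
As a rough sketch of how the two mechanisms differ on the wire, the Python below builds both requests without sending them: the query string variant puts everything in the URL, while the query DSL variant sends a JSON body. The helper names `query_string_search` and `dsl_search` are made up for these notes and are not part of any client library.

```python
import json
from urllib.parse import urlencode


def query_string_search(index_path, field, value):
    """Return the request path for a query-string search (the query lives in the URL)."""
    return f"/{index_path}/_search?" + urlencode({"q": f"{field}:{value}"})


def dsl_search(index_path, field, value):
    """Return (path, JSON body) for the equivalent query DSL `match` search."""
    body = {"query": {"match": {field: value}}}
    return f"/{index_path}/_search", json.dumps(body)


qs_path = query_string_search("megacorp/employee", "last_name", "Smith")
dsl_path, dsl_body = dsl_search("megacorp/employee", "last_name", "Smith")
print(qs_path)
print(dsl_path, dsl_body)
```

Note that `urlencode` percent-encodes the colon, which is what an HTTP client would normally do with the `q` parameter anyway.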

Fetch all employees filtered by last name who are aged 30 or older

GET /megacorp/employee/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "last_name": "Smith"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  }
}
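
In application code a request like this is usually assembled as a data structure rather than a hand-written string. A minimal sketch, assuming a hypothetical helper named `employees_by_name_and_min_age`: the `must` clause takes part in relevance scoring, while the `filter` clause only includes or excludes documents.

```python
import json


def employees_by_name_and_min_age(last_name, min_age):
    """Build the bool query body: `must` is scored, `filter` is a plain yes/no check."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"last_name": last_name}}],
                "filter": {"range": {"age": {"gte": min_age}}},
            }
        }
    }


body = employees_by_name_and_min_age("Smith", 30)
print(json.dumps(body, indent=2))
```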

Fetch all employees whose about field mentions rock, climbing, or both

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}

Fetch all employees whose about field contains the exact phrase rock climbing

GET /megacorp/employee/_search
{
  "query": {
    "match_phrase": {
      "about": "rock climbing"
    }
  }
}
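
The difference between match and match_phrase can be illustrated with a toy model in Python. This is only a sketch: the lowercase-and-split tokenizer stands in for a real analyzer, relevance scoring is ignored, and the sample texts for Jane and Douglas are illustrative data that these notes never index.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_text, query):
    """`match` with the default OR operator: any one query term is enough."""
    doc_terms = set(tokens(field_text))
    return any(term in doc_terms for term in tokens(query))


def match_phrase(field_text, query):
    """`match_phrase`: every term must appear, adjacent and in the same order."""
    doc, phrase = tokens(field_text), tokens(query)
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",   # illustrative sample
    "Douglas": "I like to build cabinets",      # illustrative sample
}

match_hits = [name for name, text in abouts.items() if match(text, "rock climbing")]
phrase_hits = [name for name, text in abouts.items() if match_phrase(text, "rock climbing")]
print(match_hits, phrase_hits)
```

In this model Jane is returned by match because her text contains "rock", but not by match_phrase because "rock" is not followed by "climbing".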

Fetch all employees whose about field contains the phrase rock climbing, with the matching snippets highlighted

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

Fetch all interests of all employees

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword"
      }
    }
  }
}

Fetch average age per interest of all employees

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
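
To make clear what this nested aggregation computes, here is a small in-memory imitation in Python. It is a sketch, not how Elasticsearch works internally: the function name `avg_age_per_interest` and the employees other than John are hypothetical, and the tie-break by key is an assumption of this sketch.

```python
from collections import defaultdict


def avg_age_per_interest(employees):
    """Bucket employees by each `interests` value, then average `age` per bucket."""
    ages = defaultdict(list)
    for emp in employees:
        for interest in emp["interests"]:  # a multi-value field lands in several buckets
            ages[interest].append(emp["age"])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg_age": {"value": sum(vals) / len(vals)}}
        for key, vals in ages.items()
    ]
    # Terms buckets come back ordered by descending doc_count; ties are broken by key here.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},        # hypothetical
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},  # hypothetical
]

buckets = avg_age_per_interest(employees)
for bucket in buckets:
    print(bucket)
```

Each printed dictionary mirrors the shape of one bucket in the aggregation response: a key, the number of matching documents, and the nested metric.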

Chapter 2. Life Inside a Cluster

Scaling

Most relational databases (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, etc.) benefit most from vertical scaling, and scaling them horizontally usually means tweaking the client application at least a little to make it work as expected. Elasticsearch, on the other hand, is built from the ground up to be scalable and highly available, which means your application doesn't need to handle any of the cumbersome tasks that traditional databases require in order to scale up/down (vertically) or out/in (horizontally).
Cluster

An Elasticsearch node can play several roles at once, and a cluster is one or more nodes that share the same cluster.name property.
A master node is in charge of managing cluster-wide operations, such as creating or deleting an index, or adding or removing a node from the cluster.
A master node is not in charge of document-level changes or searches, which means that having a single master node will not necessarily cause a bottleneck.
Users can talk to any node in the cluster, including the master. Every node knows where each document lives, so it can forward a request directly to the nodes that hold the data. Whichever node picks up the request takes on the burden of gathering the responses from the node or nodes holding the data and then returning the final response to the client.
Health

Retrieving the cluster health is as easy as querying data.
GET /_cluster/health

Among the fields in the response below, the most interesting is status
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 8,
  "active_shards": 8,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 5,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 61.53846153846154
}
Statuses


| Status | Description |
|--------|-------------|
| green  | All primary and replica shards are active |
| yellow | All primary shards are active, but not all replica shards are active |
| red    | Not all primary shards are active |

Add an Index

An index is nothing more than a logical namespace that points to one or more physical shards.
A shard is a low-level worker unit that holds just a slice of all the data in the index. It is also a single instance of Apache Lucene, meaning it is a complete search engine in its own right.
The number of primary shards in an index is fixed at the time the index is created, but the number of replica shards can be changed at any time.
To create an index without letting Elasticsearch apply the default configuration (five primary shards in versions before 7.0), you can use the command below
PUT /blogs
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}
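
A quick way to reason about these two settings is to count the shard copies they produce. The sketch below uses a hypothetical helper, `shard_copies`, and assumes that number_of_replicas means the number of replica copies per primary shard.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Return how many primary and replica shards the settings produce."""
    primaries = number_of_shards
    replicas = number_of_shards * number_of_replicas  # one replica set per replica count
    return {"primaries": primaries, "replicas": replicas, "total": primaries + replicas}


blogs = shard_copies(number_of_shards=3, number_of_replicas=1)
print(blogs)
```

For the blogs index this gives 3 primaries plus 3 replicas. A replica is never allocated on the same node as its primary, so on a single-node cluster the replicas stay unassigned and the cluster reports yellow.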

To start a second node on the same machine you need to edit {elasticsearch_home}/config/elasticsearch.yml, adding the property node.max_local_storage_nodes: INT_GREATER_THAN_1, as recommended in the documentation.