winadiw/Elastic REST Basics of Indexing.MD

## Elastic REST Basics of Indexing.MD

      
    Source: https://github.com/codingexplained/complete-guide-to-elasticsearch
Get cluster health
GET /_cluster/health

List the nodes (?v adds column headers)
GET /_cat/nodes?v

Get all indices verbose
GET /_cat/indices?v

Get all the shards verbose
GET /_cat/shards?v

Create an index with 2 primary shards, each with 2 replicas
PUT /products
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}
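
The shard settings above multiply: every primary shard gets its own set of replicas. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration and are not part of any Elasticsearch client.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies: each primary plus its replicas."""
    return primaries * (1 + replicas)


def min_nodes_for_green(replicas: int) -> int:
    """Copies of the same shard are never placed on the same node,
    so every replica needs a node of its own next to the primary's."""
    return 1 + replicas


shards = total_shards(primaries=2, replicas=2)
nodes = min_nodes_for_green(replicas=2)
print(shards, nodes)
```

With the settings used here this gives 6 shard copies and at least 3 nodes. On a single-node cluster the replicas stay unassigned and the index health is yellow.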

Index a document with an auto-generated id
POST /products/_doc
{
  "name": "Coffee Maker",
  "price": 64,
  "in_stock": 10
}

Index a document with a specified id. THIS REPLACES ANY EXISTING DOCUMENT WITH ID 100!!
PUT /products/_doc/100
{
  "name": "Toaster",
  "price": 49,
  "in_stock": 4
}

Get a document by id
GET /products/_doc/100

Partially update a document. Elasticsearch 6.x uses /products/_doc/100/_update; 7.x and later use /products/_update/100.
The fields under doc are merged into the existing document: a key that does not exist yet is added, and a key that already exists takes the new value.
POST /products/_update/100
{
  "doc": {
    "in_stock": 3,
   "tags": ["electronics"]
  }
}

Scripted update: decrement in_stock by one
POST /products/_update/100
{
  "script": {
    "source": "ctx._source.in_stock--"
  }
}

Scripted update: set in_stock to a specific value
POST /products/_update/100 
{
  "script": {
    "source": "ctx._source.in_stock=10"
  }
}

Scripted updates with params
POST /products/_update/100
{
  "script": {
    "source": "ctx._source.in_stock -= params.quantity",
    "params": {
      "quantity": 4
    }
  }
}

Upsert: if the document exists, run the script; if it does not, index the upsert document as-is (the script is not run)
POST /products/_update/101
{
  "script": {
    "source": "ctx._source.in_stock++"
  },
  "upsert": {
    "name": "Blender",
    "price": 399,
    "in_stock": 5
  }
}
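
The upsert behaviour can be modelled in a few lines of Python. This is only an in-memory sketch of the semantics: the dictionary store and the helper names are assumptions for illustration and do not talk to Elasticsearch.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Run `script` on an existing document, or insert `upsert` unchanged."""
    if doc_id in store:
        script(store[doc_id])          # document exists: only the script runs
    else:
        store[doc_id] = dict(upsert)   # document missing: upsert is indexed as-is
    return store[doc_id]


def increment_stock(source):
    # Stand-in for the Painless script "ctx._source.in_stock++"
    source["in_stock"] += 1


blender = {"name": "Blender", "price": 399, "in_stock": 5}
store = {}
first = dict(update_with_upsert(store, 101, increment_stock, blender))
second = dict(update_with_upsert(store, 101, increment_stock, blender))
print(first["in_stock"], second["in_stock"])
```

The first call creates the document with in_stock 5; the second call finds it and increments in_stock to 6.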

Delete a product / document.
DELETE /products/_doc/100

Optimistic concurrency control: the update is applied only if the document's current _seq_no and _primary_term match the given values. Otherwise the request fails with HTTP 409 (Conflict)
POST /products/_update/100?if_primary_term=2&if_seq_no=17
{
  "doc": {
    "in_stock": 123
  }
}
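
To see why the conditional update protects against lost writes, here is a small in-memory Python simulation. The Document class and update_if function are hypothetical, and bumping the sequence number by exactly one per write is a simplification of how Elasticsearch assigns _seq_no.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Stands in for the HTTP 409 response."""


@dataclass
class Document:
    source: dict
    seq_no: int
    primary_term: int


def update_if(doc, changes, if_seq_no, if_primary_term):
    """Apply `changes` only if the caller saw the latest version."""
    if (doc.seq_no, doc.primary_term) != (if_seq_no, if_primary_term):
        raise VersionConflict("document changed since it was read")
    doc.source.update(changes)
    doc.seq_no += 1  # simplification: every write bumps the sequence number


doc = Document({"in_stock": 4}, seq_no=17, primary_term=2)

# Two clients both read the document at seq_no 17.
update_if(doc, {"in_stock": 123}, if_seq_no=17, if_primary_term=2)  # first wins
try:
    update_if(doc, {"in_stock": 1}, if_seq_no=17, if_primary_term=2)
    conflict = False
except VersionConflict:
    conflict = True  # second client must re-read and retry
```

The second writer is rejected instead of silently overwriting the first writer's change; the usual response is to GET the document again and retry with the new values.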

Update by query: decrement in_stock for every document in the index
POST /products/_update_by_query
{
  "script": {
    "source": "ctx._source.in_stock --"
  },
  "query": {
    "match_all": {}
  }
}

Delete by query: delete every document matching the query (here, all of them)
POST /products/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

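Bulk API: index and create actions. The body is NDJSON (Content-Type: application/x-ndjson): one JSON object per line, and the last line must end with a newline. index replaces an existing document; create fails if the id already exists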
POST /_bulk
{ "index": { "_index": "products", "_id": 200 } }
{ "name": "Espresso Machine", "price": 199, "in_stock": 5 }
{ "create": { "_index": "products", "_id": 201 } }
{ "name": "Milk", "price": 149, "in_stock": 14 }

Bulk API: update and delete actions (a delete action has no source line)
POST /_bulk
{ "update": { "_index": "products", "_id": 201 } }
{ "doc":  {"price": 129 }}
{ "delete": { "_index": "products", "_id": 200 } }

When the index is given in the URL, _index can be omitted from each action
POST /products/_bulk
{ "update": { "_id": 201 } }
{ "doc":  {"price": 129 }}
{ "delete": { "_id": 200 } }

cURL bulk insert from a file. Use --data-binary so the newlines in the file are preserved
curl -H "Content-Type: application/x-ndjson" -XPOST http://localhost:9200/products/_bulk --data-binary "@products-bulk.json"
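
A file such as products-bulk.json has to follow the NDJSON layout exactly. The Python sketch below shows one way to generate such a body; build_bulk_body and its (action, metadata, source) tuple format are assumptions made for this example, not an Elasticsearch API.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk NDJSON body.

    `source` is None for actions without a source line, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "products", "_id": 200},
     {"name": "Espresso Machine", "price": 199, "in_stock": 5}),
    ("update", {"_index": "products", "_id": 201}, {"doc": {"price": 129}}),
    ("delete", {"_index": "products", "_id": 200}, None),
])
print(body, end="")
```

json.dumps never emits raw newlines, so each object stays on a single line, which is what the format needs.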


## Elastic REST Joining Query.MD

      
    Joining Query

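Find departments that have at least one employee who is both an intern and female. The employees field must be mapped with type nested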
GET /department/_search
{
  "query": {
    "nested" : {
      "path": "employees",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "employees.position": "intern"
              }
            },
            {
              "term": {
                "employees.gender.keyword": "F"
              }
            }
          ]
        }
      }
    }
  }
}
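
The reason for the nested type is easier to see with a toy model. In the Python sketch below, the sample department and both helper functions are invented for illustration: one mimics an array of plain objects, whose field values are flattened per field, and the other mimics nested objects, which are matched one at a time.

```python
department = {
    "name": "Development",
    "employees": [
        {"position": "intern", "gender": "M"},
        {"position": "manager", "gender": "F"},
    ],
}


def flattened_match(doc, position, gender):
    """Plain object arrays: values are pooled per field, so the link
    between one employee's position and gender is lost."""
    positions = {e["position"] for e in doc["employees"]}
    genders = {e["gender"] for e in doc["employees"]}
    return position in positions and gender in genders


def nested_match(doc, position, gender):
    """Nested objects: both conditions must hold for the same employee."""
    return any(
        e["position"] == position and e["gender"] == gender
        for e in doc["employees"]
    )


flat_hit = flattened_match(department, "intern", "F")
nested_hit = nested_match(department, "intern", "F")
print(flat_hit, nested_hit)
```

The flattened model reports a match even though no single employee is a female intern; the nested model does not.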


## Elastic REST Mapping & Analysis.MD

      
    Mapping & Analysis

Analyze a text with the standard analyzer (the default)
POST _analyze
{
  "text": "2 guys walk into    a bar, but the third... DUCKS! :-)",
  "analyzer": "standard"
}

Same result, but specifying the standard analyzer's components explicitly
POST _analyze
{
  "text": "2 guys walk into    a bar, but the third... DUCKS! :-)",
  "char_filter": [],
  "tokenizer": "standard",
  "filter": ["lowercase"]
}
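
As a rough mental model of what the two requests above return, the Python sketch below tokenizes and lowercases the same text. It is an approximation only: a \w+ regular expression stands in for the standard tokenizer, which really follows Unicode text segmentation rules and behaves differently on inputs such as apostrophes or underscores.

```python
import re

TEXT = "2 guys walk into    a bar, but the third... DUCKS! :-)"


def approx_standard_analyze(text):
    """Approximate the standard analyzer: split into word tokens
    (regex stand-in for the tokenizer), then apply a lowercase filter."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def keyword_analyze(text):
    """The keyword analyzer emits the whole input as a single token."""
    return [text]


tokens = approx_standard_analyze(TEXT)
print(tokens)
# ['2', 'guys', 'walk', 'into', 'a', 'bar', 'but', 'the', 'third', 'ducks']
```

Punctuation, the emoticon and the extra whitespace disappear, and DUCKS becomes ducks, which matches the terms the _analyze API reports for this sentence.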

The keyword analyzer is a no-op that emits the whole input as a single token, which mirrors how values of keyword fields are indexed
POST _analyze
{
  "text": "2 guys walk into    a bar, but the third... DUCKS! :-)",
  "analyzer": "keyword"
}

Analyzing an array of strings: the tokens of all values are merged into a single token stream
POST /_analyze
{
  "text": ["String are simple", "merged together."],
  "analyzer": "standard"
}

Explicit Mappings for a new index
PUT /reviews
{
  "mappings": {
    "properties": {
      "rating": { "type": "float" },
      "content": {"type": "text"},
      "product_id": {"type": "integer"},
      "author": {
        "properties": {
          "first_name": {"type": "text"},
          "last_name": {"type": "text"},
          "email": {"type": "keyword"}
        }
      }
    }
  }
}

This will throw an error, because email is mapped as keyword but an object is supplied.
PUT /reviews/_doc/1
{
  "rating": "5.0",
  "content": "Outstanding course! Bo really taught me",
  "product_id": 123,
  "created_at": "2015-03-27T13:07:41Z",
  "author": {
    "first_name": "John",
    "last_name": "Doe",
    "email":  { "test": "johndoe123@example.com"}
  }
}

Get current mapping for a given index
GET /reviews/_mapping

Get the mapping of a specific field, using dot notation for object fields
GET /reviews/_mapping/field/author.email

Using dot notation to map the child fields of an object
PUT /reviews_dot_notation
{
  "mappings": {
    "properties": {
      "rating": { "type": "float" },
      "content": {"type": "text"},
      "product_id": {"type": "integer"},
      "author.first_name": { "type": "text"},
      "author.last_name": { "type": "text"},
      "author.email": { "type": "keyword"}
    }
  }
}

Add a field mapping to an existing index
PUT /reviews/_mapping
{
  "properties": {
    "created_at": {"type": "date"}
  }
}

Simple Reindex API call that copies documents from one index to another, e.g. when migrating to a new index
POST /_reindex
{
  "source": {
    "index": "reviews"
  },
  "dest": {
    "index": "reviews_new"
  }
}

Reindex API with a script that converts product_id from integer to string
POST /_reindex
{
  "source": {
    "index": "reviews"
  },
  "dest": {
    "index": "reviews_new"
  },
  "script": {
    "source": """ 
     if (ctx._source.product_id != null) {
       ctx._source.product_id = ctx._source.product_id.toString();
     }
    """
  }
}
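
What the reindex script does to each document can be mimicked in plain Python. The function names and the sample documents below are made up for the illustration; the real transformation runs as Painless inside Elasticsearch.

```python
def convert_product_id(source):
    """Mirror of the Painless script: stringify product_id when present."""
    if source.get("product_id") is not None:
        source["product_id"] = str(source["product_id"])
    return source


def reindex(docs, script):
    """Copy every document, applying the script to the copy."""
    return [script(dict(doc)) for doc in docs]


reviews = [
    {"rating": 5.0, "product_id": 123},
    {"rating": 4.5},                      # no product_id: left untouched
]
reviews_new = reindex(reviews, convert_product_id)
print(reviews_new)
```

The null check matters because documents without product_id would otherwise make the script fail.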

Field alias example: comment becomes an alias for content, so queries can use the new name without reindexing.
PUT /reviews/_mapping
{
  "properties": {
    "comment": {
      "type": "alias",
      "path": "content"
    }
  }
}

Multi-field mapping using fields: ingredients is indexed as text and additionally as keyword (ingredients.keyword)
PUT /multi_field_test
{
  "mappings": {
    "properties": {
      "description": { "type": "text"},
      "ingredients": { "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

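Create an index template so that new indices get their settings and mappings automatically, for example for indices matching access-logs-*. The template is applied only to newly created indices. Updating a template uses the same API. On Elasticsearch 7.8 and later, /_template is the legacy API; composable templates use /_index_template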
PUT /_template/access-logs
{
  "index_patterns": ["access-logs-*"], 
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 2,
    "index.mapping.coerce": false
  }, 
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "url.original": {
        "type": "keyword"
      },
      "http.request.referrer": {
        "type": "keyword"
      },
      "http.response.status_code": {
        "type": "long"
      }
    }
  }
}

dynamic: strict rejects documents that contain unmapped fields
dynamic: false accepts unknown fields and keeps them in _source, but does not index them
PUT /people
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "first_name": {
        "type": "text"
      }
    }
  }
}

Dynamic templates adjust dynamic mapping. In this example, whole numbers are mapped as integer instead of the default long
PUT /dynamic_template_test
{
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      }
    ]
  }
}

Custom analyzer that strips HTML, removes stop words, and folds characters to their ASCII equivalents
PUT /analyzer_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding"
          ]
        }
      }
    }
  }
}

Define a custom stop token filter that uses the Danish stop word list
PUT /analyzer_test
{
  "settings": {
    "analysis": {
      "filter": {
        "danish_stop": {
          "type": "stop",
          "stopwords": "_danish_"
        }
      }, 
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "danish_stop",
            "asciifolding"
          ]
        }
      }
    }
  }
}

Open/close an index. An index must be closed before its analysis settings can be changed, for example to add a new analyzer
POST /analyzer_test/_open
POST /analyzer_test/_close

Add an analyzer to an existing index, which must be closed first.
PUT /analyzer_test/_settings
{
  "analysis": {
    "analyzer": {
      "my_second_analyzer": {
        "type": "custom",
        "char_filter": ["html_strip"],
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "stop",
          "asciifolding"
        ]
      }
    }
  }
}

Example of updating an existing analyzer. MUST CLOSE INDEX FIRST! In this case the stop filter is removed, so that stop words become searchable.
PUT /analyzer_test/_settings
{
  "analysis": {
    "analyzer": {
      "my_custom_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "char_filter": ["html_strip"],
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

Re-index all documents in place so that existing documents are analyzed with the updated analyzer. Without this, only newly indexed documents use the updated analyzer.
POST /analyzer_test/_update_by_query?conflicts=proceed


## Elastic REST Searching.MD

      
    Searching

Leaf Query vs Compound Query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
Search all documents within an index
GET /products/_search?q=*

Search for names that contain Lobster, or tags that contain Meat. Conditions can be combined with AND / OR.
Outside Kibana, remember to URL-encode the spaces as %20
GET /products/_search?q=name:Lobster
GET /products/_search?q=tags:Meat
GET /products/_search?q=tags:Meat AND name:Tuna

GET all products using Query DSL
GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

Setting explain to true returns an explanation of how each hit's score was calculated
GET products/_search
{
  "query": {
    "term": {
      "name": {
        "value": "lobster"
      }
    }
  },
  "explain": true
}

Explain API: explains why a specific document (here id 1) does or does not match the given query
GET /products/_explain/1
{
  "query": {
    "term": {
      "name": "lobster"
    }
  }
}

Term level Query vs Full text level Query

Term-level queries look up the exact value in the inverted index without analyzing it. The standard analyzer lowercases text fields at index time, so lobster is found but Lobster is not.
Full-text queries are analyzed with the same analyzer that built the inverted index, so searching for Lobster or lobster yields the same results.

GET /products/_search
{
  "query": {
    "term": {
      "name": {
        "value": "Lobster"
      }
    }
  }
}

GET /products/_search
{
  "query": {
    "match": {
      "name": "Lobster"
    }
  }
}
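
The difference between the two queries above comes down to whether the search input is analyzed. The Python sketch below builds a tiny inverted index to show this; the sample documents, the regex-based analyzer, and the function names are simplifying assumptions, not Elasticsearch internals.

```python
import re


def analyze(text):
    """Simplified analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Map each analyzed term to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """Term-level: look the value up exactly as given."""
    return index.get(value, set())


def match_query(index, text):
    """Full-text: analyze the input first, then look up each term (OR)."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "Lobster Roll", 2: "Tuna Salad"})
print(term_query(index, "Lobster"))   # set()
print(term_query(index, "lobster"))   # {1}
print(match_query(index, "Lobster"))  # {1}
```

Only the lowercased term exists in the index, so the exact lookup for Lobster finds nothing while the analyzed query does.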

Term Query

Simple term query
GET /products/_search
{
  "query": {
    "term": {
      "is_active": {
        "value": true
      }
    }
  }
}

Or, in shorthand form:

GET /products/_search
{
  "query": {
    "term": {
      "is_active": true
    }
  }
}

Search for any of several terms, for example on tags.keyword
GET /products/_search
{
  "query": {
    "terms": {
      "tags.keyword": [
      "Soup",
      "Cake"
      ]
    }
  }
}

Return the documents with the given ids
GET /products/_search
{
  "query": {
    "ids": {
      "values": [1, 2, 3]
    }
  }
}

Search products with in_stock between 0 and 5 (both inclusive)
GET /products/_search
{
  "query": {
    "range": {
      "in_stock": {
        "gte": 0,
        "lte": 5
      }
    }
  }
}

Search products whose created date falls within a range
GET /products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2010/01/01",
        "lte": "2010/12/31"
      }
    }
  }
}

Search date using custom format
GET /products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2010-01-01",
        "lte": "2010-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

Search date with date math: the anchor date 2010/01/01 minus one year and one day
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math
GET /products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2010/01/01||-1y-1d"
      }
    }
  }
}

Round the current date down to the start of the month, then subtract 3 years
GET /products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "now/M-3y"
      }
    }
  }
}
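
The two date math expressions above can be worked through by hand. The Python sketch below does the same arithmetic with the standard library; the helper names are invented for this example, the fixed value of now is an arbitrary assumption, and time zones are ignored.

```python
from datetime import datetime, timedelta


def minus_years(dt, years):
    """Subtract calendar years, clamping Feb 29 to Feb 28 if needed."""
    try:
        return dt.replace(year=dt.year - years)
    except ValueError:
        return dt.replace(year=dt.year - years, day=28)


def start_of_month(dt):
    """Equivalent of rounding down with /M."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# "2010/01/01||-1y-1d": anchor date, minus one year, minus one day
anchor = datetime(2010, 1, 1)
lower_bound = minus_years(anchor, 1) - timedelta(days=1)

# "now/M-3y": operations apply left to right, so round first, then subtract
now = datetime(2024, 5, 17, 10, 30)  # assumed current time for the example
rounded_bound = minus_years(start_of_month(now), 3)

print(lower_bound.date(), rounded_bound.date())
```

The first bound is 2008-12-31. With the assumed current time, the second bound is 2021-05-01.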

Match documents in which a field has a non-null value, for example tags
GET /products/_search
{
  "query": {
   "exists": {
     "field": "tags"
   }
  }
}

Match by prefix, for example find all tags.keyword values that start with Vege
GET /products/_search
{
  "query": {
    "prefix": {
      "tags.keyword": {
        "value": "Vege"
      }
    }
  }
}

Wildcard query: * matches zero or more characters. Careful, can be slow!
GET /products/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": {
        "value": "Veg*ble"
      }
    }
  }
}

? matches exactly one character
GET /products/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": {
        "value": "Veget?ble"
      }
    }
  }
}
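
The two wildcard characters behave like shell-style globbing, so Python's fnmatch module is a handy way to try patterns out. This is only an analogy: fnmatch also understands [seq] character classes, which the wildcard query does not, and the sample terms are made up.

```python
from fnmatch import fnmatchcase

# * matches zero or more characters
star_long = fnmatchcase("Vegetable", "Veg*ble")       # True
star_empty = fnmatchcase("Vegble", "Veg*ble")         # True, * matched nothing

# ? matches exactly one character
one_char = fnmatchcase("Vegetable", "Veget?ble")      # True
two_chars = fnmatchcase("Vegetaable", "Veget?ble")    # False, two characters

print(star_long, star_empty, one_char, two_chars)
```

Matching is case-sensitive here, just as an unanalyzed keyword field is by default.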

Using a regexp query to find tags.keyword values such as Vegetable. The pattern must match the whole term. Make sure the regex is efficient!
GET /products/_search
{
  "query": {
    "regexp": {
      "tags.keyword": "Vege[a-zA-Z]+ble"
    }
  }
}

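Use a regexp query to find names containing a number. The pattern is matched against each whole term, so on a text field this finds names with a term made up only of digits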
GET /products/_search
{
  "query": {
    "regexp": {
      "name": "[0-9]+"
    }
  }
}
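
Because a regexp query has to match the entire term, it behaves like Python's re.fullmatch rather than re.search. The sketch below uses that analogy; Lucene's regular expression syntax is not the same as Python's, but these two patterns happen to mean the same thing in both, and the sample terms are assumptions.

```python
import re


def regexp_matches(pattern, term):
    """Anchored match: the pattern must cover the whole term."""
    return re.fullmatch(pattern, term) is not None


digits_only = regexp_matches("[0-9]+", "500")                  # True
digits_prefix = regexp_matches("[0-9]+", "500g")               # False
vegetable = regexp_matches("Vege[a-zA-Z]+ble", "Vegetable")    # True
with_suffix = regexp_matches("Vege[a-zA-Z]+ble", "Vegetables") # False

print(digits_only, digits_prefix, vegetable, with_suffix)
```

To match a term that merely contains digits, the pattern itself has to allow the surrounding characters, for example with .* on both sides.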

Full Text Query

Find titles matching the given text, as a user would typically type it. The terms are combined with the OR operator by default
GET /recipes/_search
{
  "query": {
    "match": {
      "title": "Recipes with pasta or spaghetti"
    }
  }
}

Find titles matching the given text using the AND operator: every term must be present
GET /recipes/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Recipes with pasta or spaghetti",
        "operator": "and"
      }
    }
  }
}

Use match_phrase when word order matters, e.g. spaghetti puttanesca and puttanesca spaghetti are different phrases
GET /recipes/_search
{
  "query": {
    "match_phrase": {
      "title": "spaghetti puttanesca"
    }
  }
}

Search multiple fields, for example find the text pasta in title or description. A document is returned if pasta matches in either field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types
GET /recipes/_search
{
  "query": {
    "multi_match": {
      "query": "pasta",
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Compound Query

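Bool query to combine clauses: must and filter are required, must_not excludes, and should is optional here but raises the score. filter and must_not do not contribute to the score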
GET /recipes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "ingredients.name": "parmesan"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "ingredients.name": "tuna"
          }
        }
      ],
      "should": [
        {
          "match": {
            "ingredients.name": "parsley"
          }
        }
      ], 
      "filter": [
        {
          "range": {
            "preparation_time_minutes": {
              "lte": 15
            }
          }
        }
      ]
    }
  }
}
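
The way the four clause types interact can be spelled out as ordinary control flow. The Python sketch below evaluates the same conditions against a recipe; the function, the sample recipes, and especially the score values are invented for illustration and have nothing to do with real relevance scoring.

```python
def bool_query(ingredients, preparation_time_minutes):
    """Return a toy score for a matching recipe, or None if it does not match."""
    names = set(ingredients)
    if "parmesan" not in names:          # must: required, contributes to score
        return None
    if "tuna" in names:                  # must_not: excludes, no score
        return None
    if preparation_time_minutes > 15:    # filter: required, no score
        return None
    score = 1.0                          # placeholder score for the must clause
    if "parsley" in names:               # should: optional, boosts the score
        score += 1.0
    return score


plain = bool_query(["parmesan", "pasta"], 10)
boosted = bool_query(["parmesan", "parsley"], 10)
excluded = bool_query(["parmesan", "tuna"], 10)
too_slow = bool_query(["parmesan", "parsley"], 30)
print(plain, boosted, excluded, too_slow)
```

Both the plain and the boosted recipe match, but the one with parsley ranks higher; the other two are dropped regardless of what else they contain.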

Named queries, for debugging results. Each hit lists the names of the clauses it matched, for example:
"matched_queries" : [
  "prep_time_filter",
  "parmesan_must",
  "parsley_should"
]

GET /recipes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "ingredients.name": {
              "query": "parmesan",
              "_name": "parmesan_must"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "ingredients.name":{
              "query": "tuna",
              "_name": "tuna_must_not"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "ingredients.name": {
              "query": "parsley",
              "_name": "parsley_should"
            }
          }
        }
      ], 
      "filter": [
        {
          "range": {
            "preparation_time_minutes": {
              "lte": 15,
              "_name": "prep_time_filter"
            }
          }
        }
      ]
    }
  }
}