Elasticsearch Data Streams

Elasticsearch Storage

The idea is to use an Elasticsearch data stream: https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html

Important concepts

Data Streams

A data stream is a way to handle time-series data (such as webhook logs) whose storage rolls over as time passes.

A data stream is backed by:

  • an alias used for writing and searching (e.g. "webhooks_logs")
  • a set of hidden backing indexes that store the data
  • an index template that defines the mapping and fields used in each index
  • a rollover configuration that creates new indexes, deletes old ones, and switches which index is the active write index (see the sketch after this list)
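
As an illustration of the last point, a rollover can also be requested manually against the stream name (using the "webhooks_logs" example above); in practice ILM triggers it automatically, as shown in the how-to below:

POST /webhooks_logs/_rollover

Called without conditions, this creates a new backing index and makes it the active write index.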

Mapping

Mappings are the way to specify the schema for indexed documents.

Important: make sure the _source metadata field is not disabled: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

Important: mappings no longer have types since ES 7.x: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
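
As a sketch of what a mapping could look like for the webhook log example (the webhook, status and payload field names are only illustrative; in the data stream setup this fragment goes inside the index template from step 2 below):

"mappings": {
  "properties": {
    "@timestamp": { "type": "date" },
    "webhook":    { "type": "keyword" },
    "status":     { "type": "integer" },
    "payload":    { "type": "text" }
  }
}

Note there is no type name wrapping the properties, and _source is simply left at its default (enabled).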

Data Stream how-to

Prerequisites:

  • Elasticsearch data streams are intended for time series data only. Each document indexed to a data stream must contain the @timestamp field. This field must be mapped as a date or date_nanos field data type.
  • Data streams are best suited for time-based, append-only use cases. If you frequently need to update or delete existing documents, we recommend using an index alias and an index template instead.

1. Create an Index Lifecycle Management (ILM) policy

ILM can be used to automatically manage a data stream’s backing indices, for example rolling indexes over based on size or age.

https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-put-lifecycle.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-index-lifecycle.html

PUT /_ilm/policy/my-data-stream-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "100GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

response:

{"acknowledged": true}

2. Create an index template for the data stream

A data stream uses an index template to configure its backing indices. A template for a data stream must specify:

  • One or more index patterns that match the name of the stream.
  • The mappings and settings for the stream’s backing indices.
  • That the template is used exclusively for data streams.
  • A priority for the template.
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    },
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  },
  "version": "external-version",
  "_meta": { "whatever": "you-want" }

}
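
As with the ILM policy, the stored template can be inspected to verify it (optional):

GET /_index_template/my-data-stream-template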

3. Create the Data Stream

PUT /_data_stream/my-data-stream

After it's created, you can query the data stream's details:

GET /_data_stream/my-data-stream

{
  "data_streams": [
    {
      "name": "my-data-stream",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-my-data-stream-000001",
          "index_uuid": "krR78LfvTOe6gr5dj2_1xQ"
        },
        {
          "index_name": ".ds-my-data-stream-000002",
          "index_uuid": "C6LWyNJHQWmA08aQGvqRkA"
        }
      ],
      "generation": 2,
      "status": "GREEN",
      "template": "my-data-stream-template",
      "ilm_policy": "my-data-stream-policy"
    }
  ]
}

4. Index documents into the Data Stream

You can add documents to a data stream using two types of indexing requests:

  • Individual indexing requests
  • Bulk indexing requests

Individual

PUT /my-data-stream/_create/{id}

{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  },
  "message": "Login successful"
}

Bulk

PUT /my-data-stream/_bulk?refresh
{"create":{ }}
{ "@timestamp": "2020-12-08T11:04:05.000Z", "user": { "id": "vlb44hny" }, "message": "Login attempt failed" }
{"create":{"_id": "3"}}
{ "@timestamp": "2020-12-08T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
{"create":{ }}
{ "@timestamp": "2020-12-09T11:07:08.000Z", "user": { "id": "l7gk7f82" }, "message": "Logout successful" }