Basic guide: steps to set up, initialize, and understand Elasticsearch, Logstash, Filebeat, and Kibana.
Resources
=========
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - elasticsearch docs
https://www.elastic.co/guide/en/logstash/current/index.html - logstash docs
https://www.elastic.co/guide/en/kibana/current/index.html - kibana docs
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch - shards and replicas
https://blog.insightdatascience.com/anatomy-of-an-elasticsearch-cluster-part-i-7ac9a13b05db - analogies and architectures
Near Realtime (NRT)
===================
Elasticsearch is a near real time search platform. What this means is there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
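For example, you can force a refresh instead of waiting for the periodic one (a sketch using a hypothetical customer index):
curl -XPOST 'localhost:9200/customer/_refresh?pretty'
Newly indexed documents become searchable as soon as the refresh completes.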
Cluster
=======
A cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes.
The default cluster name is "elasticsearch" (created automatically).
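You can check which cluster a node belongs to from the root endpoint:
curl -XGET 'localhost:9200/?pretty'
The response includes a cluster_name field.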
Node
====
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
A node's name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
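Both names can be set in elasticsearch.yml; for example:
cluster.name: my-cluster
node.name: my-node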
Index
=====
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
Type
====
Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics is completely up to you. In general, a type is defined for documents that have a set of common fields. For example, let’s assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.
Document
========
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order.
Shard
=====
Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
It allows you to horizontally split/scale your content volume
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
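The shard count is fixed at index creation (changing it later requires reindexing), so it goes in the settings body. A sketch, using a hypothetical customer index:
curl -XPUT 'localhost:9200/customer?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
}
'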
Replica
=======
Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
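Unlike the shard count, the replica count can be changed on a live index. A sketch, again using a hypothetical customer index:
curl -XPUT 'localhost:9200/customer/_settings?pretty' -H 'Content-Type: application/json' -d'
{
  "number_of_replicas": 2
}
'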
Note
====
Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
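For example:
curl -XGET 'localhost:9200/_cat/shards?v&pretty'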
Download And Installation
=========================
1. Check the JDK version (1.8.0_131 preferred)
2. Download from http://www.elastic.co/downloads
3. Unzip the tar file:
tar -xvf elasticsearch-5.5.2.tar.gz
4. Run bin/elasticsearch:
./bin/elasticsearch
Additional options (set cluster and node names):
./bin/elasticsearch -Ecluster.name=my-cluster -Enode.name=my-node
FOR RPM-BASED SYSTEMS
---------------------
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/elasticsearch.repo and paste the following:
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
3. sudo yum install elasticsearch
4. Update the cluster name in /etc/elasticsearch/elasticsearch.yml.
5. sudo chkconfig --add elasticsearch
6. service elasticsearch start
7. Add template, see ADD TEMPLATE section below.
Cluster Health
==============
GET /_cat/health?v
sbsatter@sbsatter ~ $ curl -XGET 'localhost:9200/_cat/health?v&pretty'
Response
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1503292009 11:06:49 sbsatter-cluster green 1 1 0 0 0 0 0 0 - 100.0%
Green, yellow, or red. Green means everything is good (cluster is fully functional), yellow means all data is available but some replicas are not yet allocated (cluster is fully functional), and red means some data is not available for whatever reason. Note that even if a cluster is red, it still is partially functional (i.e. it will continue to serve search requests from the available shards) but you will likely need to fix it ASAP since you have missing data.
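In scripts, the cluster health API can also block until a desired status is reached:
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty'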
Elasticsearch Command Pattern
=============================
<REST Verb> /<Index>/<Type>/<ID>
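For example, deleting a single document follows the same pattern:
DELETE /customer/external/1
curl -XDELETE 'localhost:9200/customer/external/1?pretty'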
List of Nodes
=============
GET /_cat/nodes?v
Create/List Indices
===================
GET /_cat/indices?v
PUT /customer?pretty
curl -XPUT 'localhost:9200/customer?pretty'
GET /_cat/indices?v
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Indexing and Querying Documents
===============================
Indexing:
PUT /customer/external/1?pretty
{
"name": "John Doe"
}
curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}
'
Querying:
curl -XGET 'localhost:9200/customer/external/1?pretty'
Deleting Indices
================
DELETE /customer?pretty
curl -XDELETE 'localhost:9200/customer?pretty'
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Note:
** The ID is optional; prefer POST over PUT when omitting it, since Elasticsearch then generates an ID.
** Indexing with an existing ID overwrites that document.
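For example, indexing without an explicit ID (Elasticsearch generates one):
curl -XPOST 'localhost:9200/customer/external?pretty' -H 'Content-Type: application/json' -d'
{
  "name": "Jane Doe"
}
'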
Update Documents
================
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
"doc": { "name": "Jane Doe", "age": 20 }
}
'
Updates can also run a script (this one increments the age field):
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
  "script": "ctx._source.age += 5"
}
'
Bulk Processing
===============
The _bulk API executes multiple operations in a single request, which is much more efficient than issuing them one by one:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }'
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
Format date mapping
===================
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.08.28' -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss.SSS" }
      }
    }
  }
}'
Add mapping
===========
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.09.07?pretty' -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss.SSSZ" },
        "message" : { "type" : "text" },
        "number" : { "type" : "integer" }
      }
    }
  }
}'
Get Mapping
===========
curl -XGET 'localhost:9200/_all/_mapping?pretty'
curl -XGET 'localhost:9200/_mapping?pretty'
Add template
============
sbsatter@sbsatter ~/Downloads/logstash-5.5.2 $ curl -XPUT 'localhost:9200/_template/sbsatter_logstash_template?pretty' -H 'Content-Type: application/json' -d '
{
  "template" : "sbsatter-logstash-*",
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    },
    "BACKOFFICE" : {
      "properties" : {
        "time" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    }
  }
}'
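To verify the template was stored, fetch it back:
curl -XGET 'localhost:9200/_template/sbsatter_logstash_template?pretty'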
A stored template, as returned by GET /_template/<name> (here a different template, pay365temp):
{
  "pay365temp" : {
    "order" : 0,
    "version" : 1,
    "template" : "pay365-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "properties" : {
          "path" : { "type" : "keyword" },
          "remoteHost" : { "type" : "ip" },
          "latitude" : { "type" : "half_float" },
          "logData" : { "type" : "text" },
          "host" : { "type" : "keyword" },
          "lineNum" : { "type" : "integer" },
          "className" : { "type" : "text" },
          "location" : { "type" : "geo_point" },
          "time" : { "format" : "yyyy-MM-dd HH:mm:ss,SSSZ", "type" : "date" },
          "message" : { "type" : "text" },
          "longitude" : { "type" : "half_float" }
        }
      }
    },
    "aliases" : { }
  }
}
Reindex
=======
sbsatter@sbsatter ~/Downloads/logstash-5.5.2 $ curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "sbsatter-logstash-2017.09.09"
  },
  "dest": {
    "index": "sbsatter-logstash-2017.09.09-reindexed"
  }
}
'
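Afterwards, compare the document counts of the source and destination indices:
curl -XGET 'localhost:9200/_cat/indices/sbsatter-logstash-2017.09.09*?v&pretty'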
CONFIGURING FILEBEAT
====================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/filebeat.repo and paste the same 5.x repository stanza used for Elasticsearch above, then sudo yum install filebeat.
3. Set up Filebeat to ship to Logstash:
a. Open filebeat.yml in $path.home.
b. Comment out the Elasticsearch output section, if not already done.
c. Uncomment the Logstash output section, if not already done.
d. Configure the Logstash host IP and port (see the sketch after this list).
e. Test the config by running: ./filebeat -configtest -e
f. Check file permissions in case of errors.
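A minimal filebeat.yml sketch for steps b-d (the log path and the Logstash host/port are assumptions; adjust them to your environment):
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/*.log      # assumption: your application's log path
#output.elasticsearch:          # step b: Elasticsearch output commented out
#  hosts: ["localhost:9200"]
output.logstash:                # steps c and d: Logstash output enabled
  hosts: ["localhost:5044"]     # assumption: Logstash beats input on port 5044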
Kibana Installation
===================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/kibana.repo and paste
[kibana-5.x]
name=Kibana repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
3. sudo yum install kibana
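4. Point Kibana at Elasticsearch in /etc/kibana/kibana.yml if it is not at the default http://localhost:9200 (the value below is an assumption for a local setup):
elasticsearch.url: "http://localhost:9200"
5. sudo chkconfig --add kibana
6. sudo service kibana start
Kibana then listens on port 5601 by default (http://localhost:5601).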