Basic guide: steps to set up, initialize, and understand Elasticsearch, Logstash, Filebeat, and Kibana.
Resources
=========
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - Elasticsearch docs
https://www.elastic.co/guide/en/logstash/current/index.html - Logstash docs
https://www.elastic.co/guide/en/kibana/current/index.html - Kibana docs
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch - shards and replicas
https://blog.insightdatascience.com/anatomy-of-an-elasticsearch-cluster-part-i-7ac9a13b05db - analogies and architectures
Near Realtime (NRT)
===================
Elasticsearch is a near real-time search platform: there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
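This latency comes from the index refresh cycle. A quick way to observe it, sketched against a local cluster on localhost:9200 (the customer index and external type here are illustrative), is to index a document and force a refresh with the _refresh API before searching:

```shell
# Document to index; assumes a local cluster at localhost:9200.
DOC='{"name": "John Doe"}'

# Index the document.
curl -XPUT 'localhost:9200/customer/external/1?pretty' \
  -H 'Content-Type: application/json' -d "$DOC"

# Force a refresh so the document is searchable immediately,
# instead of waiting for the next scheduled refresh (~1s by default).
curl -XPOST 'localhost:9200/customer/_refresh?pretty'

# The document is now visible to search.
curl -XGET 'localhost:9200/customer/_search?q=name:John&pretty'
```

Forcing refreshes is fine for experiments, but in production the periodic refresh is cheaper.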
Cluster
=======
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
The default cluster name is elasticsearch (assigned automatically).
Node
====
A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities.
A node's name is important for administration purposes, when you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
Index
=====
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
Type
====
Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics is completely up to you. In general, a type is defined for documents that have a set of common fields. For example, let's assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.
Document
========
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order.
Shard
=====
Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
1. It allows you to horizontally split/scale your content volume.
2. It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.
Replica
=======
Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.
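Unlike the shard count, the replica count is not fixed at index-creation time; it can be changed on a live index through the index settings API. A minimal sketch, assuming a local cluster and the illustrative customer index used elsewhere in these notes:

```shell
# Raise the replica count of an existing index to 2 (index name illustrative).
SETTINGS='{"index": {"number_of_replicas": 2}}'
curl -XPUT 'localhost:9200/customer/_settings?pretty' \
  -H 'Content-Type: application/json' -d "$SETTINGS"

# Watch the new replica shards get allocated (per-shard state, docs, size).
curl -XGET 'localhost:9200/_cat/shards/customer?v'
```

On a single-node cluster the new replicas stay unassigned (and health turns yellow), since a replica is never placed on the same node as its primary.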
Note
====
Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
Download And Installation
=========================
1. Check the JDK version (1.8.0_131 preferable).
2. Download from http://www.elastic.co/downloads
3. Unzip the tar file:
   tar -xvf elasticsearch-5.5.2.tar.gz
4. Run ./elasticsearch from the bin directory.
   Additional options (set the cluster and node names):
   ./elasticsearch -Ecluster.name=my-cluster -Enode.name=my-node
FOR RPM BASED SYSTEMS
---------------------
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/elasticsearch.repo and paste the following:
   [elasticsearch-5.x]
   name=Elasticsearch repository for 5.x packages
   baseurl=https://artifacts.elastic.co/packages/5.x/yum
   gpgcheck=1
   gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
   enabled=1
   autorefresh=1
   type=rpm-md
3. sudo yum install elasticsearch
4. Update the cluster name in /etc/elasticsearch/elasticsearch.yml.
5. sudo chkconfig --add elasticsearch
6. service elasticsearch start
7. Add a template; see the ADD TEMPLATE section below.
Cluster Health
==============
GET /_cat/health?v
curl -XGET 'localhost:9200/_cat/health?v&pretty'
Response:
epoch      timestamp cluster          status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1503292009 11:06:49  sbsatter-cluster green           1         1      0   0    0    0        0             0                  -                100.0%
The status is green, yellow, or red. Green means everything is good (the cluster is fully functional); yellow means all data is available but some replicas are not yet allocated (the cluster is still fully functional); red means some data is not available for whatever reason. Note that even a red cluster is partially functional (it will continue to serve search requests from the available shards), but you will likely need to fix it ASAP since you have missing data.
Elasticsearch Command Pattern
=============================
<REST Verb> /<Index>/<Type>/<ID>
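The pattern above, instantiated as curl commands (the index, type, and ID names are illustrative):

```shell
# <REST Verb> /<Index>/<Type>/<ID> with index=customer, type=external, ID=1.
curl -XPUT 'localhost:9200/customer/external/1?pretty' \
  -H 'Content-Type: application/json' -d '{"name": "John Doe"}'   # create/replace document 1
curl -XGET 'localhost:9200/customer/external/1?pretty'            # read document 1
curl -XDELETE 'localhost:9200/customer/external/1?pretty'         # delete document 1
```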
List of Nodes
=============
GET /_cat/nodes?v
Create/List indices
===================
GET /_cat/indices?v
PUT /customer?pretty
curl -XPUT 'localhost:9200/customer?pretty'
GET /_cat/indices?v
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Indexing and Querying Documents
===============================
Indexing:
PUT /customer/external/1?pretty
{
  "name": "John Doe"
}
curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'
Querying:
curl -XGET 'localhost:9200/customer/external/1?pretty'
Deleting Indices
================
DELETE /customer?pretty
curl -XDELETE 'localhost:9200/customer?pretty'
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Note:
** The ID is optional; prefer POST over PUT when omitting it.
** Indexing with an existing ID overwrites that document.
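The first note in action: a POST without an ID makes Elasticsearch generate one (it appears as _id in the response). A sketch against a local cluster, with illustrative index and type names:

```shell
# POST without an ID: Elasticsearch auto-generates the document ID.
# Check the "_id" field in the JSON response.
DOC='{"name": "Jane Doe"}'
curl -XPOST 'localhost:9200/customer/external?pretty' \
  -H 'Content-Type: application/json' -d "$DOC"
```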
Update Documents
================
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
'
Also, with a script (this one increments the age field by 5):
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
  "script": "ctx._source.age += 5"
}
'
Bulk Processing
===============
The bulk API is an efficient way to run multiple operations in a single request:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }'
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
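Building the bulk body in a shell variable makes its newline-delimited format easier to see: one action line, then (for index/update) one source line, and the whole body must end with a newline. A sketch using the illustrative customer index:

```shell
# Newline-delimited bulk body: action line, then source line, repeated.
# The trailing newline inside the quotes is required by the bulk API.
BULK_BODY='{"index":{"_id":"1"}}
{"name": "John Doe"}
{"index":{"_id":"2"}}
{"name": "Jane Doe"}
'

# Send it to a local cluster (assumed at localhost:9200).
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' \
  -H 'Content-Type: application/json' -d "$BULK_BODY"
```

Each line is itself a complete JSON object; the body as a whole is not valid JSON, which is why it is built as text rather than with a JSON tool.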
Format date mapping
===================
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.08.28' -H 'content-type: application/json' -d '{"mappings": {"_default_":{"properties":{"time":{"type":"date","format":"yyyy-MM-dd HH:mm:ss.SSS"}}}}}'
Add mapping
===========
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.09.07?pretty' -H 'content-type: application/json' -d '
{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : { "type" : "date", "format": "yyyy-MM-dd HH:mm:ss.SSSZ" },
        "message" : { "type": "string" },
        "number" : { "type": "integer" }
      }
    }
  }
}'
Get Mapping
===========
curl -XGET 'localhost:9200/_all/_mapping?pretty'
curl -XGET 'localhost:9200/_mapping?pretty'
Add template
============
curl -XPUT 'localhost:9200/_template/sbsatter_logstash_template?pretty' -H 'content-type: application/json' -d '
{
  "template":"sbsatter-logstash-*",
  "settings":{
    "number_of_shards":1
  },
  "mappings":{
    "_default_":{
      "properties":{
        "time":{
          "type":"date",
          "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    },
    "BACKOFFICE":{
      "properties":{
        "time":{
          "type":"date",
          "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    }
  }
}'
An example of an installed template (in the shape returned by GET /_template):
{
  "pay365temp" : {
    "order" : 0,
    "version" : 1,
    "template" : "pay365-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "properties" : {
          "path" : {
            "type" : "keyword"
          },
          "remoteHost" : {
            "type" : "ip"
          },
          "latitude" : {
            "type" : "half_float"
          },
          "logData" : {
            "type" : "text"
          },
          "host" : {
            "type" : "keyword"
          },
          "lineNum" : {
            "type" : "integer"
          },
          "className" : {
            "type" : "text"
          },
          "location" : {
            "type" : "geo_point"
          },
          "time" : {
            "format" : "yyyy-MM-dd HH:mm:ss,SSSZ",
            "type" : "date"
          },
          "message" : {
            "type" : "text"
          },
          "longitude" : {
            "type" : "half_float"
          }
        }
      }
    },
    "aliases" : { }
  }
}
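Installed templates can be inspected and removed by name; deleting a template does not affect indices already created from it, only future ones. A sketch using the template name added above:

```shell
TEMPLATE='sbsatter_logstash_template'

# Inspect one template, or list all installed templates.
curl -XGET "localhost:9200/_template/$TEMPLATE?pretty"
curl -XGET 'localhost:9200/_template?pretty'

# Remove the template (existing indices keep their mappings).
curl -XDELETE "localhost:9200/_template/$TEMPLATE?pretty"
```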
Reindex
=======
curl -XPOST "localhost:9200/_reindex?pretty" -H 'content-type:application/json' -d '
{
  "source": {
    "index": "sbsatter-logstash-2017.09.09"
  },
  "dest": {
    "index": "sbsatter-logstash-2017.09.09-reindexed"
  }
}
'
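A reindex over a large index can run for a while; it can be monitored through the tasks API. A minimal sketch, assuming a local cluster:

```shell
# List currently running reindex tasks with per-task detail.
TASKS_URL='localhost:9200/_tasks?detailed=true&actions=*reindex&pretty'
curl -XGET "$TASKS_URL"
```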
CONFIGURING FILEBEAT
====================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/filebeat.repo and paste the repository definition (same format as the Elasticsearch repo above).
3. Set up Filebeat to work with Logstash:
   a. Open filebeat.yml in $path.home.
   b. Comment out, if not already done, the Elasticsearch output section.
   c. Uncomment, if not already done, the Logstash output section.
   d. Configure the port and host IP.
   e. Test the config by running: ./filebeat -configtest -e
   f. Check permissions in case of an error.
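Steps b-d above amount to a filebeat.yml fragment along these lines (host and port are illustrative; this is a sketch, not a complete config):

```yaml
# Elasticsearch output disabled (step b):
# output.elasticsearch:
#   hosts: ["localhost:9200"]

# Logstash output enabled (steps c-d); point it at your Logstash beats input.
output.logstash:
  hosts: ["localhost:5044"]
```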
Installing Kibana
=================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/kibana.repo and paste the following:
   [kibana-5.x]
   name=Kibana repository for 5.x packages
   baseurl=https://artifacts.elastic.co/packages/5.x/yum
   gpgcheck=1
   gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
   enabled=1
   autorefresh=1
   type=rpm-md
3. yum install kibana
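After installing, the service can be registered and started the same way as Elasticsearch above, then checked on its default port 5601 (the /api/status endpoint here is assumed from Kibana 5.x):

```shell
# On the server, as root (same pattern as the Elasticsearch RPM steps):
#   chkconfig --add kibana
#   service kibana start

# Verify Kibana is up; it listens on port 5601 by default.
KIBANA_PORT=5601
curl -XGET "localhost:$KIBANA_PORT/api/status"
```

Or simply open http://localhost:5601 in a browser.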