Basic guide: steps to set up, initialize, and understand Elasticsearch, Logstash, Filebeat, and Kibana.
Resources
=========
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - elasticsearch docs
https://www.elastic.co/guide/en/logstash/current/index.html - logstash docs
https://www.elastic.co/guide/en/kibana/current/index.html - kibana docs
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch - shards and replicas
https://blog.insightdatascience.com/anatomy-of-an-elasticsearch-cluster-part-i-7ac9a13b05db - analogies and architectures
Near Realtime (NRT)
===================
Elasticsearch is a near real time search platform. What this means is there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
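For example, you can force a refresh instead of waiting for the periodic one (a sketch using a hypothetical customer index):
curl -XPOST 'localhost:9200/customer/_refresh?pretty'
Newly indexed documents become searchable as soon as the refresh completes.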
Cluster
=======
A cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes.
The default cluster name is "elasticsearch" (created automatically).
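You can check which cluster a node belongs to from the root endpoint:
curl -XGET 'localhost:9200/?pretty'
The response includes a cluster_name field.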
Node
====
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
A node's name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
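Both names can be set in elasticsearch.yml; for example:
cluster.name: my-cluster
node.name: my-node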
Index
=====
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
Type
====
Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics is completely up to you. In general, a type is defined for documents that have a set of common fields. For example, let’s assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.
Document
========
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order.
Shard
=====
Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
It allows you to horizontally split/scale your content volume
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
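The shard count is fixed at index creation (changing it later requires reindexing), so it goes in the settings body. A sketch, using a hypothetical customer index:
curl -XPUT 'localhost:9200/customer?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
}
'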
Replica
=======
Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
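Unlike the shard count, the replica count can be changed on a live index. A sketch, again using a hypothetical customer index:
curl -XPUT 'localhost:9200/customer/_settings?pretty' -H 'Content-Type: application/json' -d'
{
  "number_of_replicas": 2
}
'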
Note
====
Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
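For example:
curl -XGET 'localhost:9200/_cat/shards?v&pretty'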
Download And Installation
=========================
1. Check the JDK version (1.8.0_131 preferred)
2. Download from http://www.elastic.co/downloads
3. Unzip the tar file:
tar -xvf elasticsearch-5.5.2.tar.gz
4. Run bin/elasticsearch:
./bin/elasticsearch
Additional options (set cluster and node names):
./bin/elasticsearch -Ecluster.name=my-cluster -Enode.name=my-node
FOR RPM-BASED SYSTEMS
---------------------
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/elasticsearch.repo and paste the following:
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
3. sudo yum install elasticsearch
4. Update the cluster name in /etc/elasticsearch/elasticsearch.yml.
5. sudo chkconfig --add elasticsearch
6. service elasticsearch start
7. Add template, see ADD TEMPLATE section below.
Cluster Health
==============
GET /_cat/health?v
sbsatter@sbsatter ~ $ curl -XGET 'localhost:9200/_cat/health?v&pretty'
Response
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1503292009 11:06:49 sbsatter-cluster green 1 1 0 0 0 0 0 0 - 100.0%
Green, yellow, or red. Green means everything is good (cluster is fully functional), yellow means all data is available but some replicas are not yet allocated (cluster is fully functional), and red means some data is not available for whatever reason. Note that even if a cluster is red, it still is partially functional (i.e. it will continue to serve search requests from the available shards) but you will likely need to fix it ASAP since you have missing data.
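In scripts, the cluster health API can also block until a desired status is reached:
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty'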
Elasticsearch Command Pattern
=============================
<REST Verb> /<Index>/<Type>/<ID>
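For example, deleting a single document follows the same pattern:
DELETE /customer/external/1
curl -XDELETE 'localhost:9200/customer/external/1?pretty'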
List of Nodes
=============
GET /_cat/nodes?v
Create/List Indices
===================
GET /_cat/indices?v
PUT /customer?pretty
curl -XPUT 'localhost:9200/customer?pretty'
GET /_cat/indices?v
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Indexing and Querying Documents
===============================
Indexing:
PUT /customer/external/1?pretty
{
"name": "John Doe"
}
curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}
'
Querying:
curl -XGET 'localhost:9200/customer/external/1?pretty'
Deleting Indices
================
DELETE /customer?pretty
curl -XDELETE 'localhost:9200/customer?pretty'
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
Note:
** The ID is optional; prefer POST over PUT when omitting it, since Elasticsearch then generates an ID.
** Indexing with an existing ID overwrites that document.
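For example, indexing without an explicit ID (Elasticsearch generates one):
curl -XPOST 'localhost:9200/customer/external?pretty' -H 'Content-Type: application/json' -d'
{
  "name": "Jane Doe"
}
'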
Update Documents
================
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
"doc": { "name": "Jane Doe", "age": 20 }
}
'
Updates can also run a script (this one increments the age field):
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
  "script": "ctx._source.age += 5"
}
'
Bulk Processing
===============
The _bulk API executes multiple operations in a single request, which is much more efficient than issuing them one by one:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }'
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
Format date mapping
===================
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.08.28' -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss.SSS" }
      }
    }
  }
}'
Add mapping
===========
curl -XPUT 'localhost:9200/sbsatter-logstash-2017.09.07?pretty' -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss.SSSZ" },
        "message" : { "type" : "text" },
        "number" : { "type" : "integer" }
      }
    }
  }
}'
Get Mapping
===========
curl -XGET 'localhost:9200/_all/_mapping?pretty'
curl -XGET 'localhost:9200/_mapping?pretty'
Add template
============
sbsatter@sbsatter ~/Downloads/logstash-5.5.2 $ curl -XPUT 'localhost:9200/_template/sbsatter_logstash_template?pretty' -H 'Content-Type: application/json' -d '
{
  "template" : "sbsatter-logstash-*",
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "time" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    },
    "BACKOFFICE" : {
      "properties" : {
        "time" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss.SSSZ"
        }
      }
    }
  }
}'
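To verify the template was stored, fetch it back:
curl -XGET 'localhost:9200/_template/sbsatter_logstash_template?pretty'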
A stored template, as returned by GET /_template/<name> (here a different template, pay365temp):
{
  "pay365temp" : {
    "order" : 0,
    "version" : 1,
    "template" : "pay365-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "properties" : {
          "path" : { "type" : "keyword" },
          "remoteHost" : { "type" : "ip" },
          "latitude" : { "type" : "half_float" },
          "logData" : { "type" : "text" },
          "host" : { "type" : "keyword" },
          "lineNum" : { "type" : "integer" },
          "className" : { "type" : "text" },
          "location" : { "type" : "geo_point" },
          "time" : { "format" : "yyyy-MM-dd HH:mm:ss,SSSZ", "type" : "date" },
          "message" : { "type" : "text" },
          "longitude" : { "type" : "half_float" }
        }
      }
    },
    "aliases" : { }
  }
}
Reindex
=======
sbsatter@sbsatter ~/Downloads/logstash-5.5.2 $ curl -XPOST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "sbsatter-logstash-2017.09.09"
  },
  "dest": {
    "index": "sbsatter-logstash-2017.09.09-reindexed"
  }
}
'
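Afterwards, compare the document counts of the source and destination indices:
curl -XGET 'localhost:9200/_cat/indices/sbsatter-logstash-2017.09.09*?v&pretty'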
CONFIGURING FILEBEAT
====================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/filebeat.repo and paste the same 5.x repository stanza used for Elasticsearch above, then sudo yum install filebeat.
3. Set up Filebeat to ship to Logstash:
a. Open filebeat.yml in $path.home.
b. Comment out the Elasticsearch output section, if not already done.
c. Uncomment the Logstash output section, if not already done.
d. Configure the Logstash host IP and port (see the sketch after this list).
e. Test the config by running: ./filebeat -configtest -e
f. Check file permissions in case of errors.
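A minimal filebeat.yml sketch for steps b-d (the log path and the Logstash host/port are assumptions; adjust them to your environment):
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/*.log      # assumption: your application's log path
#output.elasticsearch:          # step b: Elasticsearch output commented out
#  hosts: ["localhost:9200"]
output.logstash:                # steps c and d: Logstash output enabled
  hosts: ["localhost:5044"]     # assumption: Logstash beats input on port 5044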
Kibana Installation
===================
1. rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. vi /etc/yum.repos.d/kibana.repo and paste
[kibana-5.x]
name=Kibana repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
3. sudo yum install kibana
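4. Point Kibana at Elasticsearch in /etc/kibana/kibana.yml if it is not at the default http://localhost:9200 (the value below is an assumption for a local setup):
elasticsearch.url: "http://localhost:9200"
5. sudo chkconfig --add kibana
6. sudo service kibana start
Kibana then listens on port 5601 by default (http://localhost:5601).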