@higee
Last active November 8, 2017 07:14
AWSKRUG Data Science (Big Data) 20170817 Presentation

Basic Settings


  • AWS EC2
    • image: Amazon Linux AMI (ami-8663bae8)
    • instance type: t2.medium
    • security group (inbound ports): 5601, 9200, 9300

Upgrade to Java 1.8


connect to ec2 instance

$ ssh -i "{}.pem" ec2-user@ec2-{IPv4 Public IP address}.{region}.compute.amazonaws.com

check java version

$ java -version

download java 1.8

$ cd /usr/lib/jvm
$ sudo wget -c --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"
$ sudo tar -xzvf jdk-8u141-linux-x64.tar.gz

install java 1.8

$ sudo alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.8.0_141/bin/java 2

set java 1.8 as default

$ sudo alternatives --config java # select java 1.8

check java version one more time

$ java -version
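The version check can also be scripted instead of read by eye. A sketch: `java -version` prints to stderr, and the sample string below is what JDK 8u141 prints, so the parse is demonstrable even on a machine without a JVM; on the instance you would pipe `java -version 2>&1` into the same sed.

```shell
# stand-in for the first line of `java -version 2>&1` on JDK 8u141
sample='java version "1.8.0_141"'
# extract the minor version ("8" for Java 1.8)
minor=$(echo "$sample" | sed -E 's/.*"1\.([0-9]+)\..*/\1/')
echo "$minor"
```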


Download and install elastic stack


create a directory

$ cd ~
$ mkdir AWSKRUG
$ cd AWSKRUG

download elastic stack

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.0.tar.gz
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-5.5.0.tar.gz
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.5.0-linux-x86_64.tar.gz
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.5.0-linux-x86_64.tar.gz
$ wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.5.0-linux-x86_64.tar.gz

uncompress files

$ tar -xzvf elasticsearch-5.5.0.tar.gz
$ tar -xzvf logstash-5.5.0.tar.gz
$ tar -xzvf kibana-5.5.0-linux-x86_64.tar.gz
$ tar -xzvf filebeat-5.5.0-linux-x86_64.tar.gz
$ tar -xzvf metricbeat-5.5.0-linux-x86_64.tar.gz

remove gz files

$ rm *.gz
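The five extract commands and the cleanup can be driven by one loop. The sketch below builds a stand-in archive in /tmp so the pattern runs anywhere; on the instance the same loop would simply run in ~/AWSKRUG after the wget commands.

```shell
set -e
demo=/tmp/awskrug-demo
mkdir -p "$demo/elasticsearch-5.5.0" && cd "$demo"
tar -czf elasticsearch-5.5.0.tar.gz elasticsearch-5.5.0  # stand-in archive
for f in *.tar.gz; do   # extract every downloaded archive
  tar -xzf "$f"
done
rm -f ./*.gz            # same cleanup as above
ls
```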


Elasticsearch and Kibana configuration


configure elasticsearch settings - increase max file descriptors

$ sudo vim /etc/security/limits.conf

*        hard    nofile           65536
*        soft    nofile           65536

log out and log back in so that the change takes effect

$ sudo su
$ su - ec2-user
$ ulimit -Hn # verify the new limit
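The manual `ulimit -Hn` check can be turned into a pass/fail script. A sketch that compares the current hard limit against the 65536 that Elasticsearch 5.x requires (the printed value depends on the machine it runs on):

```shell
required=65536
current=$(ulimit -Hn)   # hard limit on open files for this shell
if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
  echo "nofile ok: $current"
else
  echo "nofile too low: $current (need >= $required)" >&2
fi
```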

configure elasticsearch settings - increase virtual memory areas

  • temporary
$ sudo sysctl -w vm.max_map_count=262144
$ sudo sysctl -a | grep vm.max_map_count # verify the new value
  • permanent
$ sudo vim /etc/sysctl.conf
vm.max_map_count=262144
$ sudo reboot
$ ssh -i "{}.pem" ec2-user@ec2-{IPv4 Public IP address}.{region}.compute.amazonaws.com

configure elasticsearch settings - network

$ vim ~/AWSKRUG/elasticsearch-5.5.0/config/elasticsearch.yml

network.host: "{Public DNS}"
# example
network.host: "ec2-52-78-156-86.ap-northeast-2.compute.amazonaws.com"

check whether elasticsearch is running

$ cd /home/ec2-user/AWSKRUG/elasticsearch-5.5.0
$ bin/elasticsearch -d # run as a daemon
$ curl http://{IPv4 Public IP address}:9200/
# curl example
$ curl http://52.78.156.86:9200
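A single curl can race the daemon's startup, since Elasticsearch takes a few seconds to bind its port. A short poll loop is more reliable; a sketch (the host below is a placeholder, substitute the instance's public IP):

```shell
ES="http://127.0.0.1:9200"   # substitute your instance's address
for i in 1 2 3; do
  if curl -s --max-time 2 "$ES" >/dev/null; then
    echo "elasticsearch is up"
    break
  fi
  sleep 1   # wait and retry while the daemon starts
done
```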

configure kibana settings

$ cd /home/ec2-user/AWSKRUG/kibana-5.5.0-linux-x86_64/
$ vim config/kibana.yml
server.host: "{Public DNS}"
elasticsearch.url: "http://{IPv4 Public IP address}:9200"
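With the same example host used in the elasticsearch.yml step above, the two lines would read:

```yaml
# example, using the host values from the elasticsearch configuration above
server.host: "ec2-52-78-156-86.ap-northeast-2.compute.amazonaws.com"
elasticsearch.url: "http://52.78.156.86:9200"
```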

run kibana

$ nohup bin/kibana &

check whether kibana is running

http://{IPv4 Public IP address}:5601


Metricbeat -> Elasticsearch exercise


configure metricbeat settings

$ cd /home/ec2-user/AWSKRUG/metricbeat-5.5.0-linux-x86_64/
$ vim metricbeat.yml
output.elasticsearch:
  hosts: ["{IPv4 Public IP address}:9200"]
  
dashboards.enabled: true

run metricbeat

$ ./metricbeat -e


Logstash -> Elasticsearch exercise


download test data

$ mkdir /home/ec2-user/AWSKRUG/data
$ cd /home/ec2-user/AWSKRUG/data
$ wget https://gist.githubusercontent.com/higee/1e3c3137195cf14eb23dd827a55e9b1d/raw/388c7df111beded4d136465474fcd9bf7826b545/titanic.csv
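Before wiring the file into Logstash, it is worth confirming the column count matches the csv filter below. A sketch using a stand-in row so it runs anywhere (the field values are illustrative, not taken from the real file); on the instance, point the same commands at /home/ec2-user/AWSKRUG/data/titanic.csv.

```shell
f=/tmp/titanic-sample.csv
printf '1,Allen Miss Elisabeth,1,1,female,29,0,0,24160,211.3375,S\n' > "$f"
echo "rows:   $(wc -l < "$f")"
# the filter below declares 11 columns; the data should agree
echo "fields: $(head -n 1 "$f" | awk -F, '{print NF}')"
```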

configure logstash settings

$ cd /home/ec2-user/AWSKRUG/logstash-5.5.0
$ vim titanic.conf
input {
  file  {
    path => "/home/ec2-user/AWSKRUG/data/titanic.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["passengerid", "name", "survived", "pclass", "sex", "age", "sibsp", "parch", "ticket", "fare", "embarked"]
    separator => ","
  }
}

output {
  elasticsearch {
    index => "titanic"
    document_type => "titanic"
    hosts => ["http://{IPv4 Public IP address}:9200"]
  }
  stdout {
    codec => rubydebug
  }
}

run logstash

$ bin/logstash -f titanic.conf


Filebeat -> Logstash -> Elasticsearch exercise


download log_generator.py

$ cd /home/ec2-user/AWSKRUG
$ wget https://gist.githubusercontent.com/higee/3d41efa803d7069f56ff089dfd3f7c2b/raw/eddfa64ba644d4c9512f95ed2df712eb9caaae0c/log_generator.py

configure elasticsearch index mapping

create an index

$ curl -XPUT '{IPv4 Public IP address}:9200/test_index'

check whether it's been successfully created

$ curl -XGET '{IPv4 Public IP address}:9200/test_index'

put index mapping

$ curl -XPUT "http://{IPv4 Public IP address}:9200/test_index/_mapping/test_type" \
    -H 'Content-Type: application/json' -d'
{
  "properties": {
    "filename"  : { "type" : "keyword" },
    "lineno"    : { "type" : "integer" },
    "level"     : { "type" : "keyword" },
    "e_message" : { "type" : "text" }
  }
}'

check index mapping

$ curl -XGET '{IPv4 Public IP address}:9200/test_index/_mapping/test_type?pretty'

configure logstash

$ cd logstash-5.5.0
$ vim log_generator.conf
input {
  beats {
    port => 5044
  }
}

filter {
  csv {
    separator => ","
    columns => ["filename", "lineno", "level", "e_message"]
  }
}

output {
  elasticsearch {
    index => "{index name}"
    document_type => "{type name}"
    hosts => ["http://{IPv4 Public IP address}:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
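For this exercise, the placeholders can be filled with the index and type created in the mapping step above, e.g.:

```conf
output {
  elasticsearch {
    index => "test_index"
    document_type => "test_type"
    hosts => ["http://{IPv4 Public IP address}:9200"]
  }
}
```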

run logstash in background

$ nohup bin/logstash -f log_generator.conf &

configure filebeat

$ cd /home/ec2-user/AWSKRUG/filebeat-5.5.0-linux-x86_64
$ vim filebeat.yml
#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/ec2-user/AWSKRUG/log/*.log

...
#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

run filebeat in background

$ nohup ./filebeat -e -c filebeat.yml &

run log_generator.py to generate log data

$ python log_generator.py
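Once the generator is running, the document count in test_index should grow as Filebeat ships lines through Logstash. A sketch of checking it with the _count API (the host is a placeholder; substitute the instance's public IP):

```shell
ES="http://127.0.0.1:9200"   # substitute your instance's address
curl -s "$ES/test_index/_count?pretty" || echo "elasticsearch not reachable at $ES"
```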


source


higee.io/221059817963
higee.io/221074551633
higee.io/221063081083