meilinger/logstash-filebeat-5-minutes.md

## logstash-filebeat-5-minutes.md

      
    Raw
  

              logstash-filebeat-5-minutes.md
            
          
    Logstash and Filebeat in 5 minutes

What/Why?


Filebeat is a log shipper, capture files and send to Logstash for processing and eventual indexing in Elasticsearch
Logstash is a heavy swiss army knife when it comes to log capture/processing
Centralized logging, necessarily for deployments with > 1 server
Super-easy to get setup, a little trickier to configure
Captured data is easy to visualize with Kibana
Wny not just Logstash (ELK is so hot right now)?

Logstash is a heavyweight compared to Filebeat, prohibitive to running a swarm of tiny server instances
ELK is definitely still part of the stack, but we're adding "beats" to the mix => BELK


Overview

Filebeat capture and ship file logs --> Logstash parse logs into documents --> Elasticsearch store/index documents --> Kibana visualize/aggregate
How?


Use the Bitnami ELK ami for no-brainer ELK setup => https://bitnami.com/stack/elk
Install filebeat on each server via:

Manual steps: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-getting-started.html
Ansible role:

via either GeerlingGuy https://github.com/geerlingguy/ansible-role-filebeat (STL native, I owe this dude some beers)
OR quick-n-dirty ansible role @ https://github.com/FireCARES/firecares-ansible/tree/public/roles/filebeat


You'll need to manually-load the filebeat template into ES before starting filebeat => https://www.elastic.co/guide/en/beats/filebeat/1.2/filebeat-template.html#load-template-shell

The tough parts

Getting filebeat and ELK setup was a breeze, but configuring Logstash to process logs correctly was more of a pain...enter GROK and logstash.conf
Logstash.conf

logstash.conf has 3 sections -- input / filter / output, simple enough, right?
Input section

In this case, the "input" section of the logstash.conf has a port open for Filebeat using the lumberjack protocol (any beat type should be able to connect):
input
{
    beats
    {
        ssl => false
        port => 5043
    }
}

Filter

This is where things get tricky.  "Filter" does the log parsing, primarily using "GROK" patterns.

GROK is the method that Logstash uses to parse log file entires using a set of patterns into a JSON-like structure, otherwise all logs coming will be stored as a "message" blob that really isn't too useful
Basically, grok patterns are based on regular expressions which have a pretty high learning curve to begin with

Tons of built-in patterns => https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns
Debug your own patterns using a GROK debugger

https://grokdebug.herokuapp.com/
http://grokconstructor.appspot.com/do/match <- I liked this one a little better, as the herokuapp-hosted GROK debugger above would sometimes be unavailable


filter
{
    if [type] == "nginx_error" {
        grok {
            match => { "message" => "%{DATESTAMP:timestamp} \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: %{IPORHOST:client})(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})(?:, host: %{QS:host})(?:, referrer: \"%{URI:referrer}\")" }
        }
    }

    # Using a custom nginx log format that also includes the request duration and X-Forwarded-For http header as "end_user_ip"
    if [type] == "nginx_access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}+ %{NUMBER:request_length} %{NUMBER:request_duration} (%{IPV4:end_user_ip}|-)" }
        }

        geoip {
            source => "end_user_ip"
        }

        mutate {
            convert => {
                "request_duration" => "float"
            }
        }
    }
}

Output

Pretty simple.  NOTE: you can specify the "beat" @metadata parameter via the "index" in your filebeat configuration, making things like separating dev/prod logs into separate instances easy
output
{

    elasticsearch
    {
        hosts => ["127.0.0.1:9200"]
        sniffing => true
        manage_template => true
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
    }
}

What's next...

Multiline patterns are the way to go when capturing exception information and stack traces,
Another similar system, Metricbeat, looks to be an awesome complement to Filebeat and an alternative to CloudWatch when it comes to system-level metrics, personally, I'm going to dig into this next as the granularity of metrics for each application/system is pretty extensive via Metricbeat's modules.