Over on Kata Containers we want to store some metrics results into Elasticsearch so we can have some nice views and analysis. Our results are generated as JSON, and we have trialled injecting them directly into Elasticsearch using curl, which worked OK.
As Kata is under the OSF umbrella, we will likely end up using the existing ELK infrastructure.
One requirement of that is that we route our JSON data through logstash. To do that from the build machines, the obvious choice is to use filebeat.
As a quick ASCII overview, the data flow goes something like this:
metrics-[JSON text]->filebeat->logstash->elastic[->kibana]
The fun part is that neither filebeat nor logstash normally expects to have a 'JSON file' flung at it to pass on uncooked. Our JSON results generally look something like this (a shortened version ....):
{
"@timestamp" : 1537953398125,
"env" : {
"RuntimeVersion": "1.3.0-rc1",
},
"date" : {
"ns": 1537953398129625483,
"Date": "2018-09-26T09:16:38.134"
},
"test" : {
"runtime": "kata-runtime",
"testname": "cpu information"
},
"kata-env" :
{
"Meta": {
"Version": "1.0.16"
},
"Results": [
{
"instructions per cycle": {
"Result" : 1.69,
"Units" : "insns per cycle"
},
"cycles": {
"Result" : 192423728747,
"Units" : "cycles"
},
"instructions": {
"Result" : 324468996333,
"Units" : "instructions"
}
}
]
}
The first step, then, is to set up filebeat so we can talk to it. Normally filebeat monitors a file or similar. We could have it watch a directory or file and drop the results there to be picked up - but our default 'direct to Elastic' method is to curl the results directly to the server:
echo "$json" > $json_filename
# If we have a JSON URL set up, post the results there as well
if [[ $JSON_URL ]]; then
echo "Posting results to [$JSON_URL]"
curl -XPOST -H"Content-Type: application/json" "$JSON_URL" -d "@$json_filename"
fi
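For reference, the JSON_URL above is just the document endpoint of an Elasticsearch index. The host and index names below are hypothetical placeholders rather than our real setup:

```shell
# Hypothetical Elasticsearch host and index - substitute your own endpoint
JSON_URL="http://192.168.0.yyy:9200/metrics/_doc"
echo "Posting results to [$JSON_URL]"
```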
Now, note we do place the results into a file there - but sometimes that file is buried inside a VM instance that is running the tests. Having a socket (probably on the host that is running the VM) to write the results to is a much more flexible structure.
Luckily for us, filebeat has a socket (TCP input) plugin. That looks like it should suit us just fine. Here is the filebeat config file, trimmed to its bare bones, that accepts data on a socket and routes it to logstash.
######################## Filebeat Configuration ############################
filebeat.inputs:
#------------------------------ TCP input --------------------------------
- type: tcp
enabled: true
# The host and port to receive the new event
host: "localhost:9000"
# Maximum size in bytes of the message received over TCP
max_message_size: 1MiB
#----------------------------- Logstash output ---------------------------------
output.logstash:
# Boolean flag to enable or disable the output module.
enabled: true
# The Logstash hosts
hosts: ["192.168.0.xxx:5044"]
# The maximum number of events to bulk in a single Logstash request.
bulk_max_size: 1024
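Before wiring up the whole chain, filebeat can sanity-check this file itself (assuming it has been saved as the filebeat config, e.g. /etc/filebeat/filebeat.yml):

```
filebeat test config
filebeat test output
```

The first checks the config parses cleanly; the second tries to reach the configured logstash host.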
We should note here:
- by default, filebeat reads single lines. It can be configured for multiline support if we need it
- you probably want to add SSL security between your filebeat and your logstash - we have skipped that in this gist.
About that 'multiline' thing then. Well, our results are generated as nice, human-readable JSON - that is, as multiline text. We could configure filebeat to handle that and expect complete JSON, or maybe even NULL-delimited items, or we can strip all the line breaks from our JSON before we fire it at the socket. Here is a snippet that does just that, and nc's the result to the filebeat socket:
multi="$(cat cpu-information.json)"
# Use tr rather than sed here - sed works line by line, so it never
# actually sees the newlines we want to remove
single="$(tr -d '\n\r' <<< "${multi}")"
echo "${single}" | nc localhost 9000
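If jq happens to be available on the build machine, the flatten step can be collapsed: jq -c re-emits a document as compact single-line JSON, which is more robust than stripping characters by hand. A sketch, using a made-up sample file:

```shell
# Create a small sample results file (hypothetical content) for the demo
printf '%s\n' '{ "test": { "testname": "demo" },' '  "Results": [] }' > cpu-information.json

# jq -c emits the whole document as one compact line - exactly what the
# single-line TCP input wants. In the real setup, pipe it on to the socket:
#   jq -c . cpu-information.json | nc localhost 9000
jq -c . cpu-information.json
```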
Now we are getting the data to filebeat, we need to set up logstash to capture it, process it mildly, and send it on to Elasticsearch. Here are the configs.
First, we set up the beats input. We grab the data off the socket, and then we use the json filter to turn that string input into an actual part of the JSON payload - that is the 'magic trick' that gets our JSON stream passed over almost 'as is' to Elasticsearch.
As each of our tests can generate a different stream of JSON (that is, the basic shape can vary), I suspect we may end up putting a small JSON wrapper around each set of data, probably using the top-level testname item from the stream as the wrapper key. The right place to do that is probably in the metrics JSON generation itself. We can hold any further logstash munging as an option for later, if we find we need to tweak something or have historical data to fix up etc.
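As a sketch of what that wrapper could look like (file and field names here are illustrative), jq can lift the testname field up to become an enclosing key:

```shell
# Sample input (hypothetical) matching the rough shape of our results
printf '%s' '{"test":{"testname":"cpu information"},"Results":[]}' > sample.json

# Wrap the whole document under a key taken from its own test.testname field
jq -c '{(.test.testname): .}' sample.json
```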
# /etc/logstash/conf.d/02-beats-input.conf
input {
beats {
port => 5044
# Add your ssl security configs here!
}
}
# Use the json filter to get the json message input, and inject it into the
# json stream.... as JSON.
filter {
json {
source => "message"
}
}
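If you want to sanity-check that filter in isolation (assuming logstash is on your PATH), the -e flag takes an inline config - feed it a line on stdin and watch the parsed fields come out:

```
echo '{"test":{"testname":"demo"}}' | logstash -e '
  input { stdin { } }
  filter { json { source => "message" } }
  output { stdout { codec => rubydebug } }'
```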
Here is the pretty simple setup to then inject the resulting JSON into a fixed index on the Elasticsearch host. We may want to use some fancy index name cycling etc., and we may find that we need to store different metrics results in different indexes later on. This works as a test bed with a single index for now.
# /etc/logstash/conf.d/30-elasticsearch-output.conf
output {
elasticsearch {
hosts => ["192.168.0.yyy:9200"]
sniffing => true
manage_template => false
index => "beattest"
}
}
If we then go peek into Elasticsearch with a http://192.168.0.yyy:9200/beattest/_search?q=qemu, we can see the JSON results. Here is a fragment:
"Results":[
{"instructions":
{"Units":"instructions","Result":324468996333},
"instructions per cycle":{"Units":"insns per cycle","Result":1.69},
"cycles":{"Units":"cycles","Result":192423728747}
}
],
"env":{"ProxyVersion":"1.3.0-rc1-981fef4774ba15cf94a3a9013629d0ab60668348","ShimVersion":" kata-shim version 1.3.0-rc1-9b2891cfb153967fa4a65e44b2928255c889f643",
Yay, our JSON results have been copied (and, right now, I do mean copied - I think I can still see the data in the original message field) into the JSON source. Now we should be able to locate and process that in Kibana.
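If that duplicated message field turns out to be noise, the json filter can drop it once the parse has succeeded - a possible tweak to the filter config above:

```
filter {
  json {
    source => "message"
    # Drop the raw string copy once it has been parsed into fields
    remove_field => ["message"]
  }
}
```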