Configuring filebeat and logstash to pass JSON to elastic

Over on Kata Containers we want to store some metrics results into Elasticsearch so we can have some nice views and analysis. Our results are generated as JSON, and we have trialled injecting them directly into Elastic using curl, and that worked OK. As Kata is under the OSF umbrella, we will likely end up using the existing ELK infrastructure. One requirement of that is that we route our JSON data through logstash. To do that from the build machines, the obvious choice is to use filebeat.

The flow

As a quick ASCII overview, the data flow goes something like this:

metrics-[JSON text]->filebeat->logstash->elastic[->kibana]

The fun part is that neither filebeat nor logstash normally expects to have a 'JSON file' flung at it and pass it on uncooked. Our JSON results generally look something like this (a shortened version):

{
    "@timestamp": 1537953398125,
    "env": {
        "RuntimeVersion": "1.3.0-rc1"
    },
    "date": {
        "ns": 1537953398129625483,
        "Date": "2018-09-26T09:16:38.134"
    },
    "test": {
        "runtime": "kata-runtime",
        "testname": "cpu information"
    },
    "kata-env": {
        "Meta": {
            "Version": "1.0.16"
        }
    },
    "Results": [
        {
            "instructions per cycle": {
                "Result": 1.69,
                "Units": "insns per cycle"
            },
            "cycles": {
                "Result": 192423728747,
                "Units": "cycles"
            },
            "instructions": {
                "Result": 324468996333,
                "Units": "instructions"
            }
        }
    ]
}

Configuring

filebeat

The first step, then, is to set up filebeat so we can talk to it. Normally filebeat will monitor a file or similar. We could have it monitor a directory or file and inject the results there to be picked up - but our default 'direct to Elastic' method is to curl the results directly to a network endpoint:

	echo "$json" > $json_filename

	# If we have a JSON URL set up, post the results there as well
	if [[ $JSON_URL ]]; then
		echo "Posting results to [$JSON_URL]"
		curl -XPOST -H"Content-Type: application/json" "$JSON_URL" -d "@$json_filename"
	fi

Now, note that we do place the results into a file there - but sometimes that file is buried inside a VM instance that is running the tests. Having a socket (probably on the host that is running the VM) to write the results to is a much more flexible arrangement.

Luckily for us, filebeat has a TCP socket input. That looks like it should suit us just fine. Here is the filebeat config file, trimmed to its bare bones, that accepts data on a socket and routes it to logstash.

######################## Filebeat Configuration ############################

filebeat.inputs:
#------------------------------ TCP input --------------------------------
- type: tcp
  enabled: true

  # The host and port to receive the new event
  host: "localhost:9000"

  # Maximum size in bytes of the message received over TCP
  max_message_size: 1MiB

#----------------------------- Logstash output ---------------------------------
output.logstash:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # The Logstash hosts
  hosts: ["192.168.0.xxx:5044"]

  # The maximum number of events to bulk in a single Logstash request.
  bulk_max_size: 1024

We should note here:

  • by default, filebeat reads single lines. It can be configured for multiline support if we need it
  • you probably want to add ssl security between your filebeat and your logstash - we have skipped that in this gist, but there is a rough sketch of it after the logstash input config below.

About that 'multiline' thing then. Well, our results are generated as nice human-ish readable JSON - that is, as multiline. We could configure filebeat to handle that, and expect complete JSON or maybe even NULL delimited items, or we can strip all the newlines and carriage returns from our JSON before we fire it at the socket. Here is a snippet that does just that and nc's the result to the filebeat socket.

multi="$(cat cpu-information.json)"
single="$(sed 's/[\n\r]//g' <<< ${multi})"
nc localhost 9000 < single.json
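
If jq happens to be available on the machine, an equivalent way to compact the JSON onto a single line before sending it is:

# Let jq do the flattening for us
jq -c . cpu-information.json | nc localhost 9000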

logstash

Now that we are getting the data to filebeat, we need to set up logstash to capture it, process it mildly, and send it on to elasticsearch. Here are the configs.

First, we set up the beats input. We grab the events off the socket, and then use the json filter to turn the string JSON held in the message field into an actual part of the JSON payload - that is the 'magic trick' that gets our JSON stream passed over to elasticsearch almost 'as is'. As each of our tests can generate a different stream of JSON (that is, the basic shape can vary), I suspect we may end up putting a small JSON wrapper around each set of data, probably keyed on the top level testname item from the stream. The right place to do that is probably in the metrics JSON generation itself. We can hold any further logstash munging as an option for later, if we find we need to tweak something or have historical data to fix up etc.
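
As a rough illustration of that wrapping idea (we do not do this today, and the final shape may well differ), jq can key a document on its own testname field:

# Hypothetical wrapper: nest the whole result under its testname, giving
# something like {"cpu information": { "@timestamp": ..., "Results": [...] }}
jq '{(.test.testname): .}' cpu-information.json

Anyway, back to the logstash configs themselves.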

# /etc/logstash/conf.d/02-beats-input.conf
input {
  beats {
    port => 5044
    # Add your ssl security configs here!
  }
}

# Use the json filter to get the json message input, and inject it into the
# json stream.... as JSON.
filter {
  json {
    source => "message"
  }
}
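
About that ssl comment in the input: we have not wired it up in this gist, but as a rough, untested sketch (the certificate paths are placeholders, and the option names should be checked against the filebeat and logstash versions in use), mutual TLS between the two would look something like:

# filebeat.yml - additions under the logstash output (paths are illustrative)
output.logstash:
  hosts: ["192.168.0.xxx:5044"]
  ssl.certificate_authorities: ["/etc/pki/ca.pem"]
  ssl.certificate: "/etc/pki/filebeat/cert.pem"
  ssl.key: "/etc/pki/filebeat/cert.key"

# /etc/logstash/conf.d/02-beats-input.conf - matching beats input additions
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate_authorities => ["/etc/pki/ca.pem"]
    ssl_certificate => "/etc/pki/logstash/cert.pem"
    ssl_key => "/etc/pki/logstash/cert.key"
    ssl_verify_mode => "force_peer"
  }
}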

Here is the pretty simple setup to then inject the resulting JSON into a fixed index on the elasticsearch host. We may want to use some fancy index name cycling later (there is a note on that after the config), and we may find that we need to store different metrics results into different indexes later on. This works as a test bed with a single index for now.

# /etc/logstash/conf.d/30-elasticsearch-output.conf

output {
  elasticsearch {
    hosts => ["192.168.0.yyy:9200"]
    sniffing => true
    manage_template => false
    index => "beattest"
  }
}
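
For reference, if we do go for index name cycling later, the elasticsearch output can interpolate the event timestamp into the index name - something along these lines (not used here):

# Possible future tweak: one index per day instead of a single fixed index
index => "beattest-%{+YYYY.MM.dd}"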

Result

If we then go and peek into elastic with http://192.168.0.yyy:9200/beattest/_search?q=qemu, we can see the JSON results. Here is a fragment:

"Results":[
  {"instructions":
      {"Units":"instructions","Result":324468996333},
      "instructions per cycle":{"Units":"insns per cycle","Result":1.69},
      "cycles":{"Units":"cycles","Result":192423728747}
    }
  ],
  "env":{"ProxyVersion":"1.3.0-rc1-981fef4774ba15cf94a3a9013629d0ab60668348","ShimVersion":" kata-shim version 1.3.0-rc1-9b2891cfb153967fa4a65e44b2928255c889f643",

Yay, our JSON results have been copied (and, right now I do mean copied - I think I can still see the data in the original message field) into the JSON source. Now we should be able to locate and process that in Kibana.
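
If that duplication becomes a problem, the json filter should be able to drop the raw string once it has parsed successfully - something like this (untested) tweak to the filter config:

# Drop the original message string after it has been parsed into real fields
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}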
