Over on Kata Containers we want to store some metrics results into Elasticsearch so we can have some nice views and analysis. Our results are generated as JSON, and we have trialled injecting them directly into Elasticsearch using curl, which worked OK.
As Kata is under the OSF umbrella, we will likely end up using the existing ELK infrastructure.
One requirement of that is that we route our JSON data through logstash. To do that from the build machines, the obvious choice is to use filebeat.
As a quick ASCII overview, the data flow goes something like this:
metrics-[JSON text]->filebeat->logstash->elastic[->kibana]
The fun part is that neither filebeat nor logstash normally expects to have a 'JSON file' flung at it to pass on uncooked. Our JSON results generally look something like this (a shortened version ....):
{
"@timestamp" : 1537953398125,
"env" : {
"RuntimeVersion": "1.3.0-rc1",
},
"date" : {
"ns": 1537953398129625483,
"Date": "2018-09-26T09:16:38.134"
},
"test" : {
"runtime": "kata-runtime",
"testname": "cpu information"
},
"kata-env" :
{
"Meta": {
"Version": "1.0.16"
},
"Results": [
{
"instructions per cycle": {
"Result" : 1.69,
"Units" : "insns per cycle"
},
"cycles": {
"Result" : 192423728747,
"Units" : "cycles"
},
"instructions": {
"Result" : 324468996333,
"Units" : "instructions"
}
}
]
}
The first step, then, is to set up filebeat so we can talk to it. Normally filebeat monitors a file or similar. We could have it watch a directory or file and drop the results there to be picked up - but our default 'direct to Elastic' method is to curl the results directly to the server:
echo "$json" > $json_filename
# If we have a JSON URL set up, post the results there as well
if [[ $JSON_URL ]]; then
echo "Posting results to [$JSON_URL]"
curl -XPOST -H"Content-Type: application/json" "$JSON_URL" -d "@$json_filename"
fi
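For reference, the JSON_URL above is just the document endpoint of an Elasticsearch index. The host and index names below are hypothetical placeholders rather than our real setup:

```shell
# Hypothetical Elasticsearch host and index - substitute your own endpoint
JSON_URL="http://192.168.0.yyy:9200/metrics/_doc"
echo "Posting results to [$JSON_URL]"
```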
Now, note we do place the results into a file there - but sometimes that file is buried inside a VM instance that is running the tests. Having a socket (probably on the host that is running the VM) to write the results to is a much more flexible structure.
Luckily for us, filebeat has a socket (TCP input) plugin. That looks like it should suit us just fine. Here is the filebeat config file, trimmed to its bare bones, that accepts data on a socket and routes it to logstash.
######################## Filebeat Configuration ############################
filebeat.inputs:
#------------------------------ TCP input --------------------------------
- type: tcp
enabled: true
# The host and port to receive the new event
host: "localhost:9000"
# Maximum size in bytes of the message received over TCP
max_message_size: 1MiB
#----------------------------- Logstash output ---------------------------------
output.logstash:
# Boolean flag to enable or disable the output module.
enabled: true
# The Logstash hosts
hosts: ["192.168.0.xxx:5044"]
# The maximum number of events to bulk in a single Logstash request.
bulk_max_size: 1024
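Before wiring up the whole chain, filebeat can sanity-check this file itself (assuming it has been saved as the filebeat config, e.g. /etc/filebeat/filebeat.yml):

```
filebeat test config
filebeat test output
```

The first checks the config parses cleanly; the second tries to reach the configured logstash host.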
We should note here:
- by default, filebeat reads single lines. It can be configured for multiline support if we need it
- you probably want to add SSL security between your filebeat and your logstash - we have skipped that in this gist.
About that 'multiline' thing then. Well, our results are generated as nice, human-readable JSON - that is, as multiline text. We could configure filebeat to handle that and expect complete JSON, or maybe even NULL-delimited items, or we can strip all the line breaks from our JSON before we fire it at the socket. Here is a snippet that does just that, and nc's the result to the filebeat socket:
multi="$(cat cpu-information.json)"
# Use tr rather than sed here - sed works line by line, so it never
# actually sees the newlines we want to remove
single="$(tr -d '\n\r' <<< "${multi}")"
echo "${single}" | nc localhost 9000
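If jq happens to be available on the build machine, the flatten step can be collapsed: jq -c re-emits a document as compact single-line JSON, which is more robust than stripping characters by hand. A sketch, using a made-up sample file:

```shell
# Create a small sample results file (hypothetical content) for the demo
printf '%s\n' '{ "test": { "testname": "demo" },' '  "Results": [] }' > cpu-information.json

# jq -c emits the whole document as one compact line - exactly what the
# single-line TCP input wants. In the real setup, pipe it on to the socket:
#   jq -c . cpu-information.json | nc localhost 9000
jq -c . cpu-information.json
```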
Now we are getting the data to filebeat, we need to set up logstash to capture it, process it mildly, and send it on to Elasticsearch. Here are the configs.
First, we set up the beats input. We grab the data off the socket, and then we use the json filter to turn that string input into an actual part of the JSON payload - that is the 'magic trick' that gets our JSON stream passed over almost 'as is' to Elasticsearch.
As each of our tests can generate a different stream of JSON (that is, the basic shape can vary), I suspect we may end up putting a small JSON wrapper around each set of data, probably using the top-level testname item from the stream as the wrapper key. The right place to do that is probably in the metrics JSON generation itself. We can hold any further logstash munging as an option for later, if we find we need to tweak something or have historical data to fix up etc.
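As a sketch of what that wrapper could look like (file and field names here are illustrative), jq can lift the testname field up to become an enclosing key:

```shell
# Sample input (hypothetical) matching the rough shape of our results
printf '%s' '{"test":{"testname":"cpu information"},"Results":[]}' > sample.json

# Wrap the whole document under a key taken from its own test.testname field
jq -c '{(.test.testname): .}' sample.json
```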
# /etc/logstash/conf.d/02-beats-input.conf
input {
beats {
port => 5044
# Add your ssl security configs here!
}
}
# Use the json filter to get the json message input, and inject it into the
# json stream.... as JSON.
filter {
json {
source => "message"
}
}
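If you want to sanity-check that filter in isolation (assuming logstash is on your PATH), the -e flag takes an inline config - feed it a line on stdin and watch the parsed fields come out:

```
echo '{"test":{"testname":"demo"}}' | logstash -e '
  input { stdin { } }
  filter { json { source => "message" } }
  output { stdout { codec => rubydebug } }'
```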
Here is the pretty simple setup to then inject the resulting JSON into a fixed index on the Elasticsearch host. We may want to use some fancy index name cycling etc., and we may find that we need to store different metrics results in different indexes later on. This works as a test bed with a single index for now.
# /etc/logstash/conf.d/30-elasticsearch-output.conf
output {
elasticsearch {
hosts => ["192.168.0.yyy:9200"]
sniffing => true
manage_template => false
index => "beattest"
}
}
If we then go peek into Elasticsearch with a http://192.168.0.yyy:9200/beattest/_search?q=qemu, we can see the JSON results. Here is a fragment:
"Results":[
{"instructions":
{"Units":"instructions","Result":324468996333},
"instructions per cycle":{"Units":"insns per cycle","Result":1.69},
"cycles":{"Units":"cycles","Result":192423728747}
}
],
"env":{"ProxyVersion":"1.3.0-rc1-981fef4774ba15cf94a3a9013629d0ab60668348","ShimVersion":" kata-shim version 1.3.0-rc1-9b2891cfb153967fa4a65e44b2928255c889f643",
Yay, our JSON results have been copied (and, right now, I do mean copied - I think I can still see the data in the original message field) into the JSON source. Now we should be able to locate and process that in Kibana.
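If that duplicated message field turns out to be noise, the json filter can drop it once the parse has succeeded - a possible tweak to the filter config above:

```
filter {
  json {
    source => "message"
    # Drop the raw string copy once it has been parsed into fields
    remove_field => ["message"]
  }
}
```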