The mysql query log is a seriously bullshit format, but nothing logstash can't unscrew. The problem is that some lines in the file are missing timestamps even though they aren't continuations of any previous line. It's dumb, really.
The main goal here is to show how to fix that 'missing timestamp' problem.
% ruby bin/logstash agent -e '
input {
  stdin {
    type => "mysql-slowquery"
  }
}
filter {
  # mysql query logs sometimes are missing a timestamp
  # and use spaces instead (WTF), so merge timestampless events with the
  # previous event.
  multiline {
    what => previous
    pattern => "^\s"
  }
  # pull out the timestamp (like, "120707 0:40:34")
  grok { pattern => "^%{NUMBER:date} *%{NOTSPACE:time}" }
  # put the timestamp into a single field
  mutate { replace => [ "time", "%{date} %{time}" ] }
  # parse the timestamp, whose hour could be one or two digits.
  date { time => [ "YYMMdd H:mm:ss", "YYMMdd HH:mm:ss" ] }
  # remove the time/date fields we only added for parsing.
  mutate { remove => [ "time", "date" ] }
  # Now split up the multiline event again, which keeps the timestamp for all
  # split-out events. The defaults here are fine; they split "@message" by "\n".
  split { }
}
output {
  stdout { debug => true }
}
'
120707  0:40:34     4 Connect   root@localhost on
                    4 Query     select @@version_comment limit 1
120707  0:40:45     4 Query     select * from mysql.user
120707  0:41:18     5 Query     hello world
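Fed those lines, the config above boils down to one merge-then-split move. Here is a toy simulation of that move in plain Ruby (not logstash internals; the field names are just illustrative):

```ruby
# Toy simulation of the multiline -> grok -> split pipeline:
# lines starting with whitespace are merged into the previous event,
# the timestamp is pulled off the merged event, and every split-out
# line inherits that timestamp.
lines = [
  "120707  0:40:34     4 Connect   root@localhost on",
  "                    4 Query     select @@version_comment limit 1",
  "120707  0:40:45     4 Query     select * from mysql.user",
]

# multiline: merge "^\s" lines into the previous event
events = []
lines.each do |line|
  if line =~ /^\s/ && !events.empty?
    events.last << "\n" << line
  else
    events << line.dup
  end
end

# grok + split: extract the timestamp, then hand it to each split-out line
records = events.flat_map do |event|
  time = event[/^(\d+ +\S+)/, 1]
  event.split("\n").map { |msg| { "time" => time, "message" => msg } }
end

records.each { |r| puts "#{r["time"]} | #{r["message"]}" }
```

The timestampless `select @@version_comment` record ends up carrying the timestamp of the `Connect` line it was merged into, which is exactly what the real pipeline produces below.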
The key thing to note here is that the 'select @@version_comment ...' event now has the same timestamp as the previous line.
{
"@source" => "stdin://carrera.databits.net/",
"@tags" => [
[0] "multiline"
],
"@fields" => {
"time" => "120707 0:40:34"
},
"@timestamp" => "2012-07-07T04:40:34.000Z",
"@source_host" => "carrera.databits.net",
"@source_path" => "/",
"@message" => "120707 0:40:34 4 Connect root@localhost on",
"@type" => "mysql-slowquery"
}
{
"@source" => "stdin://carrera.databits.net/",
"@tags" => [
[0] "multiline"
],
"@fields" => {
"time" => "120707 0:40:34"
},
"@timestamp" => "2012-07-07T04:40:34.000Z",
"@source_host" => "carrera.databits.net",
"@source_path" => "/",
"@message" => " 4 Query select @@version_comment limit 1",
"@type" => "mysql-slowquery"
}
{
"@source" => "stdin://carrera.databits.net/",
"@tags" => [],
"@fields" => {
"time" => "120707 0:40:45"
},
"@timestamp" => "2012-07-07T04:40:45.000Z",
"@source_host" => "carrera.databits.net",
"@source_path" => "/",
"@message" => "120707 0:40:45 4 Query select * from mysql.user",
"@type" => "mysql-slowquery"
}
{
"@source" => "stdin://carrera.databits.net/",
"@tags" => [],
"@fields" => {
"time" => "120707 0:41:18"
},
"@timestamp" => "2012-07-07T04:41:18.000Z",
"@source_host" => "carrera.databits.net",
"@source_path" => "/",
"@message" => "120707 0:41:18 5 Query hello world",
"@type" => "mysql-slowquery"
}
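As an aside, the date filter lists two formats because the hour field can be one or two digits wide. A quick check with Ruby's `DateTime.strptime` (a stand-in here for the Joda-Time patterns logstash actually uses; `parse_slow_query_time` is just a name I made up):

```ruby
require "date"

# The slow-query timestamp after the mutate step looks like "120707 0:40:34".
# Hours can be one digit ("0:40:34") or two ("10:40:34"), which is why the
# config lists both "YYMMdd H:mm:ss" and "YYMMdd HH:mm:ss" (Joda-Time syntax).
# Ruby's strptime %H happens to accept either width, so one format covers both.
def parse_slow_query_time(s)
  DateTime.strptime(s, "%y%m%d %H:%M:%S")
end

t1 = parse_slow_query_time("120707 0:40:34")
t2 = parse_slow_query_time("120707 10:40:34")
puts t1.iso8601   # 2012-07-07T00:40:34+00:00
puts t2.iso8601
```

(Note the `@timestamp` values above are in UTC, shifted from my local timezone; the parsed wall-clock time itself is what matters here.)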
I have much the same issue with a similar type of log, and this is a very helpful solution, @jordansissel! However, I need to preserve information about line order. Specifically, I have an "offset" field shipped by filebeat, which denotes the position of a message within the read file, so that I can later sort messages that have equal timestamps. With the split filter, though, the offset field would just be cloned into every split-out event, so the information about message order would be lost. Is there a way to add some kind of counter / index when splitting a message?
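I don't know of a built-in split-filter option for this, but the idea is easy to sketch in plain Ruby (field names like `offset` and `line_index` are purely illustrative, not anything filebeat or logstash defines):

```ruby
# Split a merged multiline message into one record per line, tagging each
# record with a line_index so the original order survives even though the
# offset field is cloned across all of the split-out events.
event = {
  "offset"  => 12345,                      # hypothetical filebeat offset
  "message" => "first line\nsecond line\nthird line",
}

records = event["message"].split("\n").each_with_index.map do |line, i|
  event.merge("message" => line, "line_index" => i)
end

records.each { |r| puts "#{r["offset"]}+#{r["line_index"]}: #{r["message"]}" }
```

Sorting by `(offset, line_index)` would then reconstruct the original order even when timestamps collide.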