parsing mysql's bullshit query log format with logstash

parsing mysql query logs with logstash

The problem is that some lines in the file are missing timestamps even though they aren't continuations of any previous line. It's dumb, really.

The mysql query log is a seriously bullshit format, but nothing logstash can't unscrew.

The main goal here is to show how we can fix the 'missing timestamp' problem.

% ruby bin/logstash agent -e '
  input {
    stdin {
      type => "mysql-slowquery"
    }
  }
  filter {
    # mysql query logs sometimes are missing a timestamp
    # and use spaces instead (WTF), so merge timestampless events with the
    # previous event.
    multiline {
      what => previous
      pattern => "^\s"
    }

    # pull out the timestamp (like, "120707  0:40:34")
    grok { pattern => "^%{NUMBER:date} *%{NOTSPACE:time}" }

    # put the timestamp into a single field
    mutate { replace => [ "time", "%{date} %{time}" ] }

    # parse the timestamp, which could be one or two digits.
    date { time => [ "YYMMdd H:mm:ss", "YYMMdd HH:mm:ss" ] }

    # remove time/date fields only previously added for parsing.
    mutate { remove => [ "time", "date" ] }

    # Now split up the multiline again, which keeps the timestamp for all split
    # out events. The defaults here are fine as they split '@message' by '\n'
    split { }
  }

  output {
    stdout { debug => true }
  }
'

input

120707  0:40:34     4 Connect   root@localhost on
                    4 Query     select @@version_comment limit 1
120707  0:40:45     4 Query     select * from mysql.user
120707  0:41:18     5 Query     hello world

logstash output

The key thing to note here is that the 'select @@version_comment ...' event now has the same timestamp as the previous line.

{
  "@source"      => "stdin://carrera.databits.net/",
  "@tags"        => [
    [0] "multiline"
  ],
  "@fields"      => {
    "time" => "120707 0:40:34"
  },
  "@timestamp"   => "2012-07-07T04:40:34.000Z",
  "@source_host" => "carrera.databits.net",
  "@source_path" => "/",
  "@message"     => "120707  0:40:34     4 Connect   root@localhost on",
  "@type"        => "mysql-slowquery"
}
{
  "@source"      => "stdin://carrera.databits.net/",
  "@tags"        => [
    [0] "multiline"
  ],
  "@fields"      => {
    "time" => "120707 0:40:34"
  },
  "@timestamp"   => "2012-07-07T04:40:34.000Z",
  "@source_host" => "carrera.databits.net",
  "@source_path" => "/",
  "@message"     => "                    4 Query     select @@version_comment limit 1",
  "@type"        => "mysql-slowquery"
}
{
  "@source"      => "stdin://carrera.databits.net/",
  "@tags"        => [],
  "@fields"      => {
    "time" => "120707 0:40:45"
  },
  "@timestamp"   => "2012-07-07T04:40:45.000Z",
  "@source_host" => "carrera.databits.net",
  "@source_path" => "/",
  "@message"     => "120707  0:40:45     4 Query     select * from mysql.user",
  "@type"        => "mysql-slowquery"
}
{
  "@source"      => "stdin://carrera.databits.net/",
  "@tags"        => [],
  "@fields"      => {
    "time" => "120707 0:41:18"
  },
  "@timestamp"   => "2012-07-07T04:41:18.000Z",
  "@source_host" => "carrera.databits.net",
  "@source_path" => "/",
  "@message"     => "120707  0:41:18     5 Query     hello world",
  "@type"        => "mysql-slowquery"
}

amrit145 commented Jan 7, 2015

You are using a deprecated config setting "pattern" set in grok. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. You should use this instead: match => { "message" => "your pattern here" } If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"pattern", :plugin=><LogStash::Filters::Grok --->, :level=>:warn}
Unknown setting 'time' for date {:level=>:error}
Error: Something is wrong with your configuration.
You may be interested in the '--configtest' flag which you can
use to validate logstash's configuration before you choose
to restart a running system.

I'm getting the above error after using your configuration.
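
For newer logstash versions the same filters can be written with the non-deprecated option names. A rough sketch, untested against every release (recent versions also steer you toward the multiline codec on the input rather than the multiline filter):

    filter {
      # merge timestampless lines into the previous event
      multiline {
        what => "previous"
        pattern => "^\s"
      }

      # grok: 'match' replaces the deprecated 'pattern' setting
      grok { match => { "message" => "^%{NUMBER:date} *%{NOTSPACE:time}" } }

      # put the timestamp into a single field
      mutate { replace => [ "time", "%{date} %{time}" ] }

      # date: 'match' replaces the old per-field form ('time => [ ... ]')
      date { match => [ "time", "YYMMdd H:mm:ss", "YYMMdd HH:mm:ss" ] }

      # drop the scratch fields
      mutate { remove_field => [ "time", "date" ] }

      # split the merged event back into one event per line
      split { }
    }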

mjpowersjr commented Feb 7, 2015

@amrit145 - I have a feeling @jordansissel knows about the deprecated settings. :-)

I discovered a different approach to parsing the slow query log that might be useful (configure mysql to log in CSV format):

https://gist.github.com/mjpowersjr/740a9583e9ec8b49e0a3
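
The rough shape of that approach, as a sketch only (not necessarily what the linked gist does): with log_output = TABLE the mysql.slow_log table is backed by the CSV storage engine, so its data file can be tailed and parsed with the csv filter. The path below is a guess that depends on your datadir; the columns are the ones MySQL uses for mysql.slow_log.

    input {
      file {
        # data file of the CSV-engine table mysql.slow_log (path is an assumption)
        path => "/var/lib/mysql/mysql/slow_log.CSV"
        type => "mysql-slowquery"
      }
    }

    filter {
      csv {
        columns => [ "start_time", "user_host", "query_time", "lock_time",
                     "rows_sent", "rows_examined", "db", "last_insert_id",
                     "insert_id", "server_id", "sql_text", "thread_id" ]
      }
      # assumed timestamp layout of the start_time column
      date { match => [ "start_time", "yyyy-MM-dd HH:mm:ss" ] }
    }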

alpha-centauri commented Apr 20, 2016

I have much the same issue with a similar type of log. That's a very helpful solution @jordansissel! However, I need to preserve information about line order. Specifically, I have an "offset" field shipped by filebeat, which denotes the position of a message within the read file, so I can later sort messages that share a timestamp. With the split filter, though, the offset field just gets cloned, so the information about message order is lost. Is there a possibility to add some kind of counter / index when splitting a message?
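
One possible workaround, sketched against the newer (5.x+) event API; line_index is just an illustrative field name, not something the split filter provides: number the lines of the merged event before splitting, then grok the counter back out afterwards.

    filter {
      # runs after the multiline merge: prefix every line with its position
      ruby {
        code => '
          lines = event.get("message").split("\n")
          indexed = lines.each_with_index.map { |line, i| "#{i}|#{line}" }
          event.set("message", indexed.join("\n"))
        '
      }

      # one event per line, as before
      split { }

      # recover the counter and restore the original line text
      grok {
        match => { "message" => "^%{INT:line_index:int}\|%{GREEDYDATA:message}" }
        overwrite => [ "message" ]
      }
    }

Sorting on [offset, line_index] should then reconstruct the original order.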

rudibroekhuizen commented Jan 5, 2017

With the MySQL Shell it is possible to export the contents of a table in JSON format. Set this in your my.cnf:

slow_query_log = 1
log_output = table

echo "select * from slow_log;" | mysqlsh --schema=mysql --json=pretty --sqlc --uri root@127.0.0.1:3306

    {
        "start_time": {
            "year": 2017,
            "month": 0,
            "day": 5,
            "hour": 16,
            "minute": 50,
            "second": 26.0
        },
        "user_host": "root[root] @ localhost []",
        "query_time": "00:00:10",
        "lock_time": "00:00:00",
        "rows_sent": 1,
        "rows_examined": 0,
        "db": "",
        "last_insert_id": 0,
        "insert_id": 0,
        "server_id": 0,
        "sql_text": "select sleep(10)",
        "thread_id": 52
    }

This also works for general_log.
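
If the slow log is already going to a table, another option (a sketch; connection details, credentials and the driver path are placeholders) is to skip the shell and poll the table with the jdbc input:

    input {
      jdbc {
        jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/mysql"
        jdbc_user => "root"
        jdbc_password => "changeme"    # placeholder
        jdbc_driver_library => "/path/to/mysql-connector-java.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        # :sql_last_value defaults to the time of the previous run
        statement => "SELECT * FROM mysql.slow_log WHERE start_time > :sql_last_value"
        schedule => "* * * * *"
      }
    }

Each event then arrives with sql_text, query_time and the rest as plain fields, no grok needed.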
