@donjohnson
Created April 27, 2012 22:15
MySQL slow query log to OpenTSDB using the Maatkit/Percona query digester (pt-query-digest)
pt-query-digest mysql.slow.log --no-report --filter 'print "put mysql.slow.query $event->{timestamp} $event->{Query_time} query_md5=" . make_checksum($event->{fingerprint}) . " host=$event->{host} db=$event->{db} dbuser=$event->{user}\n"' | nc opentsdb 4242
Output format:
put mysql.slow.query 1335559893 1.889435 query_md5=AECBE3F75D62FCA4 host=api1 db=prod dbuser=app
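Each distinct `query_md5` value consumes one tag value UID, so it can be worth checking how many distinct values a log would produce before sending anything. A minimal sketch, assuming the `put` lines arrive on stdin in the format shown above; `count_md5_tags` is a hypothetical helper name, not part of pt-query-digest or OpenTSDB:

```shell
# Count the distinct query_md5 tag values in a stream of "put" lines on stdin.
# Hypothetical helper for a dry run; it sends nothing to OpenTSDB.
count_md5_tags() {
  grep -o 'query_md5=[^ ]*' |  # keep only the query_md5=... token of each line
    sort -u |                  # collapse repeated fingerprints
    wc -l |                    # one line per distinct value
    tr -d ' '                  # some wc implementations pad the count with spaces
}
```

For a dry run, replace the `nc opentsdb 4242` stage of the pipeline with `count_md5_tags` and compare the result with the number of tag values you are willing to spend.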
@tsuna

tsuna commented Apr 27, 2012

Putting the MD5 in a tag is a bad idea. Remember the default installation only allows up to 16777216 tag values, so use them wisely. There is no way to change the maximum number of tag values on an existing tsdb table.

@donjohnson
Author

Since the MD5s are query fingerprints of supposedly rare queries, I don't (currently) expect enough variation in normalized queries (coming from Hibernate) to reach that many values; probably never more than 5k in this case. The ability to filter on a specific query is huge for my use case, though. Will performance degrade only when approaching that limit, or linearly with the number of tag values?

@tsuna

tsuna commented Apr 27, 2012

If you have very few data points because you have very few slow queries, then the performance will not degrade much. The cost of the query is O(N) where N is the number of data points for the metric mysql.slow.query in the time range your query covers.

@tsuna

tsuna commented Apr 27, 2012

Addendum: the reason for my comment above is that it's generally not recommended to have a script that can potentially create an unbounded number of tag values, as yours does with the MD5 sum. If there's a hiccup in your database and all of a sudden you log 20k slow queries, then you'll "waste" 20k tag value UIDs, and it's annoying/hard to "recycle" them.
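One way to bound the number of tag values a single run can create is to filter the `put` lines before they reach `nc`. This is only a sketch under stated assumptions: `cap_md5_tags` is a hypothetical helper, the default cap of 1000 is arbitrary, and the cap applies per invocation only (nothing is remembered between runs):

```shell
# Pass "put" lines through, but drop any line that would introduce a new
# query_md5 value once MAX distinct values have been seen in this run.
# Usage: ... | cap_md5_tags MAX | nc opentsdb 4242
cap_md5_tags() {
  awk -v max="${1:-1000}" '
    {
      tag = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^query_md5=/) tag = $i
      if (tag == "") { print; next }   # no query_md5 tag: pass through untouched
      if (!(tag in seen)) {
        if (n >= max) next             # cap reached: drop lines with new values
        seen[tag] = 1
        n++
      }
      print                            # value already seen, or still under the cap
    }'
}
```

Lines for fingerprints that were already seen keep flowing after the cap is reached, so existing time series are not interrupted; only new tag values are refused.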

@xaprb

xaprb commented Apr 28, 2012

tsuna, you are incorrect. The md5sum is of the /normalized/ query, and it is extremely unlikely that there will be 20k /different kinds of queries/. In my experience, most database servers have fewer than a couple hundred types of queries executed against them.

@tsuna

tsuna commented Apr 28, 2012

Ah OK, my bad then, I wasn't aware of that. Then yes it's probably fine.
