@kr

kr/logfmt.md Secret

Created November 10, 2012 00:45
Log format description

logfmt

This format is intended to strike a balance between human and machine readability.

It aims to codify existing logging conventions and nail down some formerly thorny or ambiguous cases, while leaning on a familiar data model (JSON).

Summary: mostly JSON, with a couple of restrictions, a couple of extensions, and a couple of changes.

  • A log message is a JSON object, omitting the enclosing curly braces.
  • Strings can appear unquoted if they look like C identifiers (only alphanumeric characters and underscore; can't begin with a digit). Note that the encodings of values null, true, and false also fit this description, which means that if you want the string “null” you have to encode it with quotes (as "null").
  • We use equals instead of colon, and leave out the commas.
  • There is an RFC 3339 timestamp literal format with optional nanoseconds. These values cannot be unambiguously represented in JSON (where the best you can do is pun them with strings).

Example JSON

{"dyno":"web", "cmd":"thin -R config.ru -p 3000 start", "scale":3, "err":null}
{"dyno":"web", "metadata":{"stack":["line 1", "line 2"], "msg":"null"}}
{"t":"2012-06-19T11:02:47.123456789-04:00", "machine":"awondo", "event":"start"}

Equivalent logfmt

dyno=web cmd="thin -R config.ru -p 3000 start" scale=3 err=null
dyno=web metadata={stack=["line 1" "line 2"] msg="null"}
t=2012-06-19T11:02:47.123456789-04:00 machine=awondo event=start
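The mapping from the JSON examples to their logfmt equivalents can be sketched as a small encoder. This is only an illustration of the summary rules, not a reference implementation: the function names `encode_string`, `encode_value`, and `encode_message` are hypothetical, quoted strings are assumed to use JSON string escaping, and Python's `datetime` carries microseconds rather than nanoseconds.

```python
import json
import re
from datetime import datetime, timezone

# Strings that look like C identifiers may appear unquoted.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# These identifiers are taken by the literals, so the strings need quotes.
RESERVED = {"null", "true", "false"}


def encode_string(s):
    """Emit a string bare if it is identifier-like, else JSON-quoted (assumption)."""
    if IDENT.fullmatch(s) and s not in RESERVED:
        return s
    return json.dumps(s)


def encode_value(v):
    """Encode one value following the summary rules (a sketch)."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # bool must be tested before int
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return json.dumps(v)
    if isinstance(v, datetime):
        if v.tzinfo is None:
            raise ValueError("timestamp needs a UTC offset")
        # Bare timestamp literal; limited to microsecond precision here.
        return v.isoformat()
    if isinstance(v, str):
        return encode_string(v)
    if isinstance(v, dict):
        return "{" + encode_message(v) + "}"
    if isinstance(v, (list, tuple)):
        return "[" + " ".join(encode_value(x) for x in v) + "]"
    raise TypeError(f"cannot encode {type(v).__name__}")


def encode_message(obj):
    """Encode a dict as key=value pairs: no braces, no commas."""
    return " ".join(f"{encode_string(k)}={encode_value(v)}" for k, v in obj.items())


print(encode_message({"dyno": "web", "metadata": {"stack": ["line 1", "line 2"], "msg": "null"}}))
```

The string `"null"` keeps its quotes while the value `None` is written as bare `null`, which is the ambiguity the second bullet of the summary calls out.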

Open Question: Should we allow unquoted strings of the form 3ms? Currently hermes generates such messages.

Open Question: Should we require timestamps to be in UTC?

Grammar

Forthcoming.

Key Naming Conventions

Forthcoming.

@bmizerany

err=nil

@asenchi

asenchi commented Dec 4, 2012

@kr I like it. I go back and forth on the 'null' vs. 'nil' argument. I think 'nil' might be fine.

Regarding your two open questions, I think "3ms" should be quoted and time should be standardized around ISO8601. Forcing UTC is what everyone wants but it isn't a reality in a lot of places. Granted this may just be for Heroku in which case you guys can already standardize. But I think for a specification forcing time format is more interesting.

I am working on a writeup for a unified logging format internally at GH, I'll gist and share once it's done.

@kr
Author

kr commented Dec 4, 2012

Awesome, I'm looking forward to that gist.

I agree about standardizing on ISO 8601 (actually
RFC 3339 is better). Unfortunately, that spec
doesn't address how to represent durations.
In some places we've started using a convention
like time=3 units=ms, but that might be overly
fussy.

I chose null because that's what JSON uses.
Is there any reason to be different?

@rwdaigle

How do we feel about nested values (i.e., the metadata key in the second log)? Is this a case that needs to be covered, or should we be pushing for single-level, but appropriately namespaced, logs?
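To make the single-level alternative concrete, here is a rough sketch that flattens a nested message into namespaced keys. The `flatten` helper, the `_` separator, and the use of list indexes as key parts are all assumptions for illustration; the draft does not define a naming convention yet.

```python
def flatten(value, prefix="", sep="_"):
    """Return a single-level dict with namespaced keys (hypothetical scheme)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        items = enumerate(value)  # list positions become key parts
    else:
        return {prefix: value}    # leaf: keep the value under the built-up name
    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat


nested = {"dyno": "web", "metadata": {"stack": ["line 1", "line 2"], "msg": "null"}}
for key, val in flatten(nested).items():
    print(key, "=", repr(val))
```

With `_` as the separator the resulting keys still look like C identifiers, so they could stay unquoted. Note that empty objects and arrays simply disappear in this sketch.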

Also, what guidance do we have re: key names using underscores or dashes? Oh, crap, I see that topic is "forthcoming", sorry.

@zimbatm

zimbatm commented Feb 28, 2013

Some days ago I started writing a parser for the spec. I'm not 100% happy with the error reporting and the time parsing isn't working yet. https://gist.github.com/zimbatm/1e9a4021c885e096fcbb#file-syntax_parslet-rb-L163

I think it would be interesting to change the syntax to accept arbitrary types:

space: ' '
key: [a-zA-Z0-9_-]+
literals: true | false | nil
number: TODO
time: TODO
quoted_string: TODO
object: '{' key=value separated by spaces* '}'
array: '[' values separated by spaces* ']'
other: [^ ]+
value: object | array | quoted_string | time | number | literals | other

The specification of the "other" type is platform-dependent and defaults to a string. So for example, if you have the other value 3ms and your language supports it, you can transform it into a time interval; otherwise it becomes the "3ms" string.
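A minimal sketch of that fallback, assuming a hypothetical `interpret_other` hook and a made-up unit table (the thread does not define which suffixes count as durations):

```python
import re
from datetime import timedelta

# Hypothetical unit table mapping suffixes to timedelta keyword arguments.
UNITS = {
    "us": "microseconds",
    "ms": "milliseconds",
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
}
DURATION = re.compile(r"(\d+)(us|ms|s|m|h)")


def interpret_other(token):
    """Turn an 'other' token into a time interval if possible, else keep the string."""
    match = DURATION.fullmatch(token)
    if match is None:
        return token  # default: the token stays a plain string
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


print(interpret_other("3ms"))
print(interpret_other("3parsecs"))
```

A platform without an interval type would skip the conversion and always return the token unchanged.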

EDIT: update to add the time primitive

@kr
Author

kr commented Feb 28, 2013

Oh wow @zimbatm, that's really cool. I hope you're prepared to make some
changes if necessary as this draft evolves. It's definitely not final or anything. :)

@zimbatm

zimbatm commented Mar 19, 2013

@kr: Sure, Parslet is easy :)

Actually I didn't get your notification and since the activity seemed pretty slow I started my own log format initiative. I hope you don't mind :/ It wasn't really to own it but more because I think this is an awesome idea and would love to get a cross-language specification.

@zimbatm

zimbatm commented Mar 19, 2013

Actually I'm going to add you to the project if you don't mind

@whatupdave

Wrote a quick log generator in go here: https://github.com/whatupdave/dlog

@asenchi

asenchi commented Apr 3, 2013

@zimbatm Nice work, I'll be sending some ideas around agreed-upon keys, but thus far I think you are on the right track. I unfortunately haven't had much time to work on a format internally, but I am glad that there is a more community-driven response here.

Let's work to get this in solid shape. @kr Heroku still has what I would consider to be the most thorough implementation of "logs as data", so it would be great if you or someone else could work on this as well.

Thanks again @zimbatm

@zimbatm

zimbatm commented Dec 22, 2013

@kr: seems you've got a handle on the EBNF format. Just saw it here: http://godoc.org/github.com/kr/logfmt . What are the latest updates on logfmt?

@asenchi: after 9 months of thinking (ok not constantly) I think the simpler logfmt k/v format is better than the one with types. I'm thinking of retiring the lines format as a failed experiment.

Parsing is easy to make robust with logfmt, which is important when consuming diverse sources. I also found that types aren't always desirable. For example, in a scenario where logs are indexed in ElasticSearch, the first time a key appears the index is created with the given value type, but there is no guarantee that the next log entry won't have the same key with a different type. Aside from that, ElasticSearch also needs a fixed set of keys or you run the risk of blowing up your memory with arbitrary indexes (happened to me). Once you've decided on the keys, you might as well choose the type all the values are going to be cast to.
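The "fixed keys, fixed types" idea could look roughly like this: parse the line as untyped key/value strings, then cast only the keys you have decided on. Everything here is an assumption for illustration, including the `parse_flat` and `cast_fields` names, the simplified regex, and the use of JSON unescaping for quoted values; it is not the kr/logfmt parser.

```python
import json
import re

# key=value pairs; a value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_flat(line):
    """Parse a flat logfmt line into a dict of strings (simplified sketch)."""
    fields = {}
    for key, raw in PAIR.findall(line):
        # Assumption: quoted values use JSON string escapes.
        fields[key] = json.loads(raw) if raw.startswith('"') else raw
    return fields


def cast_fields(fields, schema):
    """Keep only the keys in schema, casting each value to its fixed type."""
    out = {}
    for key, caster in schema.items():
        if key not in fields:
            continue
        try:
            out[key] = caster(fields[key])
        except ValueError:
            out[key] = None  # value does not fit the chosen type
    return out


SCHEMA = {"dyno": str, "scale": int}  # hypothetical fixed key set
line = 'dyno=web cmd="thin -R config.ru -p 3000 start" scale=3 err=null'
print(cast_fields(parse_flat(line), SCHEMA))
```

Because every key always comes out with the same type (or `None`), a later entry such as `scale=many` cannot change the type the index sees for `scale`.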
