- Logs should be structured text (e.g, JSON lines).
- Log messages should be static strings, using other fields to provide useful information.
- Use a correlation ID in a dedicated field to tie logs together.
- Refine correlation IDs as appropriate by appending to them.
Logs are for machines first, humans second. Logs have two important purposes: 1) Monitoring; 2) Debugging. Monitoring is
an all-the-time thing where as (hopefully) debugging is compartively rare, hence the logs are for monitoring first, and
debugging second. That said, the same principles make your logs better for both purposes! To be used for monitoring or debugging,
Logs need to be:
- understandable
- specific
- informative
Understandable means that both the machine and the human can make sense of the data in the log entry. In other words, it
needs to be unambiguously parseable. String interpolation or concatenation is the wrong answer here because
"username=" + username
works fine until the username has a space in it, and "username=\"%s\""
works ok until the username
has a quote in it, etc.
Structured text is the name of the game. JSON-lines has broad support and it's reasonably human readable.
In general, a log entry should look like this:
{
timestamp: Date;
message: string;
details: {};
}
For specific, we're talking precisely about the message
field. Specific means that each log entry is identifiable
as a specific event. In short, this means that every line of code where your logger is called should pass a different
string for the message
. While it should be a clear message that a human will understand, it may be useful to think
of it more as a label than a full log message: if you search for this string in your logs, every result should be
entries generated on this one line of code. This also means that if someone is looking at this line of code, they should
know what the string is to search for, which means it should generally be an inline string literal, or potentially a constant
whose value is a string-literal.
Informative means that once the reader (human or machine) has made identified the log entry they care about (specific) and
made sense of the data (understandable), that data is actually useful. That's where the details
comes in: any additional
pieces of information that are likely to be relevant to the log entry should be included here. Think of message
as
providing the context, and details
providing the...details.
Logs tend to be interleaved; multiple instances, multiple processes, multiple threads, all spitting out concurrent logs
for different tasks and they all get mixed together. It's important to know which logs go together. This is where correlation
IDs come in.
A correlation ID is just a string that's included in related logs. For instance, if you're system is processing
user requests, you can generate a new unique request ID as soon as the request is received: any logs related to that request
should include that request ID. If you're processing rows from a database, the row's unique ID makes a good correlation ID.
If you're pulling jobs off a queue, the queue item's ID should work.
If there is no good intrinsic identifier, just generate one. Either a random
v4 UUID or
NanoID are usually reasonable choices.
Correlation IDs can also be refined by appending to them. For instance, if you're processing a user request and using
requestId
as the correlation ID, then when you move on to generating the response you might want to use requestId + ".resp"
as the correlation ID. To get all the logs related to this request, you can search for entries that have
a correlation ID starting with the request ID. If you only care about the logs related to generating the response,
you can refine your search appropriately.
Another nice thing about using correlation IDs, is you don't have to keep repeating information all the time. For instance,
you don't need to include the user ID in every log, just log it once and you'll know that all correlated logs share the
same user ID (though in some cases, you may want to repeat information anyway, so a developer (or search query) doesn't need
to go looking for it).
For logging purposes, you should pick a dedicated field name for the correlation ID, something like corr_id
. That way
when you're looking at a log entry with a user_id
and a request_id
and a job_id
and they're all UUIDs, you don't need
to guess at which one is being used to correlate logs.
Additionally, it may be helpful to add a field like corr_id_from
to indicate where the correlation ID came from
(e.g., "request ID", "row ID", "generated"). This is probably only necesssary for the first log entry.
Here are some bad log statements:
- BAD:
log("Received ${httpMethod} request to ${path}.", { corrId, headers, body })
- The log message is dynamic, so it's going to be hard to find this in the logs.
- The httpMethod and path are included in the log entry, but they need to be parsed from the non-structured
part of the entry, which is just more work for a developer writing a log search to do, and more work for
the machine to do while searching your logs.
- We have a correlation ID, but it might be useful to know what intrinsic value that correlation ID came from.
- BETTER:
log("Request received", { corrId: corrIdFrom: "requestId", httpMethod, path, headers, body })
- BAD:
log({ corrId, corrIdFrom: "requestId", httpMethod, path, headers, body })
- There is no log message (i.e., no "tag") so not only is it going to be very hard to search for these
log entries, if you see them in the logs you probably won't have any idea what they mean.
- BETTER:
log("Request received", { requestId, httpMethod, path, headers, body })
- BAD:
log("Response sent", { statusCode, headers, body })
- There's no correlation ID, so we have no idea what the request was that generated this response.
- BETTER:
log("Response sent", { corrId, statusCode, headers, body })
- BAD:
log("Unknown exception caught: " + error.message)
- We have no context for where this exception was caught, or what we were trying to do when it was thrown.
- Error and exception objects often have additional useful properties beyond just their message, but these
haven't been included in the log.
- The error message itself needs to be parsed from the non-structured error message.
- BETTER:
log("ERROR Unknown exception caught while processing request", { corrId, error })