@isaacdavis
Last active March 28, 2019 00:05
Muskie log-fetching instructions

Retrieving muskie logs

These instructions are adapted from the comprehensive Manta Debugging Guide, with added clarification. See the debugging guide for more information on everything related to log-based debugging.

Log locations

Each zone maintains a log file for the current hour (a real-time log) at /var/log/muskie.log. At the top of every hour, the past hour's log file (now a historical log) is uploaded to the datacenter's manta at /poseidon/stor/logs/COMPONENT/YYYY/MM/DD/HH/SHORTZONE.log, where:

  • COMPONENT is the name of the component whose logs you’re looking for (muskie, for the logs discussed here)
  • YYYY, MM, DD, and HH represent the year, month, day, and hour for the entries in the log file
  • SHORTZONE is the first 8 characters of the zone’s uuid.
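As a rough illustration of the naming scheme above, here is a minimal sketch that assembles the path for one zone's hourly log. The function name muskie_log_path is hypothetical, and the zone UUID in the example call is a made-up placeholder (only its first 8 characters matter).

```shell
# Hypothetical helper: build the Manta path for one zone's hourly log.
# Arguments: COMPONENT, zone UUID, then YYYY MM DD HH.
muskie_log_path() {
    component=$1
    zone_uuid=$2
    year=$3 month=$4 day=$5 hour=$6
    # SHORTZONE is the first 8 characters of the zone's UUID.
    shortzone=$(printf '%s' "$zone_uuid" | cut -c1-8)
    printf '/poseidon/stor/logs/%s/%s/%s/%s/%s/%s.log\n' \
        "$component" "$year" "$month" "$day" "$hour" "$shortzone"
}

# Placeholder UUID, for illustration only.
muskie_log_path muskie f6817865-0000-4000-8000-000000000000 2018 12 05 14
# prints /poseidon/stor/logs/muskie/2018/12/05/14/f6817865.log
```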

Real-time logs

To retrieve logs from all muskie instances from the current hour thus far, we can run manta-oneach from the headnode:

manta-oneach -s webapi 'cat /var/log/muskie.log'

In practice, it is usually more useful to bound the number of log entries we retrieve (using tail) and to keep only entries for failed requests (using grep; the pattern below matches messages starting with "handled: 5", that is, responses with a 5xx status code). Here's a representative example, which also extracts specific fields from the log entries using json, sorts the results, and reports how many times each tuple of fields appears:

manta-oneach -s webapi 'tail -n 900 /var/log/muskie.log | grep "handled: 5" |
    json -gaH res.statusCode route err.message | sort | uniq -c'

Note that it may also be useful to tee or redirect to a file, so the results don't get lost in your terminal scrollback.
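One way to keep a copy of the results, sketched as a small wrapper around the command above. The function name and the default output file name are placeholders, not anything Manta provides.

```shell
# Hypothetical wrapper: summarize recent 5xx responses across all webapi
# zones, printing the results and saving a copy with tee.
summarize_muskie_errors() {
    outfile=${1:-muskie-errors.txt}   # placeholder default file name
    manta-oneach -s webapi 'tail -n 900 /var/log/muskie.log | grep "handled: 5" |
        json -gaH res.statusCode route err.message | sort | uniq -c' |
        tee "$outfile"
}

# Usage, from the headnode:
#   summarize_muskie_errors /var/tmp/muskie-errors.txt
```

Using tee rather than a plain redirect means you still see the output as it arrives.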

Historical logs

To retrieve logs from all muskie instances outside of the current hour, we can use mfind together with mjob. This can be done anywhere you've set up the node-manta SDK to connect to your manta deployment.

Here's a representative example for December 5, 2018:

mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | mjob create -o -m 'cat'

As with the current-hour logs, you can (and should) replace cat with whatever filtering and sorting pipeline you'd like. In production deployments, even a single hour's worth of logs can be very large, so you almost certainly don't want the whole thing.

You can adjust the mfind invocation as needed to scan a broader or narrower time range. You can also use the -n argument to mfind to select log files for a particular zone over the given time range:

mfind -n f6817865.log -t o /poseidon/stor/logs/muskie/2018/12/05 |
    mjob create -o -m 'cat'
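Putting the points above together, here is a hedged sketch that narrows the scan to a single hour and replaces cat with the same filtering pipeline used for the real-time logs. The function name is hypothetical, and the sketch assumes the json tool is available inside the job's compute environment. Because the pipeline runs as a map phase, counts are reported per log file rather than across all zones.

```shell
# Hypothetical helper: count 5xx responses per log file under one
# directory of historical logs (for example, a single hour).
muskie_errors_under() {
    log_dir=$1
    mfind -t o "$log_dir" |
        mjob create -o -m 'grep "handled: 5" |
            json -gaH res.statusCode route err.message | sort | uniq -c'
}

# Usage, for 14:00-15:00 on December 5, 2018:
#   muskie_errors_under /poseidon/stor/logs/muskie/2018/12/05/14
```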

In manta deployments where jobs don't work, you can instead route the output of mfind into mget and process it locally, like so:

mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | while read -r f; do mget "$f"; done

Note that in this case, you should use head instead of tail to grab a set number of lines of log output, so that mget returns early once the specified number of lines has been read rather than downloading the whole file. Here's an example:

mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | while read -r f; do mget "$f" | head -n 1000; done
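The same local approach can feed the filtering pipeline from the real-time section. This is only a sketch: the function name and the default of 1000 lines are arbitrary choices, and it assumes the json tool is installed on the machine where you run it.

```shell
# Hypothetical helper: fetch the first N lines of each historical log under
# a directory and summarize 5xx responses locally, without using jobs.
local_muskie_errors() {
    log_dir=$1
    lines=${2:-1000}   # arbitrary default; head lets mget stop early
    mfind -t o "$log_dir" | while read -r f; do
        mget "$f" | head -n "$lines" | grep "handled: 5"
    done | json -gaH res.statusCode route err.message | sort | uniq -c
}

# Usage:
#   local_muskie_errors /poseidon/stor/logs/muskie/2018/12/05 1000
```

Here the counts are aggregated across all of the fetched files, since sort and uniq run once over the combined output.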

Additional notes

  • The archival process for historical logs first rotates the logs to new files under /var/log/manta/upload. A few minutes later, these are uploaded to Manta and then removed from the local filesystem. If the upload fails, the files are kept in /var/log/manta/upload for up to two days. In extreme situations where Manta has been down for over an hour, you may find recent historical log files in /var/log/manta/upload, and you can scan them with manta-oneach in the same way as the live log files.
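As a starting point for the situation described above, here is a small sketch that lists whatever is waiting in the upload directory on every webapi zone. The function name is hypothetical. The file naming inside /var/log/manta/upload is not covered here, so the sketch only lists the directory; once you know which files you want, you can substitute a tail or grep pipeline like the ones used for the live logs.

```shell
# Hypothetical helper: list rotated logs that have not yet been uploaded
# (or whose upload failed) on every webapi zone.
list_pending_muskie_logs() {
    manta-oneach -s webapi 'ls -l /var/log/manta/upload'
}

# Usage, from the headnode:
#   list_pending_muskie_logs
```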