FTDC, originally short for "full-time diagnostic data capture", is MongoDB's internal diagnostic data collection facility. It encodes data in a space-efficient format, which allows MongoDB to record diagnostic information every second and store weeks of data in only a few hundred megabytes (MB) of storage. While it's a great initiative on Mongo's end to help us monitor such metrics, at the moment there isn't any concrete tool provided by Mongo, or any open-source repo, that we can rely on solely to decode the data effectively.
But as usual, after some research across the internet, there are certain blogs/tools/docs where software developers and open-source contributors have done some great work to get the closest possible visualisation in different formats by utilizing open-source tools. Let's explore them.
GoLang happens to have good documentation about FTDC parsing and generation with its specific tools and functions, here. It contains all the details if one wants to start some work from scratch. Fortunately, there exists an open-source tool called Keyhole, implemented by Ken Chen, which allows us to visualize FTDC metrics and statistics with several other informative details, as described in his blogs (3).
Before one proceeds to install Keyhole or any Go-specific tool/repo, here are some prerequisites for setting up GoLang, since it depends entirely on modules/packages/directories, and a wrong setup can cause problems with other projects written in GoLang as well:
- Install GoLang via its website, ensuring that the `go version` command works as described.
- Open a terminal (or similar, based on your OS) and type `go env`, where you will see a list of environment variables; we are interested in `GOROOT="/usr/local/go"` and `GOPATH="$HOME/go"`.
- To make Go accessible from any directory in your command line, you can add Go's binary directory to your `PATH` this way: `export PATH=$PATH:/usr/local/go/bin`.
That completes the initial installation setup. Just one last step: whenever a new project/workspace needs to be created, you need to have it in your `$GOPATH` (i.e. `$HOME/go`). Before proceeding, ensure you have the `src`, `pkg`, and `bin` directories in the `$GOPATH` (if not, you just need to create them with `mkdir` the first time you install Go). Anytime you add a new Go project, it needs to be in the `src` directory.
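The one-time workspace setup above can be sketched as a few shell commands (assuming the default `$HOME/go` location):

```shell
# One-time Go workspace setup, assuming the default GOPATH of $HOME/go.
GOPATH="${GOPATH:-$HOME/go}"

# Create the three standard workspace directories if they don't exist yet.
mkdir -p "$GOPATH/src" "$GOPATH/pkg" "$GOPATH/bin"

# New Go projects (e.g. a Keyhole checkout) go under src.
ls "$GOPATH"
```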
A typical FTDC metrics file contains two kinds of documents, distinguished by a `type` field (0/1) corresponding to a `doc`/`data` field: type 0 is a BSON metadata document, stored as plain BSON in the `doc` field, while type 1 is a metrics chunk whose `data` field holds compressed BSON (shown as base64 when dumped to JSON) that needs to be unzipped before use.
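As a rough illustration of that layout (not real FTDC data; the payload here is an arbitrary byte string), a chunk's binary value is a 4-byte uncompressed-length prefix followed by zlib-compressed bytes, so decoding means base64-decoding, skipping the first 4 bytes, and inflating:

```shell
# Build a stand-in chunk: 4-byte uncompressed-length prefix + zlib-compressed
# bytes, base64-encoded the way it appears in bsondump's JSON output.
encoded=$(python3 -c 'import base64, struct, zlib
raw = b"pretend these are FTDC metric bytes"
print(base64.b64encode(struct.pack("<I", len(raw)) + zlib.compress(raw)).decode())')

# Decode it the same way the script below does: base64-decode, drop the
# 4-byte length prefix, then zlib-inflate.
python3 -c 'import base64, sys, zlib
print(zlib.decompress(base64.b64decode(sys.argv[1])[4:]).decode())' "$encoded"
```

The second command prints the original payload back, mirroring what the Ruby one-liner in the decoding script does per chunk.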
To save our time once again, Alex Bevilacqua has a well-written blog about FTDC data and how we can use Mongo's `bsondump` utility and `jq` to query and filter out the necessary data based on the `type` of the metrics.
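To see the shape of that filtering without a real FTDC file, here is a toy run on hand-written extended-JSON lines (the base64 value is a placeholder, not a real chunk):

```shell
# Two fake bsondump output lines: a type-0 metadata doc and a type-1 metrics
# chunk. The jq filter keeps only the type-1 payloads.
printf '%s\n' \
  '{"type":{"$numberInt":"0"},"doc":{"start":"..."}}' \
  '{"type":{"$numberInt":"1"},"data":{"$binary":{"base64":"UExBQ0VIT0xERVI=","subType":"00"}}}' |
jq -r 'select(.type."$numberInt" == "1") | .data."$binary".base64'
# prints: UExBQ0VIT0xERVI=
```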
If you try to execute the above command you may encounter issues related to byte size (memory/EOF/buffer limits) or similar. We can instead divide the metrics data into multiple chunks/objects, decode each object individually, and append the result each time the i-th chunk is successfully decoded. Here is a modified script for that approach:
BASH script:

```bash
#!/bin/bash
# Usage: ./[Script_Name] [input metric file] [output JSON file]

# Remove the output file; the script will create/append to it.
if test -f "$2"; then
    rm "$2"
fi

# Decode the metrics BSON file into a JSON array of base64-encoded chunks.
bsondump --quiet "$1" | jq 'select( .type."$numberInt" == "1" ) | .data."$binary".base64' | jq -s '.' > EncodedArray.json

# Get the number of chunks.
EncodedArrayLength=$( jq 'length' EncodedArray.json )
echo "$EncodedArrayLength Encoded Chunks Found"

# The base64 decoding and bsondump MUST be done separately for each chunk.
for (( index=0; index<EncodedArrayLength; index++ ))
do
    echo -ne "Decoding Chunk $index\033[0K\r"
    jq -r --argjson i "$index" '.[$i]' EncodedArray.json | ruby -rzlib -rbase64 -e 'd = STDIN.read; print Zlib::Inflate.new.inflate(Base64.decode64(d)[4..-1])' > "DecodedArray.bson"
    # Append the resulting chunk's JSON data to the end of the output file.
    bsondump --quiet DecodedArray.bson | jq -s '.' >> "$2"
done

# Clean up temp files.
rm DecodedArray.bson EncodedArray.json
echo "All Chunks Decoded"
# Tip: Cmd+F "vmstat" in the decoded output.
```
The above script can be executed via `./[Script_Name] [input metric file] [output JSON file]`, and you will have the decoded JSON data in your working directory.
One could also import the same data into MongoDB for a better overview, and query the necessary data based on the attributes of the collection (in this case, FTDC metrics). Mongo already provides the `mongoimport` tool with various arguments (i.e., `--options`); in our scenario, we want to use the `--jsonArray` flag for each chunk in the metrics file and append the same during the decoding process (as above).
We can now modify the script by adding the following inside the `for` loop, just before `done`:

```bash
# Import each chunk into Mongo using mongoimport.
touch temp_decoded_chunk.json
bsondump --quiet DecodedArray.bson | jq -s '.' >> temp_decoded_chunk.json
mongoimport --db [db_name] --collection [collection_name] --file temp_decoded_chunk.json --jsonArray
rm temp_decoded_chunk.json
```
You may verify the successful import by checking in `mongosh` (or an equivalent GUI, e.g. Compass).
There are various tools across the internet that allow us to analyse Mongo logs; in our case, we will use Ken Chen's tool called Hatchet (refer to the GoLang setup above for installation). You may observe a variety of options to analyse the logs, e.g., through a URL, AWS S3, Atlas, etc., which you may use based on your requirements. This tool requires the log file to be in `.gz` format, which you can create using the `gzip` tool. I presume the rest is well defined in the repo.
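For the `.gz` requirement, a quick sketch (the file name is a placeholder for your actual log file):

```shell
# Create a stand-in log and compress it for Hatchet.
printf 'log line 1\nlog line 2\n' > mongod.log

# gzip -c writes the compressed copy to stdout, leaving the original intact.
gzip -c mongod.log > mongod.log.gz

# Round-trip check: decompress to stdout to confirm the archive is valid.
gunzip -c mongod.log.gz

rm mongod.log mongod.log.gz
```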