@Rajveer100
Last active June 19, 2023 08:47
Analysing MongoDB's Full Time Diagnostic Capture (FTDC)

About FTDC

FTDC, originally short for full-time diagnostic data capture, is MongoDB's internal diagnostic data collection facility. It encodes data in a space-efficient format, which allows MongoDB to record diagnostic information every second and store weeks of data in only a few hundred megabytes (MB) of storage. While it's a great initiative on Mongo's end to help us monitor such metrics, at the moment there isn't a concrete tool provided by Mongo, nor an open-source repository, that we can rely on by itself to effectively decode the data.

But as usual, after some research across the internet, there are blogs, tools and docs where software developers and open-source contributors have shared great findings that get us close to visualising this data in different formats using open-source tools. Let's explore them.

Motivation and Initial Steps

GoLang happens to have good documentation about FTDC parsing and generation, with its specific tools and functions, as here. It contains all the details one would need to start some work from scratch. Fortunately, there is an open-source tool called Keyhole, implemented by Ken Chen, which allows us to visualise FTDC metrics and statistics along with several other informative details, as described in his blogs.

GoLang Setup

Before proceeding to install Keyhole or any Go-specific tool/repo, here are some prerequisites for setting up GoLang, as it depends entirely on its modules/packages/directory layout and can otherwise cause problems while setting up other projects written in GoLang as well:

  • Install GoLang via its website, ensuring the go version command works as described.
  • Open Terminal (or similar, based on your OS) and type go env; you will see a list of environment variables. We are interested in GOROOT="/usr/local/go" and GOPATH="$HOME/go".
  • To make Go accessible from any directory in your command line, export Go's bin directory onto your PATH: export PATH=$PATH:/usr/local/go/bin.

That completes the initial installation setup. One last step: whenever a new project/workspace needs to be created, it must live in your $GOPATH (i.e. "$HOME/go"). Before proceeding, ensure the /src, /pkg and /bin directories exist in the $GOPATH (if not, create them with mkdir the first time you install Go). Any new Go project then goes under the /src directory, as sketched below.
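
As a quick reference, the steps above amount to something like the following. The Keyhole clone at the end is only an illustration of placing a project under $GOPATH/src; the repository URL is Ken Chen's project, but double-check it against his blogs.

# make the go toolchain reachable from any directory
export PATH=$PATH:/usr/local/go/bin
go version

# create the standard GOPATH layout once (default GOPATH is $HOME/go)
mkdir -p "$HOME/go/src" "$HOME/go/pkg" "$HOME/go/bin"

# example: a Go project such as Keyhole lives under $GOPATH/src
cd "$HOME/go/src"
git clone https://github.com/simagix/keyhole.git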

Decoding FTDC Metrics Locally

A typical FTDC metrics file contains two kinds of documents, distinguished by a type field: type 0 is a metadata document, whose doc field is plain BSON, and type 1 is a metrics chunk, whose data field holds the metrics encoded in binary/base64 form (i.e. compressed BSON) that needs to be decompressed before it can be read.
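
To get a feel for this layout, you can peek at both document types directly with bsondump and jq (the input file name below is a placeholder; the doc/data field names follow the description above):

# type 0: metadata document, plain BSON under the .doc field
bsondump --quiet metrics.ftdc | jq 'select(.type."$numberInt" == "0") | .doc | keys'

# type 1: metrics chunk, .data holds base64-encoded, zlib-compressed BSON
bsondump --quiet metrics.ftdc | jq 'select(.type."$numberInt" == "1") | .data."$binary".base64' | head -c 200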

To save our time once again, Alex Bevilacqua has a well-written blog about FTDC data and how we can use Mongo's bsondump utility and jq to query and filter out the necessary data based on the type of the metrics.
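
For reference, here is a one-shot sketch along the lines of that blog post (file names are placeholders): extract the type-1 chunks, base64-decode them, drop the leading 4-byte uncompressed-length field, and inflate the rest.

# one-shot decode (may hit buffer/EOF limits on large files, as noted below)
bsondump --quiet metrics.ftdc \
  | jq -r 'select(.type."$numberInt" == "1") | .data."$binary".base64' \
  | ruby -rzlib -rbase64 -e 'STDIN.each_line { |l| print Zlib::Inflate.new.inflate(Base64.decode64(l)[4..-1]) }' \
  > DecodedAll.bson

bsondump --quiet DecodedAll.bson | jq -s '.' > decoded_metrics.json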

Limitations of the Decoding Process Using bsondump

If you try to execute the above command, you may encounter issues related to byte size (memory/EOF/buffer limits) or something similar. We can instead divide the metrics data into multiple chunks/objects, decode each object individually, and append the result to the output file each time the i-th chunk is successfully decoded. Here is a modified script for this purpose:

Bash Script

# Remove the output file if it already exists; the script will create/append to it.
if test -f "$2"; then
    rm "$2"
fi

# Decode the metrics BSON file into JSON and collect the base64-encoded chunks.
bsondump --quiet "$1" | jq 'select( .type."$numberInt" == "1" ) | .data."$binary".base64' | jq -s '.' > EncodedArray.json

# Get the number of chunks.
EncodedArrayLength=$( cat EncodedArray.json | jq 'length' )
echo "$EncodedArrayLength Encoded Chunks Found"

# The Base64 decoding and bsondump MUST be done separately for each chunk.
for (( index=0; index<$EncodedArrayLength; index++ ))
do
    echo -ne "Decoding Chunk $index\033[0K\r"
    cat EncodedArray.json | jq --argjson i $index '.[$i]' | ruby -rzlib -rbase64 -e 'd = STDIN.read; print Zlib::Inflate.new.inflate(Base64.decode64(d)[4..-1])' > "DecodedArray.bson"

    # Append the resulting chunk's JSON data to the end of the output file.
    bsondump --quiet DecodedArray.bson | jq -s '.' >> "$2"
done

# Clean up temp files used.
rm DecodedArray.bson
rm EncodedArray.json
echo "All Chunks Decoded"


# Tip: search (Cmd+F) the output JSON for "vmstat" to locate the system metrics section.

The above script can be executed as ./[Script_Name] [input metrics file] [output JSON file], and you will then have the decoded JSON data in your working directory.
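
For instance, assuming the script is saved as decode_ftdc.sh (a hypothetical name) and the input comes from a node's diagnostic.data directory (the metrics file name below is illustrative):

chmod +x decode_ftdc.sh
./decode_ftdc.sh /path/to/diagnostic.data/metrics.2023-06-01T00-00-00Z-00000 decoded_metrics.json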

Importing Decoded Data to MongoDB

One could also import the same data into MongoDB to get a better overview and to query the necessary data based on the attributes of the collection (in this case, FTDC metrics). Mongo already provides the mongoimport tool with various arguments (i.e., --options); in our scenario we want to use the --jsonArray flag for each chunk in the metrics file and append it during the decoding process (as above).

We can now modify the script by adding the following inside the for loop, just before the end of each iteration:

# Import each chunk into Mongo using mongoimport.
touch temp_decoded_chunk.json
bsondump --quiet DecodedArray.bson | jq -s '.' >> temp_decoded_chunk.json
mongoimport --db [db_name] --collection [collection_name] --file temp_decoded_chunk.json --jsonArray
rm temp_decoded_chunk.json

You may verify the successful import by checking in mongosh (or an equivalent GUI, e.g. Compass).
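
For example, a quick count from the command line (the db/collection names are placeholders and should match those passed to mongoimport):

mongosh --quiet --eval 'db.getSiblingDB("[db_name]").getCollection("[collection_name]").countDocuments()'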

Analysing Mongo Logs

There are various tools across the internet that allow us to analyse Mongo logs; in our case we will use Ken Chen's tool called Hatchet (refer to the GoLang setup above for installation). It offers a variety of ways to ingest logs, e.g. via URL, AWS S3, Atlas, etc., which you can choose based on your requirements. The tool requires the log file to be in .gz format, which you can create with the gzip tool. The rest is well documented in the repo.
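
As a rough sketch (the log path is a placeholder, and the Hatchet invocation is an assumption; check the repo's README for the exact build and run steps):

# compress the log as Hatchet expects (-k keeps the original file)
gzip -k /path/to/mongod.log

# run Hatchet against the compressed log; binary name/location assumed from the repo's build output
./hatchet /path/to/mongod.log.gz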
