
@tfogo
Last active September 7, 2021 13:40
Editing mongodumps

Editing collection options and indexes in dumps

Dump directories

Dump directories have a sub-directory for each database. Inside each database directory there is a <collection>.metadata.json file for each collection. This file contains an Extended JSON (extJSON) representation of the collection options and index definitions. You can edit it in a text editor to change the options and indexes.
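
If you would rather script the change than use a text editor, here is a minimal Python sketch. The function name add_index is made up for illustration, and the shape of the index entry is an assumption copied from the sample_mflix.sessions example later in these notes, so check it against your own metadata file before relying on it.

```python
import json


def add_index(metadata_text, key_field, index_name, namespace):
    """Return the metadata JSON text with one ascending index appended.

    Assumption: the index entry mirrors the entries already in the file
    (v, key, name, ns), as in the example further down in these notes.
    """
    metadata = json.loads(metadata_text)
    metadata.setdefault("indexes", []).append({
        "v": {"$numberInt": "2"},
        "key": {key_field: {"$numberInt": "1"}},
        "name": index_name,
        "ns": namespace,
    })
    # Compact separators keep the output in the same single-line style
    # as the file it was read from.
    return json.dumps(metadata, separators=(",", ":"))
```

To use it, read the metadata file into a string, pass it through add_index(text, "a", "_a", "sample_mflix.sessions"), and write the result back to the same file. The field and index names here are only examples.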

If the dump was created with the --gzip option, the metadata.json file will be compressed. It will be a metadata.json.gz file. You can uncompress the file, edit it, then recompress it.
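
The uncompress, edit, recompress round trip can also be done in one step. This is a rough sketch using Python's standard gzip module. The helper name edit_gzipped_text is hypothetical, and it rewrites the file in place, so work on a copy of the dump.

```python
import gzip


def edit_gzipped_text(path, transform):
    """Decompress a gzipped text file, apply transform(text), recompress in place."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    new_text = transform(text)
    # Only reopen for writing once the new contents are ready, so a failing
    # transform does not leave a truncated file behind.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(new_text)
```

The transform argument is any function that takes the old file contents as a string and returns the new contents.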

Dump archives

Archive files are a special data format for dumps. They follow this spec: https://github.com/mongodb/mongo-tools/blob/master/common/archive/spec.md

Before attempting to edit an archive, be sure to make a backup. It is easy to corrupt the archive while editing it.

Open the archive in a hex editor.

The start of the archive contains a list of BSON documents, one per collection, each holding that collection's metadata. The format of each BSON document is:

{
    string db, 
    string collection, 
    string metadata, 
    int32 size,
    string type
}

You may need to reference https://bsonspec.org/spec.html for help figuring out how to change the length of the BSON document or the metadata string.
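
To make the layout concrete, here is a small Python sketch that decodes one such document by hand. It only understands the two element types this format uses (0x02 for strings and 0x10 for int32), and the function name parse_flat_bson is made up for illustration. It is not a general BSON parser.

```python
import struct


def parse_flat_bson(buf, offset=0):
    """Parse one BSON document made only of string (0x02) and int32 (0x10) fields.

    Returns (fields, doc_len), where doc_len is the total document length
    stored in the first four bytes (little-endian int32).
    """
    (doc_len,) = struct.unpack_from("<i", buf, offset)
    pos = offset + 4
    end = offset + doc_len - 1  # the last byte is the 0x00 document terminator
    fields = {}
    while pos < end:
        elem_type = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)  # field names are NUL-terminated
        name = buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if elem_type == 0x02:
            # String: int32 length (counting the trailing NUL), then the bytes.
            (str_len,) = struct.unpack_from("<i", buf, pos)
            fields[name] = buf[pos + 4:pos + 4 + str_len - 1].decode("utf-8")
            pos += 4 + str_len
        elif elem_type == 0x10:
            (value,) = struct.unpack_from("<i", buf, pos)
            fields[name] = value
            pos += 4
        else:
            raise ValueError(f"unsupported BSON element type 0x{elem_type:02x}")
    return fields, doc_len
```

The two length fields read here (doc_len and str_len) are the numbers that have to be updated whenever the metadata string changes size.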

In the decoded (text) pane of the hex editor, you should be able to find the collection you want to edit. For example, to edit the sample_mflix.sessions collection, I look for this (each non-printable byte is shown here as a dot):

db.....sample_mflix..collection.....sessions..metadata.....{"indexes":[{"v":{"$numberInt":"2"},"key":{"_id":{"$numberInt":"1"}},"name":"_id_","ns":"sample_mflix.sessions"}],"uuid":"bfcab3f53d4547558db73692ff763ecb","collectionName":"sessions","type":"collection"}..size......type

Here I see the namespace I want to edit: db sample_mflix, collection sessions. After this comes the metadata key, followed by a string holding the collection options and indexes in extJSON. This is exactly the same string that we would find in a metadata.json file, and we can edit it. However, a BSON document records both its own total size and the size of each string it contains, so if the length of the string changes, those numbers must be changed to match.

I highlight the string I am going to edit:

{"indexes":[{"v":{"$numberInt":"2"},"key":{"_id":{"$numberInt":"1"}},"name":"_id_","ns":"sample_mflix.sessions"}],"uuid":"bfcab3f53d4547558db73692ff763ecb","collectionName":"sessions","type":"collection"}\x00

I include the trailing \x00 character, which is counted in the length of the string. In the hex editor I can now see that the 4 bytes before the string are CD 00 00 00, which is the little-endian int32 205. I'm going to add an index to the metadata so it becomes:

{"indexes":[{"v":{"$numberInt":"2"},"key":{"_id":{"$numberInt":"1"}},"name":"_id_","ns":"sample_mflix.sessions"},{"v":{"$numberInt":"2"},"key":{"a":{"$numberInt":"1"}},"name":"_a","ns":"sample_mflix.sessions"}],"uuid":"bfcab3f53d4547558db73692ff763ecb","collectionName":"sessions","type":"collection"}\x00

This increases the length of the string by 97 to 302, so I change CD 00 00 00 to 2E 01 00 00.
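
The byte arithmetic is easy to get wrong by hand, so here is a quick Python check of the numbers above. The helper names are my own; the only library used is the standard struct module.

```python
import struct


def le_hex_to_int(hex_bytes):
    """Decode four hex bytes such as 'CD 00 00 00' as a little-endian int32."""
    return struct.unpack("<i", bytes.fromhex(hex_bytes))[0]


def int_to_le_hex(value):
    """Encode an int32 as four little-endian hex bytes, e.g. 302 -> '2E 01 00 00'."""
    return " ".join(f"{b:02X}" for b in struct.pack("<i", value))


# The text inserted into the indexes array, including the leading comma.
added = ',{"v":{"$numberInt":"2"},"key":{"a":{"$numberInt":"1"}},"name":"_a","ns":"sample_mflix.sessions"}'

old_len = le_hex_to_int("CD 00 00 00")
new_len = old_len + len(added.encode("utf-8"))
print(old_len, new_len, int_to_le_hex(new_len))  # 205 302 2E 01 00 00
```

Lengths are counted in bytes, not characters, which is why the added text is encoded as UTF-8 before measuring it.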

We also need to increase the length of the BSON document by 97. Highlight the db key in the document. The bytes 02 64 62 00 represent the db key (type byte 02, then "db", then a terminating 00). The four bytes before this are the length of the BSON document. In my case it is 2D 01 00 00, or 301. We want to change this to 301 + 97 = 398, or 8E 01 00 00.

Now save the file and the archive should be usable.
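
For reference, the whole manual procedure can be sketched in Python. This is an illustration of the steps above, not a tested tool. The function name patch_metadata is hypothetical, it assumes the metadata string occurs exactly once in the archive, it assumes the nearest preceding 02 64 62 00 really is the db key of the same document, and it holds the whole archive in memory. Run it on a copy.

```python
import struct


def patch_metadata(archive, old_json, new_json):
    """Swap one metadata string in raw archive bytes and fix both BSON lengths."""
    old = old_json.encode("utf-8") + b"\x00"  # the trailing NUL counts in the length
    new = new_json.encode("utf-8") + b"\x00"

    str_pos = archive.index(old)  # start of the string bytes
    len_pos = str_pos - 4         # int32 string length sits just before them
    if struct.unpack_from("<i", archive, len_pos)[0] != len(old):
        raise ValueError("string length prefix does not match the old metadata")

    # The document length is the four bytes before the db key (02 64 62 00).
    db_pos = archive.rindex(b"\x02db\x00", 0, len_pos)
    doc_pos = db_pos - 4
    (doc_len,) = struct.unpack_from("<i", archive, doc_pos)

    out = bytearray(archive)
    out[str_pos:str_pos + len(old)] = new
    # Both length fields sit before the string, so their offsets are unchanged.
    struct.pack_into("<i", out, len_pos, len(new))
    struct.pack_into("<i", out, doc_pos, doc_len + len(new) - len(old))
    return bytes(out)
```

Read the archive with open(path, "rb"), pass the bytes through patch_metadata together with the old and new metadata strings, and write the result to a new file rather than over the original.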
