Skip to content

Instantly share code, notes, and snippets.

@CodesInChaos
Created September 15, 2012 13:54
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save CodesInChaos/78b950b1489d5fa3de85 to your computer and use it in GitHub Desktop.
Save CodesInChaos/78b950b1489d5fa3de85 to your computer and use it in GitHub Desktop.
Merkator File Spec

Basic ideas

I want a FileID that's independent from the encrypted transport representation. This allows merging downloads from different transport representations. This is important if one wants to change the sharing protocol in the future. For example by adding splitting and error correction like in Tahoe-LAFS, or using block wise transports like in Freenet.

I chose a File based transport. i.e. a File is the main unit of sharing. I also prefer pull/offer based sharing over push based sharing i.e. nodes offer files they have, instead of pushing the file into the network and hoping that it somehow stays there.

Compared with block based sharing, file based transports seem a bit more susceptible to censorship, and offer less plausible deniability. But I believe that using a treehash as FileID allows relatively easy migration to a block based system, if it becomes apparent that such a system is needed.

Compared with torrent style sharing, the tracker overhead is a bit larger. But it allows independent uploads of the same file to work together. It also avoids pieces spanning multiple files, allowing individual verifications of a file. Makes seeding files after moving/renaming them much easier too.

I decided to use a 256 bit hash, which offers 256 bit resistance against second pre-images, and 128 bit resistance against collisions. This seems like an appropriate level, since second pre-images would totally break the system, and collisions are pretty annoying but not fatal.
The hash -> encrypt -> hash again system used in actual sharing make practical attacks even harder, but I don't want to rely on this, since it is a nice property that a file is uniquely identified by its FileID.

When creating my treehash system, I noticed that the total size tree is a power of two, apart from a single unused hash-sized part. I decided to fill those with the size of the file and the first 24 bytes of the file. This allows quick sanity checking of the most important properties of a file without downloading the file itself: its size and the file type.

Hash Tree

########### ToDo: Is 1024 bytes the optimal leaf size? ########### If SHA3 gets selected before this project gets popular, I'll probably replace all uses of SHA-256 with SHA3, keeping the hash size at 256 bits. ########### Skein-256-256 in tree mode would be an alternative. It also has a built-in personalization feature. Unfortunately Skein-512-256 doesn't support 256 bit intermediate hashes :(

This defines the tree-hash. It's a binary merkle-tree based on SHA-256. It takes the LeafSize and a 7 byte ASCII string as purpose parameter.

Define the function Hash(Depth, Message) as SHA-256(Purpose || Depth || Message) where || denotes concatenation.

To calculate the tree-hash of a file:

  1. Divide the file into LeafSize byte blocks. The last block may be shorter.

  2. For each of those blocks calculate Hash(0, LeafBlock). If the number of hashes is not a power-of-two, pad the list of hashes(not the file) to the next power-of-two with 00 bytes.

  3. For each pair of hashes from the previous step, calculate Hash(Depth, HashPair) where HashPair is the concatenation of the two child hashes. If the right child hashes is all 00 simply keep the left hash. Depth is 1 for the first level above the leaves, and gets incremented with each level.

  4. Repeat step 3 until there is only one hash left.

  5. Concat the following to form the 64 byte RootBlock:

  • The size of the file as 8 byte big-endian integer
  • The first 24 bytes of the file (padded with 00 if the file is shorter than that)
  • The root of the original tree (result of step 4)
  1. Calculate Hash(255, RootBlock).

TreeBlock format:

Concat:

  • The size of the file as 8 byte big-endian integer
  • The first 24 bytes of the file (padded with 00 if the file is shorter than that)
  • The individual hashes in the tree, layer wise, starting with the root.

Note: This implies that the TreeBlock starts with the RootBlock.

This tree may be truncated at the end of any layer. Typically at a fixed size, called PieceSize (e.g. 2^15 bytes = 32 KiB)
It must at least contain the first 64 bytes (the root block). For files smaller than the PieceSize, it will always consist of 64 bytes.
When doing this, the size of the tree in bytes will always be a power-of-two.

FileID

Set Purpose to "FileID\0" and LeafSize to 1024. The hash of the root block (result of step 6) is the FileID.

Share Format

  • Calculate the FileID of the plaintext file
  • ExtraKey is a 16 byte key. It defaults to 00 repeated 16 times.
    Sometimes it will contain a password hash. I haven't decides on a scheme yet, perhaps scrypt. But how many iterations?
    Use FileID as salt, or use no salt? A salt might lead to performance problems with many small files. No salt obviously allows multi target attacks.
    All users with the same key/password get convergent encryption
  • Calculate MasterShareKey as HMAC-SHA256(Key = "Merkator.MasterShareKey", Message = FileID || ExtraKey)
  • Calculate ShareFileKey as the first 16 bytes of HMAC-SHA256(Key = "Merkator.ShareFileKey-AES128-CTR", Message = MasterShareKey)
  • Calculate ShareFileNonce as the first 8 bytes of HMAC-SHA256(Key = "Merkator.ShareFileNonce-AES128-CTR", Message = MasterShareKey)
  • Build the FileID-TreeBlock of the plaintext file, with a truncation blocksize of 32 KiB.
  • Calculate sizeof(PlaintextFile) + sizeof(FileID-TreeBlock). Find the smallest integer, greater or equal to this number that's a multiple of both the TreeBlock size and LeafSize. This is the sizeof(ShareFile).
  • Concat the PlainTextFile, a variable length padding consisting of 00 bytes and the FileID-TreeBlock so that the total size is sizeof(ShareFile) from the previous step. The PlainTextFile is in the beginning of the resulting file, and the TreeBlock in the very end. The padding is in between, and it's length is sizeof(ShareFile) - (sizeof(PlaintextFile) + sizeof(FileID-TreeBlock))
  • Encrypt this with AES128 in CTR mode, with ShareFileKey as key, and ShareFileNonce as nonce. The block nonce is produced by concatenating the 8 byte ShareFileNonce with the 8 byte big-endian block counter.
    (Note: Since the key is never reused for a different file, a deterministic nonce is fine)
  • That's the ShareFile

Share-TreeBlock and ShareID

Calculate the Hash Tree with Purpose set to "ShareID" and LeafSize to 1024. The root hash defines the ShareID. Take the TreeBlock with a truncation blocksize of 32 KiB. This is the Share-TreeBlock.

Links

Links consist of the Filename, an optional path, the FileID, the ShareID and the Filesize.

Some candidate formats:

  1. merkfile://abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz/dir/myfile.txt?size=12345678901&share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
    This is intended for downloads with external programs. Clicking one of those should open a download manager.
  2. http://abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.merkfile/dir/myfile.txt?size=12345678901&share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
  • This one works in the browser, assuming a proxy is used. Since we want the proxy for anoynmization, this is a reasonable assumption.
  • Every file has a distinct domain, preventing xss.
  • Has a high leak risk through DNS.
  1. http://somehost.com/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz/dir/myfile.txt?size=12345678901&share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
    Remote gateway, not sure if this will see much use
  2. http://localhost:1234/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz/dir/myfile.txt?size=12345678901&share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
    Local gateway. Similar to 2, but does not require a proxy. XSS might be possible.

Putting the filename into the path portion of the url, instead of a query parameter allows more characters to remain unencoded.

There might be multiple share parameters. The optional password/extra key needs to be part of the share parameter. Perhaps separate it with a - for a key, and a + for a password? Passwords should be first UTF-8, then url encoded.

Password example: &share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz+MySecretPassword
Key example: &share=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz-abcdefghijklmnopqrstuvwxyz

Optional parameter for mime type: &mime=text/html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment