@joehand
Last active June 10, 2017 18:35
bagit + dat thinking

There can be three versions of a Dat Bag:

  1. "Holey" Bag - meant for archiving metadata at a specific Dat archive version.
  2. Complete Bag - a complete backup of an archive checkpoint.
  3. Serialized Bag - for sharing a bag over Dat without keeping any Dat metadata inside the bag itself.

Holey Dat Bag:

A "Holey" bag contains a fetch.txt file that says where to download any file that is not present in the data/ payload.

dat/
  |   bagit.txt
  |   manifest-sha256.txt
  |   bag-info.txt
  |   tagmanifest-sha256.txt
  |   fetch.txt (contains dat:// links for all files)
  \--- data/
      |   [empty]
  \--- dat-tags/
      |   version
      |   metadata.key
      |   metadata.signatures
      |   metadata.bitfield
      |   metadata.tree
      |   metadata.data
      |   content.key
      |   content.signatures
      |   content.bitfield
      |   content.tree
  • Fetch file: fetch.txt contains dat:// links to all files (each line has format URL LENGTH FILENAME)
  • Tag Files: Copy the contents of .dat to dat-tags/. The bag is published for a specific archive checkpoint, recorded in dat-tags/version.
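
A minimal sketch of how the fetch.txt entries could be generated, one "URL LENGTH FILENAME" line per file. The dat://KEY+VERSION/path URL shape, the helper names, and the example key are assumptions made for illustration; only the line format comes from the note above. BagIt allows "-" in place of the length when the size is unknown.

```python
from typing import Dict, Optional


def fetch_line(url: str, length: Optional[int], filename: str) -> str:
    """Format one fetch.txt entry as 'URL LENGTH FILENAME'."""
    # BagIt permits "-" when the length is not known.
    size = "-" if length is None else str(length)
    return f"{url} {size} {filename}"


def build_fetch_txt(archive_key: str, version: int,
                    files: Dict[str, Optional[int]]) -> str:
    """Build fetch.txt content for files that live in a Dat archive.

    `files` maps a path inside the archive to its size in bytes.
    The versioned dat:// URL form used here is an assumption.
    """
    lines = []
    for path, size in sorted(files.items()):
        url = f"dat://{archive_key}+{version}/{path}"
        lines.append(fetch_line(url, size, f"data/{path}"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "abc123" stands in for a real 64-character Dat key.
    print(build_fetch_txt("abc123", 5, {"readme.md": 120, "img/a.png": None}), end="")
```

Every payload file gets an entry because data/ is empty in a Holey bag; a partially filled bag would list only the missing files.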

The "tags" are metadata files intended to facilitate and document the storage and transfer of the bag.

Resource: https://docs.google.com/document/d/1JqKMFn9KfeIMAAEdOGQr6LZPqNWx8Qubi12uoUXi2QU/edit

Complete Dat Bag

Similar to the Holey bag, but with all the files resolved and copied into the data/ payload folder, so no fetch.txt is needed.

dat/
  |   bagit.txt
  |   manifest-sha256.txt
  |   bag-info.txt
  |   tagmanifest-sha256.txt
  \--- data/
      |   [all files downloaded]
  \--- dat-tags/
      |   version
      |   metadata.key
      |   metadata.signatures
      |   metadata.bitfield
      |   metadata.tree
      |   metadata.data
      |   content.key
      |   content.signatures
      |   content.bitfield
      |   content.tree
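
Once the payload is on disk, manifest-sha256.txt can be computed directly from data/. The sketch below assumes the usual BagIt manifest line shape ("CHECKSUM PATH", with paths relative to the bag root); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def payload_manifest(bag_dir):
    """Build manifest-sha256.txt content for every file under <bag_dir>/data."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag).as_posix()  # e.g. data/sub/file.csv
            lines.append(f"{sha256_file(path)} {rel}")
    return "\n".join(lines) + ("\n" if lines else "")
```

The same routine pointed at the tag files (bagit.txt, bag-info.txt, dat-tags/ and so on) would yield tagmanifest-sha256.txt.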

Heavy Dat Bag

Similar to the Complete Dat Bag, but also including dat-tags/content.data, which holds the full version history. This would probably only work if we did not have to SHA-256 hash the content.data file and could instead use Dat's existing BLAKE2b hash in the tagmanifest.

Serialized Dat Bag

We could also keep the Dat metadata outside the bag, which may be better for archiving or transport:

dat/
  |   my-bag.tar.gz (or my-bag.zip)
  |   bag-sha256.txt (optional - checksum for serialized bag)
  \--- .dat/
      |   metadata.key
      |   metadata.signatures
      |   metadata.bitfield
      |   metadata.tree
      |   metadata.data
      |   content.key
      |   content.signatures
      |   content.bitfield
      |   content.tree
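
A rough sketch of producing my-bag.tar.gz and the optional bag-sha256.txt next to it. The "CHECKSUM FILENAME" layout of bag-sha256.txt is an assumption borrowed from the BagIt manifest style, and the function names are hypothetical; the note above does not define either.

```python
import hashlib
import tarfile
from pathlib import Path


def serialize_bag(bag_dir, out_path):
    """Pack a bag directory into a gzip-compressed tar file."""
    bag = Path(bag_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # arcname keeps the bag's own folder name as the top-level entry.
        tar.add(str(bag), arcname=bag.name)
    return Path(out_path)


def checksum_line(archive_path):
    """Return one 'CHECKSUM FILENAME' line for the serialized bag."""
    archive = Path(archive_path)
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    return f"{digest} {archive.name}\n"
```

Note that gzip output embeds a timestamp, so re-serializing the same bag gives a different checksum; the checksum identifies one particular archive file, not the bag contents.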

Or for non-serialized bags:

dat/
  \--- my-bag/
      |   bagit.txt
      |   bag-info.txt
      |   ... etc
      \--- data/
  |   bag-sha256.txt (optional - checksum for serialized bag)
  \--- .dat/
      |   metadata.key
      |   metadata.signatures
      |   metadata.bitfield
      |   metadata.tree
      |   metadata.data
      |   content.key
      |   content.signatures
      |   content.bitfield
      |   content.tree

Other Notes

joehand commented Apr 10, 2017

DPN basically makes a merkle tree via bagit and uses the root in their registry =)

DPN Bag Transfer Protocol

  1. DPN will transfer valid DPN bags that have been 'tar'red. I.e. serialized bags.
  2. Upon finishing the transfer of a bag-tar file - the replicating node will compute the SHA256 hash of the serialized file. This is the hash that will be sent to the first-node and shows that the tarred bag was transferred without errors.
  3. The SHA256 hash of the bag's tagmanifest-sha256.txt file will be calculated by the originating node, used as the fixity_value for the bag, and kept in the DPN registry.

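tagmanifest-sha256.txt includes the SHA256 hash of manifest-sha256.txt, which in turn lists the hash of every content file. Hashing the tagmanifest therefore covers the whole bag, which is what makes it work like a merkle root.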
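
The chain can be sketched in memory as content hashes, then the manifest, then the tagmanifest, then one fixity value. This is an illustration of the idea only, not DPN's implementation; the function names and the simplified "CHECKSUM PATH" manifest lines are assumptions.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def manifest_text(entries: Dict[str, bytes]) -> str:
    """One 'CHECKSUM PATH' line per file, sorted by path."""
    return "".join(
        f"{sha256_hex(body)} {name}\n" for name, body in sorted(entries.items())
    )


def fixity_value(payload: Dict[str, bytes], tag_files: Dict[str, bytes]) -> str:
    """Hash of tagmanifest-sha256.txt, which covers the payload manifest."""
    # Level 1: manifest-sha256.txt lists a hash for every payload file.
    manifest = manifest_text(payload).encode()
    # Level 2: tagmanifest-sha256.txt hashes the tag files, manifest included.
    tags = dict(tag_files)
    tags["manifest-sha256.txt"] = manifest
    tagmanifest = manifest_text(tags).encode()
    # Root: a single value that changes if any file in the bag changes.
    return sha256_hex(tagmanifest)
```

Changing a single byte in any payload file changes its manifest line, which changes the manifest's hash in the tagmanifest, which changes the fixity value.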
