@julianmorley
Last active September 6, 2018 10:20
A proposed definition for an OCFL inventory file
{
"type": "Object",
"head": "#v6",
"previous": { "#v5": "bv65gh98" }, // OPTIONAL. The "head" value and checksum of the previous version of this file, if it exists.
"checksum": "md5", // OPTIONAL; default value is sha512. A string naming the checksum algorithm used in this file.
// For validation & object reconstitution, a scan of all version directories
// MUST find at least 1 file matching each checksum listed here,
// but we don't actually care what the filename is - just that the content
// is present.
"manifest": [
"a83e3633",
"bb123efc",
"f4abe741",
"ee983ac4"
],
// Here we use forward diffs to construct the object history through
// various versions. ADD, COPY, RENAME and DELETE actions are demonstrated.
// The intention is that the most recent inventory file should be capable
// of re-constituting the object to any prior delta or version level.
// It presumes that a scan of all the version directories has taken place,
// and that at least one file that matches every checksum referenced above has been found.
"deltas": [
// An array of delta objects.
{
"id": "#d1", // The first delta; initial add of 3 files
"version": "#v1", // This delta represents the object at v1.
"checksums": [
{ "a83e3633": ["file1"] },
{ "bb123efc": ["file2"] },
{ "f4abe741": ["file3"] }
]
},
{
"id": "#d2", // 2nd delta
"version": "#v2", // v2 copy file2 to file4
"checksums": [
{ "bb123efc": ["file2","file4"] }
]
},
{
"id": "#d3", // 3rd delta
"version": "#v3", // v3 rename file1 to file5
"checksums": [
{ "a83e3633": ["file5"] }
]
},
{
"id": "#d4", // 4th delta
"version": "#v4", // v4 add file6
"checksums": [
{ "ee983ac4": ["file6"] }
]
},
{
"id": "#d5", // 5th delta
"version": "#v5", // v5 delete file3
"checksums": [
{ "f4abe741": [""] }
]
},
{
"id": "#d6", // 6th delta
"version": "#v6", // v6 delete file4, rename file2 to file7
"checksums": [
{ "bb123efc": ["file7"] }
]
}
],
// currentState is OPTIONAL, and is a convenience object that represents the state of the underlying preservation object at "head".
"currentState": {
"checksums": [
{ "a83e3633": ["file5"] },
{ "bb123efc": ["file7"] },
{ "ee983ac4": ["file6"] }
]
}
}
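
The forward-diff replay described in the comments above can be sketched in Python. This is only a sketch under assumptions the proposal does not state outright: each delta entry is taken to replace the full path list for its checksum, and an entry whose list is empty or holds only "" is taken to remove that content from the logical state. The name `replay_deltas` and the `DELTAS` constant are hypothetical.

```python
def replay_deltas(deltas, target_version):
    """Rebuild the checksum -> paths state at target_version.

    Assumption (not stated in the proposal): each entry replaces the whole
    path list for its checksum; [] or [""] means the content was deleted.
    """
    state = {}
    for delta in deltas:
        for entry in delta["checksums"]:
            for checksum, paths in entry.items():
                live = [p for p in paths if p]  # drop the "" placeholder
                if live:
                    state[checksum] = live
                else:
                    state.pop(checksum, None)
        if delta["version"] == target_version:
            return state
    raise KeyError(f"version {target_version!r} not found in deltas")


# The six deltas from the inventory above, as plain Python data.
DELTAS = [
    {"id": "#d1", "version": "#v1", "checksums": [
        {"a83e3633": ["file1"]},
        {"bb123efc": ["file2"]},
        {"f4abe741": ["file3"]},
    ]},
    {"id": "#d2", "version": "#v2", "checksums": [
        {"bb123efc": ["file2", "file4"]},
    ]},
    {"id": "#d3", "version": "#v3", "checksums": [
        {"a83e3633": ["file5"]},
    ]},
    {"id": "#d4", "version": "#v4", "checksums": [
        {"ee983ac4": ["file6"]},
    ]},
    {"id": "#d5", "version": "#v5", "checksums": [
        {"f4abe741": [""]},
    ]},
    {"id": "#d6", "version": "#v6", "checksums": [
        {"bb123efc": ["file7"]},
    ]},
]
```

Under those assumptions, `replay_deltas(DELTAS, "#v6")` produces the same three checksum entries as the `currentState` block, which suggests `currentState` could be derived rather than stored.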

awoods commented May 30, 2018

  1. Are you imagining that the "checksums" provide additional protection beyond collecting the checksums from each of the deltas?
  2. Would an empty array work just as well here: https://gist.github.com/julianmorley/9bc5d2ff525fbfc39d80e1fa3e2641a8#file-inventory-L63

@julianmorley
Author

  1. I see it as useful for disk fixity/verification - a quick way to answer the question of "is my stuff still here?". One of the weaknesses of Moab is that we must parse multiple files to put together the equivalent block of checksums, and it annoys me.
  2. Probably! I'm honestly not sure what the best way is to express the concept of a null key in JSON.
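
The "is my stuff still here?" check from point 1 might look roughly like the following Python sketch. It assumes the flat `manifest` array from the proposal holds full hex digests (the 8-character values in the example look like abbreviations), and that every file under the object root is a candidate. The function names `file_digest` and `missing_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha512"):
    """Hex digest of one file, read in chunks to keep memory use flat."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def missing_checksums(manifest, object_root, algorithm="sha512"):
    """Return the manifest checksums that no file under object_root matches.

    File names are ignored on purpose: the proposal only requires that the
    content be present somewhere in the version directories.
    """
    found = {
        file_digest(p, algorithm)
        for p in Path(object_root).rglob("*")
        if p.is_file()
    }
    return sorted(set(manifest) - found)
```

An empty return value means every digest in the manifest was found by a single scan, without parsing any per-version files.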

@julianmorley
Author

"manifest": {
"checksums": [
{ "a83e3633": ["v1/file5", "v3/file19"] },
{ "bb123efc": ["v2/file7"] },
{ "ee983ac4": ["v3/file6"] }
]
}

Adds example of 2 files in 2 different versions with the same digest.
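
One way a manifest of this shape could be generated is to scan the object root and group relative paths by digest. This is a hedged sketch, not part of the proposal: `build_manifest` is a hypothetical name, the version-directory layout (`v1/`, `v2/`, ...) is assumed from the example paths, and each file is read into memory whole, which would not suit large files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def build_manifest(object_root, algorithm="sha512"):
    """Group every file under object_root by content digest.

    Returns {"checksums": [{digest: [relative paths...]}, ...]}, so two
    files with identical content in different versions share one entry.
    """
    root = Path(object_root)
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(root).as_posix())
    return {"checksums": [{d: paths} for d, paths in by_digest.items()]}
```

With identical content stored as `v1/file5` and `v3/file19`, both paths end up under a single digest key, as in the example above.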
