@ahankinson
Last active September 6, 2018 10:19

Two possible versions of an inventory file are shown. The first (inventory1.jsonld) contains all information in the 'Version' object: "members" within the version provides an inventory of the member files for that version, while "state" provides the full state of the object at that version.

inventory2.jsonld is the original proposal. "members" in the Version object tracks all the members of the object (analogous to "state" in the previous proposal). In this example they are addressed by digest, but they could also be addressed by filepath.

Considerations:

  • How do we represent the "state" of the object at a given version?
  • If a user requests "version 4", what do they get back? A list of filepaths?
  • What happens if a user requests two files in the same path with the same name, with only the root being different? (e.g., v1/foo.txt, v2/foo.txt) Can 'object-relative' filepaths be duplicates, even though their storage filepaths are different?
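
The last consideration can be made concrete with a short Python sketch. It assumes that an object-relative path is simply the storage path with its leading version directory stripped; the helper names and the sample state are hypothetical and not part of either proposal.

```python
from collections import defaultdict


def object_relative(storage_path):
    """Strip the leading version directory, e.g. 'v2/foo.txt' -> 'foo.txt'."""
    _version_dir, _, rest = storage_path.partition("/")
    return rest


def find_collisions(state):
    """Group storage paths by object-relative path and keep groups larger than one."""
    grouped = defaultdict(list)
    for storage_path in state:
        grouped[object_relative(storage_path)].append(storage_path)
    return {rel: sorted(paths) for rel, paths in grouped.items() if len(paths) > 1}


# Hypothetical state in which two versions both contribute a 'foo.txt'.
state = {
    "v1/foo.txt": "sha512-1",
    "v2/foo.txt": "sha512-2",
    "v2/data/bar.txt": "sha512-3",
}
collisions = find_collisions(state)
print(collisions)
```

Under this assumption a state keyed by storage path can hold both files, but a listing keyed by object-relative path cannot, which is the ambiguity the question points at.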
{
"@context": "https://ocfl.io/v1.0/",
"id": "urn:ark:/12345/bbb11ccc22",
"type": "Object",
"hashAlgorithm": "sha512",
"versions": [{
"type": "Version",
"id": "#v1",
"created": "2014-01-01T12:00:00Z",
"message": "Initial version",
"client": "OCFL Python Library 1.1.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
// # Member paths are local to the root of all locations. For zip files, this means
// # that the zip file must expand to have these files in the base directory, i.e.,
// # effectively "v1.zip/data/blah.txt", "v1.zip/metadata/foo.xml"
"members": {
"v1/data/blah.txt": "sha512-1",
"v1/metadata/foo.xml": "sha512-2"
},
"state": {
"v1/data/blah.txt": "sha512-1",
"v1/metadata/foo.xml": "sha512-2"
}
}, {
// Add new files; replace
"type": "Version",
"id": "#v2",
"created": "2015-05-01T12:00:00Z",
"message": "Updated foo",
"client": "OCFL Python Library 1.1.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"locations": [{
"type": "Location",
"path": "v2/"
}],
"members": {
"v2/data/nnn/ppp1.tiff": "sha512-3",
"v2/metadata/foo.xml": "sha512-4",
"v2/metadata/empty.txt": "sha512-5"
},
"state": {
"v1/data/blah.txt": "sha512-1", // from v1
"v2/data/nnn/ppp1.tiff": "sha512-3", // from v2
"v2/metadata/foo.xml": "sha512-4", // from v2
"v2/metadata/empty.txt": "sha512-5" // from v2
}
}, {
// Add a duplicate hash, different filename
"type": "Version",
"id": "#v3",
"created": "2015-01-01T12:00:00Z",
"message": "Problem with bar",
"client": "OCFL Python Library 1.1.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"members": {
"v3/metadata/empty2.txt": "sha512-5" // <-- same sha512 as empty.txt
},
"state": {
"v1/data/blah.txt": "sha512-1",
"v2/data/nnn/ppp1.tiff": "sha512-3",
"v2/metadata/foo.xml": "sha512-4",
"v2/metadata/empty.txt": "sha512-5",
"v3/metadata/empty2.txt": "sha512-5"
}
}, {
// Delete empty2.txt; no 'members'
"type": "Version",
"id": "#v4",
"created": "2016-01-01T12:00:00Z",
"message": "Deleted empty2.txt",
"client": "OCFL Python Library 1.2.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"state": {
"v1/data/blah.txt": "sha512-1",
"v2/data/nnn/ppp1.tiff": "sha512-3",
"v2/metadata/foo.xml": "sha512-4",
"v2/metadata/empty.txt": "sha512-5"
}
}, {
// Rename and Replace foo.xml
"type": "Version",
"id": "#v5",
"created": "2016-01-01T12:00:00Z",
"message": "Replaced foo.xml",
"client": "OCFL Python Library 1.2.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"members": {
"v5/metadata/foo.xml": "sha512-6"
},
"state": {
"v1/data/blah.txt": "sha512-1",
"v2/data/nnn/ppp1.tiff": "sha512-3",
"v5/metadata/foo.xml": "sha512-6",
"v5/metadata/foo-old.xml": "sha512-4",
"v2/metadata/empty.txt": "sha512-5"
}
}]
}
{
"@context": "https://ocfl.io/v1.0/",
"id": "urn:ark:12148/btv1b84490444",
"type": "Object",
"head": "#v3",
// This is a list of all files that are in the object. It must be updated
// with files added in new versions.
// Writing the same file to a new location (same checksum, different path)
// is an error, since versions are fixed and cannot be changed. Instead,
// if a file is restored in later versions, it can be referenced in its earlier
// incarnation.
"manifest": {
"v1/data/page1/btv1b84490444_page1.jp2": "a83e3633c880fb4caae55edd60571d930681496572f6ec6910b15a7796a38fcf0fff6aaf4acb8aac0f9a3049e42cf27bca0e2790105705d35d5727e1b9443f44",
"v1/data/page1/btv1b84490444_page1.tiff": "2638e7c5bab6aadd2cb8eec5a48eefe35d25e005b55bd638e8ce222ef7204a790e64613ec63bb66e543c7e6d610391ebc3dd1b019b5ed028d942a1b00a0fb177",
"v1/data/page1/btv1b84490444_page1.xmp": "e8eb20d68e2960cc77cf52976f665c368a5c5aba43af3e8cfa2bf5c56547087fa290f06b49a595cf6557c5d882e0ebbe16e2231771c659dc9d166f5320b40eb7",
"v1/data/page2/btv1b84490444_page2.tiff": "de655a830d1c5d5c5cb6bf333d43f63fabf23a325d1a821c057b9ccf53e754c2e57e99e1753b1bcf071d1e6f9ced0067b7873fe2cb4dcd2d6741c27085e75f4f",
"v2/data/page2/btv1b84490444_page2.jp2": "83fe4cd4e6b711c29f0b9e16b0c7700c4c9946b6c61af261341efda4252d988cb3584d77f16e16138d012e6960c1c23467de4b8e3cfacda802e24920636212f6",
"v2/data/page2/btv1b84490444_page2.xmp": "f3f965c408011fab1b32e31e1536f77ec6b8796945d3ec6414e6bf46320c21b1af05a43f98da161a382c29594c3abb2a4426299bc1340362f9fb1ad71c2100df",
"v3/data/page1/btv1b84490444_page1_reshoot.jp2": "fd3cfe0321074427d3907a2ff581d91112a831db84d0ef04be5dc0cb0628b01e2bfbd0f6375cc74fe33f2e2b24d22bef195eeb3811c9c04022b0ab1a4bd1c044",
"v3/data/page1/btv1b84490444_page1_reshoot.tiff": "5dc5a8f0926a8d68740274ceb99304c7f2f0640d683f84bfbbfa19f34dc50b3676f4b6973e71426a364defe030c98fa271b89bb34370bedcbd718eba00566397",
"v3/data/page1/btv1b84490444_page1_reshoot.xmp": "c548b878e9babcb24676188715be10a0286173b7dc7c61f9fc9409d4ed98c0404e28913013f5b16fa15aba6b853ac8db54975760dd4663bc4272701453154f81",
"v1/metadata/mets.xml": "2ef53d9842fd6cc5410a42e2df560b65b8d9fcdedb28e99c08f81f98ece35d90232b3530d140b5a0242972092d8de5d7206032dc5e78e45ac9d00e461a85587d",
"v2/metadata/mets.xml": "2b10f07d828d76428d15fab1949e990e11f38663e65466cff2759ca10ab9c6411caf702152239045efb498041a0820d8d4a1aebdd6bbc0d4cb6c5468674741aa",
"v3/metadata/mets.xml": "5cc988a4f29c07229ed321dbf58e00de16ea36ea3c28df9db8f112dfca0ff0760ca4fb626d0a65300cf074107990c3ae704b5a3870f3c6ec7fc027633feb2773",
"v1/inventory.jsonld": "c2465c96dbbdecfac375997e949a494bf7b4786e7810c61bc994033ffd99c39d2920379f51e2c938012635e5e27cb32d1c98fda558f2aea1d4484b275fab2987",
"v2/inventory.jsonld": "694c9c48d09bf9a1a2cecdcd92b27f3d50b7d8bd28d2bb5d50291402d4230ca73843825281f8ac2057bbc7bd1ea34910a68f3d229415854a7f89d5946e89529f"
},
"versions": [
{
"type": "Version",
"id": "#v1",
"created": "2014-01-01T12:00:00Z",
"message": "Initial version",
"client": "OCFL Python Library 1.1.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"members": [
"a83e3633c880fb4caae55edd60571d930681496572f6ec6910b15a7796a38fcf0fff6aaf4acb8aac0f9a3049e42cf27bca0e2790105705d35d5727e1b9443f44", // "v1/data/page1/btv1b84490444_page1.jp2"
"2638e7c5bab6aadd2cb8eec5a48eefe35d25e005b55bd638e8ce222ef7204a790e64613ec63bb66e543c7e6d610391ebc3dd1b019b5ed028d942a1b00a0fb177", // "v1/data/page1/btv1b84490444_page1.tiff"
"e8eb20d68e2960cc77cf52976f665c368a5c5aba43af3e8cfa2bf5c56547087fa290f06b49a595cf6557c5d882e0ebbe16e2231771c659dc9d166f5320b40eb7", // "v1/data/page1/btv1b84490444_page1.xmp"
"de655a830d1c5d5c5cb6bf333d43f63fabf23a325d1a821c057b9ccf53e754c2e57e99e1753b1bcf071d1e6f9ced0067b7873fe2cb4dcd2d6741c27085e75f4f", // "v1/data/page2/btv1b84490444_page2.tiff"
"2ef53d9842fd6cc5410a42e2df560b65b8d9fcdedb28e99c08f81f98ece35d90232b3530d140b5a0242972092d8de5d7206032dc5e78e45ac9d00e461a85587d", // "v1/metadata/mets.xml"
]
},
{
"type": "Version",
"id": "#v2",
"created": "2014-01-01T13:00:00Z",
"message": "Added page 2 JPEG 2000 and XMP",
"client": "OCFL Python Library 1.1.0",
"user": {
"name": "Andrew Hankinson",
"email": "andrew.hankinson@bodleian.ox.ac.uk"
},
"members": [
"a83e3633c880fb4caae55edd60571d930681496572f6ec6910b15a7796a38fcf0fff6aaf4acb8aac0f9a3049e42cf27bca0e2790105705d35d5727e1b9443f44", // "v1/data/page1/btv1b84490444_page1.jp2"
"2638e7c5bab6aadd2cb8eec5a48eefe35d25e005b55bd638e8ce222ef7204a790e64613ec63bb66e543c7e6d610391ebc3dd1b019b5ed028d942a1b00a0fb177", // "v1/data/page1/btv1b84490444_page1.tiff"
"e8eb20d68e2960cc77cf52976f665c368a5c5aba43af3e8cfa2bf5c56547087fa290f06b49a595cf6557c5d882e0ebbe16e2231771c659dc9d166f5320b40eb7", // "v1/data/page1/btv1b84490444_page1.xmp"
"de655a830d1c5d5c5cb6bf333d43f63fabf23a325d1a821c057b9ccf53e754c2e57e99e1753b1bcf071d1e6f9ced0067b7873fe2cb4dcd2d6741c27085e75f4f", // "v1/data/page2/btv1b84490444_page2.tiff"
// new files added in version 2
"2b10f07d828d76428d15fab1949e990e11f38663e65466cff2759ca10ab9c6411caf702152239045efb498041a0820d8d4a1aebdd6bbc0d4cb6c5468674741aa", // "v2/metadata/mets.xml"
"83fe4cd4e6b711c29f0b9e16b0c7700c4c9946b6c61af261341efda4252d988cb3584d77f16e16138d012e6960c1c23467de4b8e3cfacda802e24920636212f6", // "v2/data/page2/btv1b84490444_page2.jp2",
"f3f965c408011fab1b32e31e1536f77ec6b8796945d3ec6414e6bf46320c21b1af05a43f98da161a382c29594c3abb2a4426299bc1340362f9fb1ad71c2100df" // "v2/data/page2/btv1b84490444_page2.xmp",
]
},
{
"type": "Version",
"id": "#v3",
"created": "2018-01-05T14:00:00Z",
"message": "Replaced page 1 with a re-shot version",
"client": "OCFL Ruby Gem 0.9.5",
"user": {
"name": "Andrew Woods",
"email": "awoods@duraspace.org"
},
"members": [
"fd3cfe0321074427d3907a2ff581d91112a831db84d0ef04be5dc0cb0628b01e2bfbd0f6375cc74fe33f2e2b24d22bef195eeb3811c9c04022b0ab1a4bd1c044", // "v3/data/page1/btv1b84490444_page1_reshoot.jp2",
"5dc5a8f0926a8d68740274ceb99304c7f2f0640d683f84bfbbfa19f34dc50b3676f4b6973e71426a364defe030c98fa271b89bb34370bedcbd718eba00566397", // "v3/data/page1/btv1b84490444_page1_reshoot.tiff",
"c548b878e9babcb24676188715be10a0286173b7dc7c61f9fc9409d4ed98c0404e28913013f5b16fa15aba6b853ac8db54975760dd4663bc4272701453154f81", // "v3/data/page1/btv1b84490444_page1_reshoot.xmp",
"de655a830d1c5d5c5cb6bf333d43f63fabf23a325d1a821c057b9ccf53e754c2e57e99e1753b1bcf071d1e6f9ced0067b7873fe2cb4dcd2d6741c27085e75f4f", // "v1/data/page2/btv1b84490444_page2.tiff"
"83fe4cd4e6b711c29f0b9e16b0c7700c4c9946b6c61af261341efda4252d988cb3584d77f16e16138d012e6960c1c23467de4b8e3cfacda802e24920636212f6", // "v2/data/page2/btv1b84490444_page2.jp2"
"f3f965c408011fab1b32e31e1536f77ec6b8796945d3ec6414e6bf46320c21b1af05a43f98da161a382c29594c3abb2a4426299bc1340362f9fb1ad71c2100df", // "v2/data/page2/btv1b84490444_page2.xmp"
"5cc988a4f29c07229ed321dbf58e00de16ea36ea3c28df9db8f112dfca0ff0760ca4fb626d0a65300cf074107990c3ae704b5a3870f3c6ec7fc027633feb2773", // "v3/metadata/mets.xml",
]
}
]
}
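
A minimal Python sketch of how a client might answer "give me version N" against the second layout: invert the manifest to get digest-to-path, then map each digest in the version's "members" back to a storage path. The function name and the shortened digests are placeholders for illustration. The inversion is only safe because the manifest comment above treats the same checksum at two paths as an error.

```python
def resolve_members(inventory, version_id):
    """Return the storage path for every digest listed in a version's members."""
    # Invert the manifest: digest -> storage path.
    by_digest = {digest: path for path, digest in inventory["manifest"].items()}
    for version in inventory["versions"]:
        if version["id"] == version_id:
            return [by_digest[digest] for digest in version["members"]]
    raise KeyError(version_id)


# Cut-down, hypothetical inventory with placeholder digests.
inventory = {
    "manifest": {
        "v1/data/page1.tiff": "digest-a",
        "v1/metadata/mets.xml": "digest-b",
        "v2/metadata/mets.xml": "digest-c",
    },
    "versions": [
        {"id": "#v1", "members": ["digest-a", "digest-b"]},
        {"id": "#v2", "members": ["digest-a", "digest-c"]},
    ],
}
paths_v2 = resolve_members(inventory, "#v2")
print(paths_v2)
```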
@zimeon

zimeon commented Sep 6, 2018

Suggestion: use additions (files actually stored in this version directory) and includes (files to be included in this version, copied from this or other version directories):

        # Files in this version = additions + includes, could remove leading "v5" in key
        "additions": {
            "v5/metadata/foo.xml": "sha512-6"
        },
        "includes": {
            "v5/data/blah.txt": "v1/data/blah.txt",
            "v5/data/nnn/ppp1.tiff": "v2/data/nnn/ppp1.tiff",
            "v5/metadata/foo-old.xml": "v2/metadata/foo.xml",       #rename
            "v5/metadata/empty.txt": "v2/metadata/empty.txt",
            "v5/metadata/copy-of-foo.xml": "v5/metadata/foo.xml"    #copy of file in this same version
        }

@zimeon

zimeon commented Sep 6, 2018

Nice properties of the above:

  • forward delta version optional when creating new version (use only additions, no includes for a full copy)
  • files in v# dir == additions, just run through these to check content
  • files in this version == set of keys in additions and includes
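
A rough Python sketch of the last two properties, using the v5 example from the earlier comment. The function names are made up for illustration and do not come from any OCFL library.

```python
def files_in_version(version):
    """All files visible in this version: keys of additions plus keys of includes."""
    return set(version["additions"]) | set(version["includes"])


def files_in_version_dir(version):
    """Files physically present in the v# directory: the additions only."""
    return set(version["additions"])


v5 = {
    "additions": {
        "v5/metadata/foo.xml": "sha512-6",
    },
    "includes": {
        "v5/data/blah.txt": "v1/data/blah.txt",
        "v5/data/nnn/ppp1.tiff": "v2/data/nnn/ppp1.tiff",
        "v5/metadata/foo-old.xml": "v2/metadata/foo.xml",
        "v5/metadata/empty.txt": "v2/metadata/empty.txt",
        "v5/metadata/copy-of-foo.xml": "v5/metadata/foo.xml",
    },
}
visible = files_in_version(v5)
on_disk = files_in_version_dir(v5)
print(len(visible), sorted(on_disk))
```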

@zimeon

zimeon commented Sep 6, 2018

Bad property of the above:

  • Using a path as the value in the includes map is unwieldy; it might be better to use the sha512 value, which then has to be looked up in the additions data across the version history

Question:

  • Is it better to use sha512 as the keys rather than the values, if JSON libraries are likely to handle big/odd values better than big/odd keys?
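
To illustrate the lookup cost mentioned under "Bad property", here is a hedged Python sketch in which includes stores digests instead of paths, so each one has to be found by scanning the additions of every version. The `locate` helper and the two-version history are hypothetical.

```python
def locate(digest, history):
    """Find the storage path of a digest by scanning each version's additions."""
    for version in history:
        for path, value in version["additions"].items():
            if value == digest:
                return path
    return None


# Hypothetical history: only the additions of earlier versions matter here.
history = [
    {"additions": {"v1/data/blah.txt": "sha512-1"}},
    {"additions": {"v2/metadata/foo.xml": "sha512-4"}},
]
# Includes keyed by path in the new version, valued by digest rather than by source path.
includes = {
    "v5/data/blah.txt": "sha512-1",
    "v5/metadata/foo-old.xml": "sha512-4",
}
resolved = {path: locate(digest, history) for path, digest in includes.items()}
print(resolved)
```

With digests as keys instead, the inner scan becomes a plain dictionary lookup per version, which is one argument for the key-based layout raised in the question.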

@neilsjefferies

        "sources": {
            "sha512-1": "v1/data/blah.txt", #inherited from V1
            "sha512-2": "v2/data/nnn/ppp1.tiff",
            "sha512-3": "v5/metadata/foo.xml", #foo and foo-copy are the same and new this time
            "sha512-4": "v4/metadata/foo.xml", #foo-old in v5 was foo in v4
            "sha512-5": "v2/metadata/empty.txt"
        },
        "state": {
            "data/blah.txt": "sha512-1",
            "data/nnn/ppp1.tiff": "sha512-2",
            "metadata/foo.xml": "sha512-3",
            "metadata/foo-old.xml": "sha512-4", #renamed foo from v4
            "metadata/foo-copy.xml": "sha512-3", #foo-copy is the same as foo
            "metadata/empty.txt": "sha512-5"
        }

@zimeon

zimeon commented Sep 6, 2018

Splitting sources into additions (new files in this version) and includes (files included from old versions)

        "additions": {
            "sha512-3": "v5/metadata/foo.xml" #foo and foo-copy are the same and new this time
        },
        "includes": {
            "sha512-1": "v1/data/blah.txt", #inherited from V1
            "sha512-2": "v2/data/nnn/ppp1.tiff",
            "sha512-4": "v4/metadata/foo.xml", #foo-old in v5 was foo in v4
            "sha512-5": "v2/metadata/empty.txt"
        },
        "state": {
            "data/blah.txt": "sha512-1",
            "data/nnn/ppp1.tiff": "sha512-2",
            "metadata/foo.xml": "sha512-3",
            "metadata/foo-old.xml": "sha512-4", #renamed foo from v4
            "metadata/foo-copy.xml": "sha512-3", #foo-copy is the same as foo
            "metadata/empty.txt": "sha512-5"
        }
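
A small Python sketch of how this layout could be resolved: each logical path in state maps to a digest, and the digest maps to a storage path through additions or includes. It assumes the digest labels in state and in the two source maps match; the `resolve_state` name is hypothetical.

```python
def resolve_state(version):
    """Map each logical path in state to the storage path that holds its content."""
    # Merge both digest -> storage path maps; a digest appears in only one of them.
    sources = {**version["includes"], **version["additions"]}
    return {logical: sources[digest] for logical, digest in version["state"].items()}


v5 = {
    "additions": {"sha512-3": "v5/metadata/foo.xml"},
    "includes": {
        "sha512-1": "v1/data/blah.txt",
        "sha512-2": "v2/data/nnn/ppp1.tiff",
        "sha512-4": "v4/metadata/foo.xml",
        "sha512-5": "v2/metadata/empty.txt",
    },
    "state": {
        "data/blah.txt": "sha512-1",
        "data/nnn/ppp1.tiff": "sha512-2",
        "metadata/foo.xml": "sha512-3",
        "metadata/foo-old.xml": "sha512-4",
        "metadata/foo-copy.xml": "sha512-3",
        "metadata/empty.txt": "sha512-5",
    },
}
resolved = resolve_state(v5)
print(resolved["metadata/foo-copy.xml"])
```

Because state is keyed by logical path, a copy (foo-copy.xml) and a rename (foo-old.xml) both fall out of the same lookup with no extra bookkeeping.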
