Last active
April 23, 2019 00:14
File CRUD+Attachment components, proposed calling semantics
# WorkFiles is all of the following:
# - Adapter of Work
# - A mapping of existing saved files, expressed as a hash-like
# object, with identifier keys and path-to-file values, with
# transparent just-in-time working-copy checkout of values on
# request from backing storage to local file.
# - Identifiers are presumed to be the (PCDM) FileSet persistence ID,
# not the persistence ID of a file.
# - Provide a means to get a file path (working copy), either by
# global identifier or by "file name" relative to the local work
# context (the latter requires metadata traversal, and may be slower).
# - An assign-and-commit attachment system with the semantics of
# a simple state machine, with states:
# - 'flushed': commit! has completed core attachment operations;
# this has no bearing on the status of other asynchronous progressive
# enhancements, such as characterization or derivatives.
# - 'pending': commit! operation has been started.
# - 'dirty': File(s) have been assigned for attachment, but not
# committed; transitions to 'pending' on start of commit!.
# - Stored:
# - A quick way to access similar functionality for derivatives,
# via transparent construction of a WorkDerivatives adapter that
# has calling semantics similar to WorkFiles.
# - A means to assign both primary and derivative files with a single
# commit transaction.
# - Implementation: built to use default Hyrax actors and jobs upon
# commit! with special callbacks to handle derivatives and the
# transition of stored state.
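# The 'dirty' -> 'pending' -> 'flushed' transitions described above can be
# sketched in isolation. This is a minimal illustration, not the WorkFiles
# implementation: the class name, the attachment_completed! callback, and
# the choice of 'flushed' as the initial state are all assumptions.

```ruby
# Hypothetical stand-in for the assign-and-commit state machine.
class AttachmentQueueSketch
  attr_reader :state, :assigned

  def initialize
    @assigned = []
    @state = 'flushed' # assumption: nothing queued counts as flushed
  end

  # Queue a path or URI for attachment; the queue becomes dirty.
  def assign(source)
    @assigned << source.to_s
    @state = 'dirty'
  end

  # Start a commit; a real adapter would hand off to actors/jobs here.
  def commit!
    return if @assigned.empty?
    @state = 'pending'
  end

  # Hypothetical callback, invoked once core attachment has completed.
  def attachment_completed!
    @assigned.clear
    @state = 'flushed'
  end
end

queue = AttachmentQueueSketch.new
queue.assign('path/to/my.tiff')
queue.state #=> 'dirty'
queue.commit!
queue.state #=> 'pending'
queue.attachment_completed!
queue.state #=> 'flushed'
```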
require 'work_files'
# WorkFiles is an adapter of a work that provides CRUD for files, and access
# to perform CRUD operations on each file's derivatives.
# The alternative constructor .of is an alternate spelling of .new:
files = WorkFiles::WorkFiles.of(work)
# Kinds of sources to deal with (IO objects are not in scope):
# - Things with local path
files.assign('path/to/my.tiff')
# - Things with remote URI
files.assign('http://example.com/assets/lincoln-memorial.tiff')
# - Things with local URI
files.assign('/mnt/nfs/1a/go/to/my.tiff')
# - Path-like, e.g. Pathname (should be normalized to String by adapter to store)
files.assign(Pathname.new('path/to/my.tiff'))
# Note: IO objects are out of scope; they would just be staged to tempfiles,
# which seems like more complexity than needed, more to test, and of
# marginal benefit, if any, to callers.
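# One way the adapter might normalize the sources listed above to Strings
# before storing them. A sketch only: normalize_source and the
# :local/:remote labels are hypothetical, and treating only http(s) URLs
# as remote is an assumption.

```ruby
require 'pathname'

# Hypothetical helper: returns [kind, string] for an assigned source.
# Anything other than a String or Pathname (e.g. an IO) is rejected.
def normalize_source(source)
  unless source.is_a?(String) || source.is_a?(Pathname)
    raise ArgumentError, "unsupported source: #{source.class}"
  end
  value = source.to_s
  kind = value.match?(%r{\Ahttps?://}i) ? :remote : :local
  [kind, value]
end

normalize_source(Pathname.new('path/to/my.tiff'))
#=> [:local, "path/to/my.tiff"]
normalize_source('http://example.com/assets/lincoln-memorial.tiff')
#=> [:remote, "http://example.com/assets/lincoln-memorial.tiff"]
```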
# Enumerating assigned (queued) files; does not include flushed/committed:
files.assigned #=> Array of path and/or URI Strings
# Calling commit! will attempt a save, and querying the state of a WorkFiles
# object reflects that. Determining whether state is flushed does
# require a query to the back-end store to determine if the known assigned
# file is actually flushed and matches expected local state.
files.state #=> 'dirty'
files.commit!
files.state #=> 'pending', or later 'flushed'
# Enumerating already stored files for work:
files.keys #=> identifiers (re: PCDM, this is FileSet id, not file!)
files.names #=> local "file names", presumably local to work
# Getting a WorkFile (also an adapter/wrapper) object requires checkout
# of a working copy from storage, which may be a slower operation:
file = files.get(name_or_id)
# -- is the same as: --
file = files[name_or_id]
# A WorkFile is simply a wrapper for a file that is checked out for local
# use from storage.
file.parent #=> WorkFiles
file.work #=> Work object parent of file(s)
file.fileset #=> PCDM fileset containing file
file.derivatives #=> WorkDerivatives for work
# Getting underlying ActiveFedora File object:
file.unwrap #=> Primary persistence object for file
# Just-in-time working copy checkout:
# We need a working copy upon accessing file stream content, path,
# or metadata.
# Until this checkout happens (on first access of any method below),
# the WorkFile is nothing more than a ghost.
# Access to binary payload:
file.path #=> path to file, implies it is checked out, at latest upon return
file.with_io
file.data
# Metadata provided by ActiveFedora::WithMetadata mixin to File objects:
file.name #=> equiv to original filename
file.size #=> file size
file.date_created #=>
file.date_modified
file.mime_type #=> mime-type
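# The just-in-time checkout described above amounts to lazy loading. A
# minimal sketch, assuming the working copy is staged to a Tempfile; the
# class name, ghost?, and the fetch block (a stand-in for reading from
# backing storage) are hypothetical, not part of WorkFile.

```ruby
require 'tempfile'

# Hypothetical lazy wrapper around a stored file.
class LazyCheckoutSketch
  def initialize(&fetch)
    @fetch = fetch        # stand-in for a read from backing storage
    @working_copy = nil
  end

  # True until something has needed the working copy.
  def ghost?
    @working_copy.nil?
  end

  # Local path to the working copy, checked out on first use.
  def path
    checkout.path
  end

  # Binary payload, read from the working copy.
  def data
    File.binread(path)
  end

  private

  # Stage the content to a Tempfile once and keep a reference to it,
  # so the file is not removed while this wrapper is in use.
  def checkout
    @working_copy ||= begin
      copy = Tempfile.new('working-copy')
      copy.binmode
      copy.write(@fetch.call)
      copy.flush
      copy
    end
  end
end

lazy = LazyCheckoutSketch.new { 'fake tiff bytes' }
lazy.ghost? #=> true
lazy.data   #=> "fake tiff bytes"
lazy.ghost? #=> false
```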
# WorkFile is not responsible for saving, but WorkFiles is:
file.commit! #=> raises NoMethodError
file.parent.commit! # commits all files assigned to WorkFiles parent
# Getting derivatives of a specific WorkFile:
derivatives = files.get(name_or_id).derivatives
# -- or --
derivatives = file.derivatives
# For single-file works, one need not get the file by name or id:
derivatives = files.derivatives #=> WorkDerivatives, with inferred parent
# For multi-file works, we run into issues of uniqueness for derivatives,
# so we treat this as unsupported:
multiple_workfiles.derivatives #=> raises exception
# WorkDerivatives objects have a reference back to their parent WorkFile:
derivatives.parent #=> WorkFile instance
derivatives.work_files #=> WorkFiles instance for work
derivatives.work #=> work object
# ## Loading/attaching pre-made derivatives:
#
# 'derivatives' here is a WorkDerivatives adapter (of a specific fileset)
# and has assignment semantics similar to WorkFiles; here the
# destination name of the derivative is inferred from the extension:
derivatives.assign('path/to/my-alto.xml')
# Assignment can optionally specify the destination name explicitly:
derivatives.assign('path/to/my-jp2-gray.jp2', 'jp2')
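# How a destination name might be inferred from the extension when none is
# given. A sketch only: destination_name is a hypothetical helper, and
# lower-casing the extension is an assumption.

```ruby
# Hypothetical helper: an explicit name wins, otherwise use the extension.
def destination_name(path, name = nil)
  return name.to_s if name
  ext = File.extname(path.to_s)
  raise ArgumentError, "cannot infer destination name from #{path}" if ext.empty?
  ext.sub(/\A\./, '').downcase
end

destination_name('path/to/my-alto.xml')            #=> "xml"
destination_name('path/to/my-jp2-gray.jp2', 'jp2') #=> "jp2"
```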
# Commit for derivatives can be done independently, if desired:
derivatives.state #=> 'dirty' (something is assigned)
derivatives.commit! # commits, has own state
derivatives.state #=> 'pending', or 'flushed'
# ...or as part of commit of any parent WorkFiles:
derivatives.assign('this/that/123.pdf')
derivatives.state #=> 'dirty' (something is assigned)
derivatives.work_files.commit!
derivatives.state #=> 'pending', or 'flushed'
# WorkDerivatives is also mapping/hash-like...
# Derivative keys for read/enumeration/get are destination names;
# derivative values are Hyrax (pairtree) paths to the stored files.
derivatives.keys #=> Array of destination names
derivatives.values #=> Array of pairtree paths
derivatives.get('jp2') #=> String path to jp2 file
# Derivative assignment for a destination name may replace the previous
# derivative for the same destination name:
derivatives.assign('path/to/my-jp2-color.jp2', 'jp2')
derivatives.commit! # commits, has own state
# ## Deleting stuff:
# Deleting a file from WorkFiles:
# -- ideally, this happens as quickly as possible, but it may be the case
# that the Solr index is updated asynchronously.
files.unassign(name_or_id)
files.commit! # commits unassign operations before assign
# Deleting a derivative:
derivatives.unassign('jp2') # unassign by destination name
derivatives.commit! # commits unassign operations first, before
# assign operations.
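# The "unassign before assign" commit ordering above can be sketched with
# two queues that are replayed in a fixed order, regardless of the order
# in which callers queued the operations. The class name and the log it
# returns are hypothetical, for illustration only.

```ruby
# Hypothetical queue that replays unassign operations first on commit.
class OrderedCommitSketch
  attr_reader :log

  def initialize
    @assigns = []
    @unassigns = []
    @log = []
  end

  def assign(source)
    @assigns << source
  end

  def unassign(key)
    @unassigns << key
  end

  # Replays queued unassigns, then queued assigns, and empties both queues.
  def commit!
    @unassigns.each { |key| @log << [:unassign, key] }
    @assigns.each { |source| @log << [:assign, source] }
    @unassigns.clear
    @assigns.clear
    @log
  end
end

queue = OrderedCommitSketch.new
queue.assign('path/to/my-jp2-color.jp2')
queue.unassign('jp2')
queue.commit!
#=> [[:unassign, "jp2"], [:assign, "path/to/my-jp2-color.jp2"]]
```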
# Replacing/updating an attached primary file:
# We always create a new fileset for each assigned primary file;
# therefore, replacing an existing primary file is a matter of assigning
# a new file (=> new file set), and removing the old (file, file set).
# These operations can be performed in any order, and the eventual state
# (presuming some async work) will be the same, regardless.
# Get a file identifier (or name) to remove:
name_or_id = files.keys[0] # or files.names[0], whatever
# Out with the old:
files.unassign(name_or_id)
# In with the new:
files.assign('/mnt/nfs/1a/go/to/my_other.tiff')
# Make both of the above happen; in the end, order does not matter:
files.commit!