File CRUD+Attachment components, proposed calling semantics
# WorkFiles is all of the following:
# - Adapter of Work.
# - A mapping of existing saved files, expressed as a hash-like
#   object with identifier keys and path-to-file values, with
#   transparent just-in-time working-copy checkout of values from
#   backing storage to a local file on request.
# - Identifiers are presumed to be the (PCDM) FileSet persistence ID,
#   not the persistence ID of a file.
# - A means to get a file path (working copy), either by global
#   identifier or by "file name" relative to the local work context
#   (the latter requires metadata traversal, and may be slower).
# - An assign-and-commit attachment system with the semantics of
#   a simple state machine, with states:
#   - 'flushed': commit! has completed core attachment operations;
#     this has no bearing on the status of other asynchronous
#     progressive enhancements, such as characterization or derivatives.
#   - 'pending': a commit! operation has been started.
#   - 'dirty': file(s) have been assigned for attachment, but not
#     committed; transitions to 'pending' when commit! starts.
# - Stored:
# - A quick way to access similar functionality for derivatives,
#   via transparent construction of a WorkDerivatives adapter that
#   has calling semantics similar to WorkFiles.
# - A means to assign both primary and derivative files with a single
#   commit transaction.
# - Implementation: built to use default Hyrax actors and jobs upon
#   commit! with special callbacks to handle derivatives and the
#   transition of stored state.
require 'work_files'
# WorkFiles is an adapter of a work that provides CRUD for files, and
# access to perform CRUD operations on each file's derivatives.
# Alternative constructor: .of is an alternate spelling of .new
files = WorkFiles::WorkFiles.of(work)
# kinds of sources to deal with (IO objects are not in scope):
# - Things with local path
files.assign('path/to/my.tiff')
# - Things with remote URI
files.assign('http://example.com/assets/lincoln-memorial.tiff')
# - Things with an absolute local path (e.g. on a mounted filesystem)
files.assign('/mnt/nfs/1a/go/to/my.tiff')
# Path-like, e.g. Pathname (should be normalized to a String by the adapter for storage)
files.assign(Pathname.new('path/to/my.tiff'))
# Note: IO objects are out of scope; they would just be staged to tempfiles,
# which seems like more complexity than needed, more to test, and of
# marginal benefit, if any, to callers.
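# (Hedged sketch, not part of the proposed API: a caller holding an IO
# object can stage it to a tempfile itself, then assign the resulting path;
# 'io' below is a hypothetical caller-supplied IO.)
require 'tempfile'
staged = Tempfile.new(['upload', '.tiff'])
staged.binmode
staged.write(io.read)
staged.close
files.assign(staged.path)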
# enumerating assigned (queued) files; does not include flushed/committed files
files.assigned #=> Array of path and/or URI Strings
# calling commit! will attempt a save, and the queried state of the
# WorkFiles object reflects that. Determining whether the state is
# 'flushed' requires a query to the back-end store, to check that the
# known assigned file is actually flushed and matches expected local state.
files.state #=> 'dirty'
files.commit!
files.state #=> 'pending', or later 'flushed'
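# (Illustrative only: because 'flushed' is determined by querying the
# back-end store, a caller that must block until attachment completes
# could poll state; the one-second interval is an arbitrary assumption.)
sleep(1) until files.state == 'flushed'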
# enumerating already stored files for work
files.keys #=> identifiers (re: PCDM, this is FileSet id, not file!)
files.names #=> local "file names", presumably local to work
# Getting a WorkFile (also an adapter/wrapper) object, requires checkout
# of a working copy from storage, which may be a slower operation:
file = files.get(name_or_id)
# -- is the same as: --
file = files[name_or_id]
# A WorkFile is simply a wrapper for file that is checked out for local
# use from storage.
file.parent #=> WorkFiles
file.work #=> Work object parent of file(s)
file.fileset #=> PCDM fileset containing file
file.derivatives #=> WorkDerivatives for work
# Getting underlying ActiveFedora File object:
file.unwrap #=> Primary persistence object for file
# Just-in-time working copy checkout:
# We need a working copy upon accessing file stream content, path
# or metadata
# Before this checkout (triggered upon access of any method below), the
# WorkFile is nothing more than a ghost.
# Access to binary payload:
file.path #=> path to file, implies it is checked out, at latest upon return
file.with_io #=> access to file content via an IO
file.data #=> raw content (bytes) of the file
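# (Usage sketch: any of the accessors above implies checkout of a working
# copy, so the path can be handed to ordinary file APIs; Digest here is
# purely illustrative.)
require 'digest'
checksum = Digest::SHA256.file(file.path).hexdigest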
# metadata provided by ActiveFedora::WithMetadata mixin to File objects:
file.name #=> equiv to original filename
file.size #=> file size
file.date_created #=> date created
file.date_modified #=> date modified
file.mime_type #=> mime-type
# WorkFile is not responsible for saving but WorkFiles is:
file.commit! #=> RAISES NoMethodError
file.parent.commit! # commits all files assigned to WorkFiles parent
# Getting derivatives of a specific WorkFile:
derivatives = files.get(name_or_id).derivatives
# -- or --
derivatives = file.derivatives
# For single-file works, one need not get the file by name or id:
derivatives = files.derivatives #=> WorkDerivatives, with inferred parent
# For multi-file works, we run into issues of uniqueness for derivatives,
# so we treat as unsupported:
multiple_workfiles.derivatives #=> RAISES EXCEPTION
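# (Hedged sketch: for multi-file works, obtain derivatives per file
# explicitly, using the keys/get semantics shown above.)
multiple_workfiles.keys.each do |fileset_id|
  per_file_derivatives = multiple_workfiles.get(fileset_id).derivatives
  # ... operate on derivatives for this specific file ...
end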
# WorkDerivatives objects have a reference back to their parent WorkFile
derivatives.parent #=> WorkFile instance
derivatives.work_files #=> WorkFiles instance for work
derivatives.work #=> work object
# ## Loading/attaching pre-made derivatives:
#
# 'derivatives' here is a WorkDerivatives adapter (of a specific fileset)
# and has assignment semantics similar to WorkFiles; here the
# destination name of the derivative is inferred from the file extension:
derivatives.assign('path/to/my-alto.xml')
# when assigning a derivative, callers can optionally specify the destination name:
derivatives.assign('path/to/my-jp2-gray.jp2', 'jp2')
# Commit for derivatives can be done independently, if desired:
derivatives.state # => 'dirty' (something is assigned)
derivatives.commit! # commits, has own state
derivatives.state # => 'pending', or 'flushed'
# ...or as part of commit of any parent WorkFiles:
derivatives.assign('this/that/123.pdf')
derivatives.state # => 'dirty' (something is assigned)
derivatives.work_files.commit!
derivatives.state # => 'pending', or 'flushed'
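# (Hedged sketch of the single-commit-transaction case noted in the header
# comments: a newly assigned primary file and an assigned derivative can be
# flushed together by one commit! on the WorkFiles adapter; the exact
# batching behavior and the paths used here are assumptions.)
work_files = derivatives.work_files
work_files.assign('/mnt/nfs/1a/go/to/another.tiff')
derivatives.assign('path/to/another-alto.xml')
work_files.commit! # flushes both the new primary file and the derivative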
# WorkDerivatives is also mapping/hash-like...
# Derivative keys for read/enumeration/get are destination names;
# Derivative values are Hyrax (pairtree) paths to stored files.
derivatives.keys #=> Array of destination names
derivatives.values #=> Array of pairtree paths
derivatives.get('jp2') #=> String path to jp2 file
# Assigning a derivative for a destination name may replace a previous
# derivative for the same destination name:
derivatives.assign('path/to/my-jp2-color.jp2', 'jp2')
derivatives.commit! # commits, has own state
# ## Deleting stuff:
# Deleting a file from WorkFiles:
# -- ideally, this happens as quickly as possible, but it may be the case
# that the Solr index is updated asynchronously.
files.unassign(name_or_id)
files.commit! # commits unassign operations before assign
# Deleting a derivative:
derivatives.unassign('jp2') # unassign by destination name
derivatives.commit! # commits unassign operations first, before
# assign operations.
# Replacing/updating an attached primary file:
# We always create a new file set for each assigned primary file;
# therefore, replacing an existing primary file is a matter of assigning
# a new file (=> new file set) and removing the old (file, file set).
# These operations can be performed in any order, and the eventual state
# (presuming some async work) will be the same regardless.
# Get a file identifier (or name) to remove:
name_or_id = files.keys[0] # or files.names[0], whatever
# Out with the old:
files.unassign(name_or_id)
# In with the new:
files.assign('/mnt/nfs/1a/go/to/my_other.tiff')
# Make both of the above happen; in the end, order does not matter:
files.commit!