no-reply/thoughts.md

## thoughts.md

      
            
          
    Records, Documents, & Graphs

Accounting for record scope & mutability in metadata management.


Smoothies cannot be edited
@anarchivist -- 6:52 PM PDT - 23 Apr 2015

Questions

The key question I'm setting out to answer is: how can we account for routine change and updates in our metadata records? An initial attempt to derive a model for change from current practice has led to some corollary questions about the relationship between Records, Documents, Description Sets, Application Profiles, Resources, and RDF Sources^{lit-review}:

What is a Record?

Possible definition: Records are Documents instantiating a Description Set
Are Records mutable? What about Description Sets?
What other views of metadata records need to be taken into account?


Is a Description Set in DCAM equivalent to a (or a kind of) Graph in RDF?^{property value}
What is the relationship between a Description Set (and by extension, a Record) and a Resource?

If we speak of a "record for Moby Dick", how do we distinguish that from a "record for Melville" that happens to contain some statements about Moby Dick? Is this a valid distinction under DCAM?


Is a Description Set an example of an RDF Source?

Records and Documents

The Dublin Core Abstract Model (DCAM) defines a metadata record as a document that instantiates a Description Set^record.  Description Sets, in turn, are defined as sets of (one or more) Descriptions, with each Description defined as a set of one or more Statements "about one, and only one, resource".
Functional Requirements for Bibliographic Records (FRBR) treats "Records" as an aggregation of "descriptive elements" and "filing devices" (IFLA, 1997; see especially Sec. 2.2).  It's not clear from the definition given (or a loose reading of the remainder of the document) whether the IFLA Study Group's view is of a record as an abstract entity that can be updated, as a static representation of data, or as a literal physical document.  While some combination appears to be at play, there seems to be an emphasis on the last.
The question of Record mutability in both understandings raises the issues documented in Documents Cannot Be Edited (Renear & Wickett, 2009). There is no model for revision in place, and Description Sets lack identifiers of their own to pin revisions on; Records often lack them as well. Even taking a casual view of Records as physical documents, there seems to be little option but to view "revisions" as new documents which will be filed in roughly equivalent places to their predecessors in a card catalog or similar.
In the case of DCAM, the problem is compounded, since Description Sets are defined as sets (sets of sets of statements).  This keeps the model close to that of RDF, but leaves the idea of a persistent, changeable Record out of the picture.
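The set-of-sets structure can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the names `Statement` and `describe`, the property names, and the `ex:` identifiers are hypothetical and are not part of DCAM. It models a Description as a frozen set of Statements about one resource and a Description Set as a frozen set of Descriptions, and shows that a "revision" produces a distinct value rather than updating the old one.

```python
from typing import FrozenSet, NamedTuple


class Statement(NamedTuple):
    """A property/value pair asserted about one resource (hypothetical shape)."""
    resource: str
    prop: str
    value: str


Description = FrozenSet[Statement]
DescriptionSet = FrozenSet[Description]


def describe(resource: str, **pairs: str) -> Description:
    """Build a Description: statements about one, and only one, resource."""
    return frozenset(Statement(resource, p, v) for p, v in pairs.items())


moby = describe("ex:MobyDick", title="Moby Dick", creator="ex:Melville")
original: DescriptionSet = frozenset({moby})

# A "revision" cannot change `original`; it can only produce another set.
revised_moby = moby | {Statement("ex:MobyDick", "date", "1851")}
revised: DescriptionSet = frozenset({revised_moby})

# The two Description Sets are simply different values; nothing links them.
print(original == revised)
print(len(moby), len(revised_moby))
```

Nothing in the two values records that one succeeds the other, which is the gap the rest of these notes try to address.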

Mutability as a Requirement for Actionable Records

The view of Records implied by the above leaves us with significant problems for even basic metadata and asset management workflows. Our practice when describing a resource is to assume that new (and deleted) assertions update an old description. Our systems manage this with internal representations of state, controlled with database rows, object representations, or otherwise, but usually without an articulated formal model. This won't do when we introduce Linked Data (or any large-scale interoperability scheme). A shared model for mutability is needed.
[I would like to further document/articulate the nature of this requirement! What would we be lacking if we always saw records as static?]

Reviewing the RDF Model


Resources
Statements
Graphs
Datasets
RDF Source

Graphs are Immutable

Graphs are sets of statements.

RDF Source


We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph. For example, any web document that has an RDF-bearing representation may be considered an RDF source. Like all resources, RDF sources may be named with IRIs and therefore described in other RDF graphs.

From "RDF and Change over Time" in RDF 1.1 Concepts and Abstract Syntax.


As Resources, Sources can be denoted by an IRI or existentially quantified as a blank node.  Further, a Source may be said to relate a time sequence of zero or more RDF graphs, with each graph representing a state of the mutable Resource at a given time.
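The "time sequence of zero or more RDF graphs" reading can be sketched as follows. This is a minimal illustration, not an API from any RDF library: the class name `RDFSource`, its methods, the integer timestamps, and the `ex:` identifiers are all assumptions made for the example. Graphs are modelled as frozen sets of triples, and the Source only relates them to points in time.

```python
from bisect import bisect_right
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class RDFSource:
    """A mutable resource whose states are immutable graphs (illustrative only)."""

    def __init__(self, iri: Optional[str] = None) -> None:
        self.iri = iri  # None stands in for a blank node
        self._times: List[int] = []
        self._graphs: List[Graph] = []

    def record_state(self, time: int, graph: Graph) -> None:
        """Append a snapshot; times are assumed to arrive in increasing order."""
        if self._times and time <= self._times[-1]:
            raise ValueError("snapshots must be recorded in time order")
        self._times.append(time)
        self._graphs.append(graph)

    def state_at(self, time: int) -> Optional[Graph]:
        """Return the snapshot in effect at `time`, or None if none exists yet."""
        index = bisect_right(self._times, time)
        return self._graphs[index - 1] if index else None


source = RDFSource("ex:record/1")
v1: Graph = frozenset({("ex:MobyDick", "dc:title", "Moby Dick")})
v2: Graph = v1 | {("ex:MobyDick", "dc:creator", "ex:Melville")}
source.record_state(1, v1)
source.record_state(5, v2)
```

Here `source.state_at(3)` returns `v1` and `source.state_at(5)` returns `v2`, while a Source with no recorded snapshots has a sequence of zero graphs and always returns `None`.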

Revisiting DCAM

A description is a set of statements that follow the one-to-one principle over the set. In explicit RDF terms, that is, a Graph whose triples share a single Resource as their subject node. On its face, this is very similar to the kind of "resource view" common on Linked Data publishing platforms that expose the triples "about" a given Resource.  In practice, a description adds notions of constraint and completeness either through Description Templates (and Statement Templates) in a Description Set Profile or through less formal guidelines for vocabulary usage commonly included in Application Profiles.
The larger Description Set and its associated Record instantiations are, similarly, Graphs without the subject restriction. Any Graph can arguably be interpreted as a Description Set containing one Description for each Resource that appears as a subject in its triples, though there may be value in the view that a Graph is only a Description Set when viewed in the context of some set of constraints, or as a candidate expression of a "Profile" or "Shape"^infinite.
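The interpretation of a Graph as a Description Set can be sketched by grouping triples on their subject. The function name `descriptions_by_subject` and the example identifiers are hypothetical, and the sketch deliberately ignores the constraint and completeness notions that a Profile would add.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def descriptions_by_subject(graph: Iterable[Triple]) -> Dict[str, FrozenSet[Triple]]:
    """Split a graph into one Description per subject (the "resource view")."""
    grouped: Dict[str, Set[Triple]] = defaultdict(set)
    for subject, predicate, obj in graph:
        grouped[subject].add((subject, predicate, obj))
    return {subject: frozenset(triples) for subject, triples in grouped.items()}


graph = {
    ("ex:MobyDick", "dc:title", "Moby Dick"),
    ("ex:Melville", "foaf:name", "Herman Melville"),
    ("ex:Melville", "ex:wrote", "ex:MobyDick"),
}
description_set = descriptions_by_subject(graph)
```

Note that the `ex:wrote` triple lands in the Description of `ex:Melville` even though it also says something about Moby Dick, which is the distinction raised in the Questions section.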

Some Gaps


While a Record is said to instantiate a single Description Set, DCAM provides no mechanism for determining which Description Set is instantiated.

This points to an interpretation of Description Set as equivalent to Graph---both are defined as sets of statements, without the trappings that come with being a representation of a given Resource.
If this is the case then a Record instantiates a given Description Set merely by faithfully encoding the statements that make it up. This leaves no support for notions like "each metadata record is to represent exactly one book" as found in Sec. 6 of Guidelines for Dublin Core Application Profiles (Coyle & Baker, 2008).


Constraints and completeness are similarly problematic, since a single Record may be valid and complete for one profile, but not another.
...
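The profile-relative nature of completeness can be illustrated with a deliberately crude sketch. It assumes a "profile" is nothing more than a set of required properties, which is far simpler than a real Description Set Profile; the profile names, the function `is_complete`, and the identifiers are all hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical profiles, reduced to "properties that must be present".
BOOK_PROFILE: FrozenSet[str] = frozenset({"dc:title", "dc:creator"})
CATALOG_PROFILE: FrozenSet[str] = frozenset({"dc:title", "dc:creator", "dc:date"})


def is_complete(graph: Iterable[Triple], subject: str, required: FrozenSet[str]) -> bool:
    """True if every required property is asserted at least once for `subject`."""
    present = {p for s, p, _ in graph if s == subject}
    return required <= present


record = {
    ("ex:MobyDick", "dc:title", "Moby Dick"),
    ("ex:MobyDick", "dc:creator", "ex:Melville"),
}

complete_as_book = is_complete(record, "ex:MobyDick", BOOK_PROFILE)
complete_as_catalog_entry = is_complete(record, "ex:MobyDick", CATALOG_PROFILE)
```

The same set of statements is complete under the first profile and incomplete under the second, so completeness cannot be a property of the Record alone.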

RDF Source

The RDF Source concept offers a potential solution for each of these problems.

Towards a Formalized Model for RDF Sources

While a common pattern (alluded to in RDF and Change over Time) is to dereference the Source's IRI to get the current state of the Resource, it's not explicitly required that the representation express the current state. Nor is it required that each graph in the sequence be retained, or that continuity be maintained.
Linked Data Platform codifies more specific patterns of dereferenceability, including a requirement of fullness of the representation, and methods for updating "current persistent state". I've done some work to formalize similar handling of locally managed state-bearing Graphs in ActiveTriples in a comment on the GitHub issue "Resource-centric vs graph-centric in persistence/querying".
Removing the implementation-specific language and restrictions:

An RDF Source is "a resource that may be said to have a state that can change over time". Therefore, it:

is a Resource
may be the referent of a URI.


An RDF Source has a Graph container.

A container is a mechanism for retrieving specific Graphs; a container may be, e.g.

a dereferenceable URI (web address); or
a named graph; or
a language construct (an Object, or a Variable); or
a document; or
a memory block; etc...


The Graph in the container represents the Source's current state.
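The outline above can be sketched in a few lines. This is an illustration under assumptions rather than the ActiveTriples or LDP design: the container is modelled as any zero-argument callable that returns a Graph, the class and method names are hypothetical, and a plain dictionary stands in for a named-graph store.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class RDFSource:
    """A Resource whose current state is whatever Graph its container yields."""

    def __init__(self, container: Callable[[], Graph], uri: Optional[str] = None) -> None:
        self.uri = uri                # a Source may be the referent of a URI
        self._container = container   # the mechanism for retrieving a Graph

    def current_state(self) -> Graph:
        """Retrieve the Graph representing the Source's current state."""
        return self._container()


# A dictionary standing in for a named-graph store (hypothetical).
named_graphs: Dict[str, Graph] = {
    "ex:g1": frozenset({("ex:MobyDick", "dc:title", "Moby Dick")}),
}
source = RDFSource(lambda: named_graphs["ex:g1"], uri="ex:record/1")

before = source.current_state()
# Updating the Source replaces the Graph in the container; `before` is untouched.
named_graphs["ex:g1"] = before | {("ex:MobyDick", "dc:creator", "ex:Melville")}
after = source.current_state()
```

The Source keeps its identity across the update while `before` and `after` remain two distinct, immutable Graphs. Swapping the lambda for a function that dereferences a URI or reads a document would change the container without changing the model.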


Problems for Provenance


Notes

[lit-review]: Literature review is still ongoing, but I believe I've pulled in the relevant concepts. Some fashion of definition of each concept listed is attempted somewhere in the main text.
[implementations]: While in LDP and ActiveTriples, the current state is represented by a specific Graph, in principle it's only necessary that some snapshots of state may be represented by Graphs.
[property value]: While working through this question, it has occurred to me that JSON-LD represents another example of this issue. Its graphs are expressed in documents as property-value pairs in a model very similar to DCAM.
[record]: Specifically, it says a record is "An instantiation of a description set, created according to one of the DCMI encoding guidelines (for example, XHTML meta tags, XML and RDF/XML)." The tie to an encoding is significant, since it ensures that a record expresses at most one Graph.
[infinite]: Consider, for example, the Graph of the web. It's not clear what use there is in viewing this as a Set with a functionally infinite number of Descriptions or why anyone would want to instantiate such a thing as a Record.

Bibliography


Libraries, Languages of Description, and Linked Data. Baker. 2011.
Establishing Trust in Data Integration Projects. Origins. 2015.
Documents Cannot Be Edited. Renear & Wickett. 2009.
Description Set Profiles. DCMI, Nilsson. 2009.
Dublin Core Abstract Model. DCMI. 2007.
Formalizing Dublin Core Application Profiles in Metadata & Semantics. Nilsson. 2009.
Guidelines for Dublin Core Application Profiles. Coyle & Baker. 2008.
Functional Requirements for Bibliographic Records. IFLA. 1997 (amendments through 2009).