Comments on Records in Contexts: A Conceptual Model for Archival Description
Consultation Draft v0.1 September 2016
Date: 30 January 2017.
By: Ross Spencer < all.along.the.watchtower2001 [at] gmail.com >
I'm a digital preservation expert working at Archives New Zealand. This short response to the consultation draft is submitted independently of my organisation.
I have worked previously at The National Archives, UK. I have a keen interest in supporting archivists and end-users to make full use of the collections that we are custodians of.
I believe I join the majority of the community in expressing my gratitude for the opportunity to comment on the consultation draft.
I am in favour of the approach taken by ICA to combine multiple standards into a single new standard (p1). I think the work done on this draft is phenomenal, and despite very specific comments below about features of the modelling work thus far, this is a standard that I think paves the way to a bright future for archival description and discovery.
Whatever way the standard evolves, it is one I hope to be using in the near future.
I am in favour of an approach that embraces the techniques of linked open data (LoD) (p2).
A clearer delineation/description of the differences between RiC-CM and RiC-O would be beneficial and may help resolve other concerns noted below, e.g. expansion of controlled-vocabulary terms (p1).
Comments establishing the standard within the LoD/semantic web ecosystem would be appreciated. These should survey the semantic web landscape and discuss complementary standards that the ICA recommends for use alongside any future RiC model.
The standard is very wide ranging. In its early stages I would consider it to be too broad. I ask the ICA to consider a more restricted, more concise version of this standard, comparable in size to other LoD standards: Dublin Core has 15 elements and SKOS 32, versus roughly 800 (?) in RiC-CM.
A restricted set could focus on features of the vocabulary that are absolutely necessary, and can be used by the widest possible audience to support management and discovery of archives.
A restricted set could be monitored for use and iterations built upon henceforth.
It is noted that the 'relations' described in the paper are suggestive. When they are 'rounded out' it should also be noted that, using LoD techniques, 'relations' become 'resources' in their own right (the predicates in the subject, predicate, object triple). That makes them something the user will look up for more information. As such, the data they contain should be as complete as that of the entities themselves.
Because of this, additional consideration should be given to the point I raise above, where I call for an initial, more concise vocabulary. A large vocabulary carries a maintenance overhead, and the lack of description of the relations in v0.1 may indicate that this overhead is already a reality.
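To illustrate the point that relations become resources a user may look up, here is a minimal sketch in plain Python. The namespace, property names and triples are hypothetical and are not taken from RiC-CM or RiC-O; the sketch only assumes that a relation counts as 'described' when it appears as the subject of at least one triple.

```python
# Hypothetical namespace and property names, for illustration only;
# none of these identifiers come from RiC-CM or RiC-O.
EX = "http://example.org/ric/"

# Each triple is (subject, predicate, object).
TRIPLES = [
    (EX + "record/1", EX + "isCreatedBy", EX + "agent/7"),
    (EX + "record/1", EX + "hasPart", EX + "record/2"),
    # The predicate isCreatedBy is itself a subject with its own description.
    (EX + "isCreatedBy", "rdfs:label", "is created by"),
    (EX + "isCreatedBy", "rdfs:comment",
     "Links a record to the agent that created it."),
]


def describe(resource, graph):
    """Return every (predicate, object) pair recorded about a resource."""
    return [(p, o) for s, p, o in graph if s == resource]


def undescribed_relations(graph, namespace):
    """Return relations from the namespace that are used but never described."""
    used = {p for _, p, _ in graph if p.startswith(namespace)}
    described = {s for s, _, _ in graph}
    return used - described
```

In this sketch, looking up the hypothetical `isCreatedBy` relation returns a label and a comment, while `hasPart` is reported as used but undescribed. Each relation added to a vocabulary brings a description of this kind that has to be written and maintained.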
A vocabulary that is too broad could have a diluting effect on discovery: a wide variety of terms spread across a large number of records creates smaller result sets when using techniques such as faceted search. (I posit that a smaller number of properties across a larger number of records creates larger result sets.)
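The posit above is essentially arithmetic, and a small sketch may make it concrete. The records and terms below are invented for illustration, and the sketch assumes the simplest case, in which each record carries exactly one term for the facet in question.

```python
from collections import Counter


def facet_sizes(assignments):
    """Count how many records fall under each term of a single facet."""
    return Counter(assignments.values())


def mean_result_set(assignments):
    """Average number of records returned when a user selects one term."""
    return len(assignments) / len(facet_sizes(assignments))


# Hypothetical data: the same six records described with a concise
# vocabulary (two terms) and with a broad one (six terms).
CONCISE = {
    "rec1": "creator", "rec2": "creator", "rec3": "creator",
    "rec4": "subject", "rec5": "subject", "rec6": "subject",
}
BROAD = {
    "rec1": "author", "rec2": "compiler", "rec3": "collector",
    "rec4": "depicts", "rec5": "mentions", "rec6": "documents",
}
```

With the concise vocabulary each facet term returns three records on average; with the broad vocabulary each term returns one. Real catalogues are less uniform than this, so the figures indicate the direction of the effect and nothing more.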
I appreciate the inclusion of an authenticity and integrity note (RiC-P5). I would like to see this expanded further for digital records with a separate field, or set of fields, that has a specific data-type of 'checksum', i.e. a field that can be validated as containing a checksum and nothing else.
A checksum is a mechanism by which a digital file in any digital repository can be reliably paired with a catalogue entry. An explicit mechanism for attaching a checksum, or set of checksums, to the catalogue promotes automation of processes between the two.
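As a minimal sketch of what a validated 'checksum' field could make possible, the following Python uses the standard `hashlib` module. The choice of SHA-256, the dictionary layout of the catalogue entry and the function names are assumptions made for illustration; the consultation draft does not specify any of them.

```python
import hashlib
import re

# Assumption: the checksum field holds a SHA-256 digest, which is
# always exactly 64 hexadecimal characters.
SHA256_PATTERN = re.compile(r"[0-9a-f]{64}")


def is_sha256(value: str) -> bool:
    """Return True if the value has the form of a SHA-256 hex digest."""
    return SHA256_PATTERN.fullmatch(value.lower()) is not None


def sha256_of(data: bytes) -> str:
    """Compute the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_catalogue(data: bytes, catalogue_entry: dict) -> bool:
    """Check that a file's bytes pair with a (hypothetical) catalogue entry."""
    recorded = catalogue_entry["checksum"]
    return is_sha256(recorded) and sha256_of(data) == recorded.lower()
```

A repository process could call `matches_catalogue` for each file it holds. A free-text integrity note could not support this kind of automated pairing, because nothing guarantees that its content is a checksum.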
On that note, I would like to see the LoD concepts committed to more fully. Where there are facts to be recorded - a checksum being a fact about a digital record's current state - more rigid properties can be created and used.
RiC-P39 (Contact Information) is an example of a property that can be expanded into 'facts': email address, postal address, phone number. These are all properties that can be validated in some way, and that users might wish to search on. Conditions of use, where licenses could be searched on, may be another useful example.
RiC-P6 (Content Type) will be set via a controlled list. Controlled vocabularies such as this, where not otherwise specified elsewhere (e.g. as in MIME), should be described fully by this standard as resources that can also be looked up and dereferenced to provide more information.
To promote interoperability, data types should be specified more fully e.g. preferred/expected number, text, or date formats, plus strategies to resolve areas of ambiguity, such as for dates, where precision may not always be possible.
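One possible strategy for dates of varying precision is sketched below. It assumes, purely for illustration, that dates are written as `YYYY`, `YYYY-MM` or `YYYY-MM-DD`, and it resolves each value to the earliest and latest calendar day it could denote. The `DateRange` type and `parse_partial_date` function are hypothetical names and are not part of RiC.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date

# Assumption: only YYYY, YYYY-MM and YYYY-MM-DD are accepted.
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


@dataclass(frozen=True)
class DateRange:
    earliest: date
    latest: date
    precision: str  # "year", "month" or "day"


def parse_partial_date(text: str) -> DateRange:
    """Resolve a possibly imprecise date to the range of days it covers."""
    match = _PARTIAL_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY, YYYY-MM or YYYY-MM-DD date: {text!r}")
    year, month, day = (int(g) if g else None for g in match.groups())
    if month is None:
        return DateRange(date(year, 1, 1), date(year, 12, 31), "year")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {text!r}")
    if day is None:
        last_day = calendar.monthrange(year, month)[1]
        return DateRange(date(year, month, 1),
                         date(year, month, last_day), "month")
    exact = date(year, month, day)  # raises ValueError for an invalid day
    return DateRange(exact, exact, "day")
```

Recording the precision alongside the range lets a search system treat '1923' and '1923-05-17' consistently without inventing a day and month for the less precise value.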
RiC-P10 (Encoding Format) is a good example of a field explicitly made available for digital records. Properties such as rdfs:domain may become important as an ontology develops out of this work. What is the ICA's chosen approach to managing the line between paper and digital, where a property may make sense for one record type but not the other?
The consultation draft makes adequate attempts to caveat its work in places, including:
"It is essential that developers of records management and record description and access systems are part of RiC’s audience. RiC is detailed and complex, and therefore successful implementation and use will require the development of methods that will ameliorate the intellectual, technological, and economic challenge of data creation and maintenance."
It is possible therefore that some of my suggestions above have been considered to be out of scope currently; or simply not relevant to the future goals of this standard.
It is also likely that the realistic questions raised here will be answered during the proving stages of Records in Contexts, where I hope to be an active participant in working with the new standard.
Thank you once again for the opportunity to contribute thus far.