    Review: ISWC 2017 Resources Submission 314


URI: https://gist.github.com/stain/3c0c16a4c27caeca3a0c204052cd5d14
Title: Towards an Ontology for Describing Archival Resources
Authors: Laura Pandolfo, Luca Pulina and Marek Zielinski
Call: ISWC 2017 Resources Track
Submitted preprint: TODO
Resource: http://purl.org/arkivo, http://85.33.50.88/ontology/arkivo.owl, https://github.com/ArkivoTeam/ARKIVO
Review by: Stian Soiland-Reyes (#2 of 4)
Outcome: Reject for ISWC 2017 Resources Track

Accepted at: Workshop on Humanities in the Semantic Web (WHiSe II) (proceedings to appear at http://ceur-ws.org/)


This review is licensed under a Creative Commons Attribution 4.0 International License.
Evaluation

Overall evaluation: 1: Weak Accept
Reviewer's confidence: 3: high
Appropriateness: 1: good
Clarity and quality of writing: 1: good
Related work: 1: good
Originality: 0: sufficient
Impact of ideas and results: 1: good
Implementation and soundness: **2: very good**
Evaluation: 1: good
Assessment of the resource

Assessment of the resource: 1: good
I evaluated the ontology by loading it in Protege 5.2.0 and inspecting the classes and properties after running the HermiT reasoner.
Neither the website at http://arkivo-ontology.org/ nor the OWL file declares an open source license, so as it stands this ontology is not freely reusable.
Edit: The authors have added license information to both the OWL file and the HTML page, declaring the Creative Commons Attribution 3.0 Unported License (why not 4.0?).
Reusability

Reusability: -1: poor
The ontology web page has brief auto-generated documentation showing the labels and descriptions from the OWL file. The main documentation is this paper, which is not currently linked from the ontology page, so it is hard for an outsider to assess the purpose of the ontology or how it should be used.
The paper does not detail how the ontology has been used, except that it supports an ontology-based system currently under development. I would therefore expect actual usage to lead to later improvements and changes to the ontology. As the ontology carries no version information, it would not be advisable for third parties to rely on it as it is today.
Edit: The authors have added `<owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.2</owl:versionInfo>`; however, 0.2 should be an xsd:string, not a floating-point number.
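To illustrate why typing a version label as xsd:double is lossy, here is a minimal Python sketch (not part of Arkivo or its tooling). It compares labels read as doubles with the same labels compared as dotted integer components; the helper names `as_double` and `as_components` are hypothetical, and dotted numeric versions are only one possible scheme.

```python
def as_double(version):
    """Read a version label the way an xsd:double literal would be interpreted."""
    return float(version)


def as_components(version):
    """Read a dotted version label as a tuple of integer components (assumed scheme)."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    # As xsd:double, "0.10" and "0.1" are the same value...
    print(as_double("0.10") == as_double("0.1"))  # True
    # ...and a later release "0.10" sorts before "0.2".
    print(as_double("0.10") < as_double("0.2"))  # True
    # Compared component by component, "0.10" correctly follows "0.2".
    print(as_components("0.10") > as_components("0.2"))  # True
    # A three-part label such as "0.2.1" is not a valid double at all.
    try:
        as_double("0.2.1")
    except ValueError:
        print('"0.2.1" is not a valid double')
```

Keeping owl:versionInfo as a plain string avoids committing to any numeric interpretation at all.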
The paper cites http://purl.org/arkivo, but the namespace used inside the OWL file is http://www.arkivo-ontology.org/ontology#, which can confuse downstream users. Would the permalink http://purl.org/arkivo not also make sense as the namespace and ontology IRI, rather than a new domain name that might expire on 2017-12-15 (according to WHOIS)?
I would also recommend that the namespace URI resolve directly to the OWL file, preferably through content negotiation, so that it can also be used in owl:imports statements. If you are unable to do content negotiation (GitHub Pages, for example, does not support it), you would need a second permanent URI for the documentation and front web pages.
Edit: The authors changed the ontology to use the namespace http://purl.org/arkivo/ontology# (which resolves to http://85.33.50.88/ontology/arkivo.owl#), while http://purl.org/arkivo now resolves to the HTML homepage at http://85.33.50.88/.
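Content negotiation means that a single URI returns different representations depending on the client's HTTP Accept header. The following minimal Python sketch shows only the selection logic, assuming exactly two representations (HTML documentation and RDF/XML) and a simplified Accept parser; the names `AVAILABLE` and `choose_representation` are hypothetical. In a real deployment this decision would normally live in the web server configuration, e.g. Apache rewrite rules conditioned on the Accept header, rather than in application code.

```python
# Representations assumed to be available behind the namespace URI,
# listed in the server's order of preference (HTML first for browsers).
AVAILABLE = ("text/html", "application/rdf+xml")


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs; other parameters are ignored."""
    ranges = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return ranges


def specificity(media_range, media_type):
    """2 for an exact match, 1 for a type/* match, 0 for */*, None if no match."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    if media_range == "*/*":
        return 0
    return None


def choose_representation(accept):
    """Return the media type to serve, or None (i.e. 406 Not Acceptable)."""
    if not accept:
        return AVAILABLE[0]  # no Accept header: fall back to the default
    ranges = parse_accept(accept)
    best_type, best_q = None, 0.0
    for media_type in AVAILABLE:
        # The most specific matching media range decides this type's quality.
        quality, level = 0.0, -1
        for media_range, q in ranges:
            s = specificity(media_range, media_type)
            if s is not None and s > level:
                quality, level = q, s
        # Strict ">" keeps the server's preferred type on ties.
        if quality > best_q:
            best_type, best_q = media_type, quality
    return best_type


if __name__ == "__main__":
    # A typical browser request gets the HTML documentation...
    print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
    # ...while a client asking for RDF/XML gets the ontology file.
    print(choose_representation("application/rdf+xml"))
```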
The ontology file does not contain much metadata. As a minimum I would expect owl:versionInfo, dc:creator, dcterms:issued, dcterms:license and owl:versionIRI. The last should be a version-specific IRI that directly downloads that particular version of the ontology (so that it can be used with owl:imports), e.g. the corresponding GitHub Release download, or a GitHub tag served through cdn.rawgit.com.
Edit: The authors have added all the requested metadata (see the issue above with owl:versionInfo typed as xsd:double).
The website offers no obvious feedback or contribution mechanism (other than emailing the authors). It would make sense to link to https://github.com/ArkivoTeam/ARKIVO for the source code, issue tracker and pull requests. The OWL file also has no link back to the project (except for the namespace); I would expect one as an rdfs:seeAlso annotation within the OWL file.
Edit: The authors added an rdfs:seeAlso to https://github.com/ArkivoTeam/ARKIVO, but its value should be given as an IRI rather than as a string literal. The authors did not add the GitHub link to the HTML.
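The metadata checks above lend themselves to automation. A minimal Python sketch, assuming the ontology header has already been extracted (by some RDF library, not shown) into `(predicate, value, value_is_iri)` tuples; the required list mirrors the minimum set suggested in this review, and `check_header` and the sample header are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Minimum ontology header annotations suggested in this review.
REQUIRED = [OWL + "versionInfo", DC + "creator", DCTERMS + "issued",
            DCTERMS + "license", OWL + "versionIRI"]

# Annotations whose value should be an IRI (a resource), not a string literal.
IRI_VALUED = {OWL + "versionIRI", RDFS + "seeAlso"}


def check_header(statements):
    """Return a list of problems found in an ontology header.

    statements: iterable of (predicate_iri, value, value_is_iri) tuples.
    """
    statements = list(statements)
    present = {predicate for predicate, _, _ in statements}
    problems = [f"missing {p}" for p in REQUIRED if p not in present]
    for predicate, value, is_iri in statements:
        if predicate in IRI_VALUED and not is_iri:
            problems.append(f"{predicate} should be an IRI, not the literal {value!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical header: a version string and a seeAlso given as a literal.
    header = [
        (OWL + "versionInfo", "0.2", False),
        (RDFS + "seeAlso", "https://github.com/ArkivoTeam/ARKIVO", False),
    ]
    for problem in check_header(header):
        print(problem)
```

Run on the sample header, this reports the four missing annotations and flags the literal-valued rdfs:seeAlso.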
Loading issue

http://purl.org/arkivo redirects to http://arkivo-ontology.org/, which presents an HTML frameset wrapping the real content from http://85.33.50.88/. This convoluted setup discourages deep linking (e.g. to the documentation), as the browser's address bar always shows http://arkivo-ontology.org/.
This also makes it very difficult to download, import or reuse the ontology, e.g. from Protege: the link "The ARKIVO Ontology (OWL file)" goes to http://www.arkivo-ontology.org/ontology/arkivo.owl, which again returns an HTML frameset wrapping the RDF/XML from http://85.33.50.88/ontology/arkivo.owl. Using "File Save" or "Save Link As" similarly saves the HTML page, which Protege obviously cannot parse.
So to import the ontology into Protege you have to use "View Source" tricks, as I did above, to find the IP address URL, or copy and paste the XML into a file.
Edit: The authors fixed the PURLs, and the new namespace http://purl.org/arkivo/ontology redirects to the OWL file.
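Saving the linked page gave the HTML wrapper rather than the ontology. A minimal Python sketch of the "View Source" workaround described above: detect a frameset wrapper and pull out the framed URL with the standard library's `html.parser`. The `WRAPPER` markup is a simplified, hypothetical page, not the markup the server actually returned, and `unwrap_frameset` is a hypothetical helper; a tool like Protege instead expects the URL itself to return RDF/XML.

```python
from html.parser import HTMLParser


class FrameSourceCollector(HTMLParser):
    """Collect the src attribute of every <frame> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "frame":
            self.sources.extend(value for name, value in attrs
                                if name == "src" and value)


def unwrap_frameset(body):
    """If body is an HTML frameset wrapper, return the first framed URL; else None."""
    if "<frameset" not in body.lower():
        return None  # not a frameset, e.g. the RDF/XML itself
    collector = FrameSourceCollector()
    collector.feed(body)
    collector.close()
    return collector.sources[0] if collector.sources else None


# Simplified, hypothetical wrapper page of the kind described in the review.
WRAPPER = """<html><head><title>ARKIVO</title></head>
<frameset rows="100%"><frame src="http://85.33.50.88/ontology/arkivo.owl"></frameset>
</html>"""

if __name__ == "__main__":
    print(unwrap_frameset(WRAPPER))  # http://85.33.50.88/ontology/arkivo.owl
    print(unwrap_frameset('<?xml version="1.0"?><rdf:RDF/>'))  # None
```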
The server 85.33.50.88 has the hostname host88-50-static.33-85-b.business.telecomitalia.it, hinting at a home or small-office internet connection. Is this stable and sustainable hosting for an ontology intended for archiving?
Edit: The authors confirm this server is owned by the University of Sassari/Porto Conte Ricerche, and that they will fix the Apache configuration.
I recommend hosting the ontology file and website on a maintained server for long-term accessibility, and then updating the DNS to point there directly without the frameset wrapping.
Edit: The authors have changed http://www.arkivo-ontology.org to redirect to http://85.33.50.88/ rather than wrap it in an HTML frame. The DNS should still be changed so that its A record maps directly to 85.33.50.88.
Edit: The authors have addressed most of my Reusability concerns about the deployment, but have not linked to GitHub or provided a DOI. Permalinks are now working. I am changing my Reusability score from "-1: poor" to "0: sufficient".
Resource Design Quality

Resource Design Quality: 1: good
The ontology makes clear and good reuse of concepts from existing ontologies (BIBO, schema.org, FOAF, GeoNames). It is cleanly organized and easy to navigate.
All concepts have a label and a comment, and all imported concepts have rdfs:isDefinedBy citations. This "soft reuse" by citation (rather than owl:imports) is a clean approach that I have previously argued for in https://lists.w3.org/Archives/Public/public-lod/2017Jan/0045.html; it is good to see it in practice!
However, some of this reuse is intrusive. For instance, dcterms:alternative has been modified to have the domain arkivo:Item, defined as "An item is the smallest intellectually indivisible archival unit", yet dcterms:alternative is widely used elsewhere on resources that are not "archival units".
Similarly, bibo:Collection has been made a subclass of arkivo:CreativeThing, a "thing created or named by people", but a bibo:Collection may also be made by software agents and is not required to be named. However, this particular relation is expressed through the equivalence axiom on arkivo:CreativeThing, so it would only affect other uses of BIBO when reasoning is applied.
arkivo:NamedThing is defined as "things that can be listed or mentioned in Creative Thing". Is there anything that can NOT be mentioned in a Creative Thing? This seems to me to cover anything we are able to conceptualize. In particular, anything described in an RDF file (itself a creative thing) is by definition a "NamedThing". So to me this class seems pointless. Its equivalence definition, "Place or Date or Agent", is however too narrow: as argued here, it is possible to mention any other thing in the world; in particular, a Magazine could well mention a Book, or a Document a HistoricalEvent.
schema:mentions is hijacked to have the domain arkivo:CreativeThing and the range arkivo:NamedThing, whereas its definition in http://schema.org/CreativeWork goes from http://schema.org/CreativeWork to the super-generic http://schema.org/Thing (schema.org's non-OWL counterpart of owl:Thing).
It seems to me that schema:mentions and schema:CreativeWork can be used directly, without any modification from upstream, and arkivo:NamedThing can be dropped.
(This is a common problem when trying to lock down ontology reuse patterns with OWL axioms: you probably intend that users of this ontology should use schema:mentions on instances of your classes, not that any use of schema:mentions should infer your classes.)
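The effect described above follows from the two RDFS entailment rules for domains and ranges (rdfs2 and rdfs3 in RDF 1.1 Semantics): any subject of a property with an rdfs:domain is typed with that class, and any object of a property with an rdfs:range is typed with the range class. A minimal forward-chaining Python sketch over prefixed names as plain strings; the schema triples restate the axioms described in this review, while the instance data (`ex:blogPost`, `ex:someTopic`, `ex:dataset`) is hypothetical third-party data that never mentions Arkivo.

```python
# Axioms as described in this review, written as (subject, predicate, object) strings.
SCHEMA = {
    ("schema:mentions", "rdfs:domain", "arkivo:CreativeThing"),
    ("schema:mentions", "rdfs:range", "arkivo:NamedThing"),
    ("dcterms:alternative", "rdfs:domain", "arkivo:Item"),
}

# Hypothetical third-party data that has nothing to do with archives.
DATA = {
    ("ex:blogPost", "schema:mentions", "ex:someTopic"),
    ("ex:dataset", "dcterms:alternative", '"Alternative title"'),
}


def domain_range_closure(triples):
    """Apply RDFS rules rdfs2 (domain) and rdfs3 (range) until no new triples appear."""
    closure = set(triples)
    while True:
        domains = [(p, c) for p, rel, c in closure if rel == "rdfs:domain"]
        ranges = [(p, c) for p, rel, c in closure if rel == "rdfs:range"]
        new = set()
        for s, p, o in closure:
            new.update((s, "rdf:type", c) for prop, c in domains if prop == p)
            new.update((o, "rdf:type", c) for prop, c in ranges if prop == p)
        if new <= closure:
            return closure
        closure |= new


if __name__ == "__main__":
    inferred = domain_range_closure(SCHEMA | DATA) - SCHEMA - DATA
    for triple in sorted(inferred):
        print(*triple)
```

Running it prints three inferred rdf:type triples, placing a blog post, its topic and an unrelated dataset into Arkivo classes. An OWL reasoner draws the same conclusions from object and data property domain and range axioms.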
The ontology reuses bibo:Book, bibo:Document etc., but introduces arkivo:Image. Is this different from dcmitype:Image, foaf:Image or (if digital) schema:ImageObject? If a new class is needed for the narrower "representation of the form of a person or object", perhaps an arkivo:Depiction subclass of foaf:Image, with foaf:depicts as a required property, would fit.
I wondered why arkivo:CreativeThing (which at first I considered equivalent to http://schema.org/CreativeWork) covers arkivo:HistoricalEvent as an inferred subclass. I assume this is because notable historical events are at some point "named by people", unlike, say, the 'event' of going to the toilet; still, I am uneasy with "created" historical events.
arkivo:Date, which "contains the dates mentioned in an item" and is not related to Event, was difficult to understand until I considered it as an arkivo:NamedThing that can be 'mentioned'. It therefore seems like a "DateMention" highlight annotation; however, its siblings foaf:Agent and schema:Place exist whether or not they are mentioned (as opposed to a more meta "PersonName" or "PlaceMention"). As a date is a place in time (and, through time zones, also in space), should it not be free-standing like schema:Place?
I would suggest looking at the newly updated https://www.w3.org/TR/owl-time/ and reusing, say, time:Interval, so that a loose 'date' like "Summer of 1968" or "last night" can be schema:mentioned.
The Figure 2 example uses "dc:creator", but should use the object property "dct:creator", as in the Arkivo OWL ontology.
I would have expected the PROV-O ontology to be used, as "dc:creator" is incredibly loose and ambiguous. For instance, a bibo:AudioDocument as a physical object could have one person recording, another person talking and a third person writing the speech, not to mention who made the tape, whether it is a copy from somewhere, etc. See our https://doi.org/10.1186/2041-1480-4-37#Sec3 for a critique of dc:creator.
Overall evaluation

Overall paper evaluation: **1: weak accept**
Detailed comments to the authors

Hi, I am Stian Soiland-Reyes (http://orcid.org/0000-0001-9842-9718) and I believe in open reviews.
I would appreciate it if you could contact soiland-reyes@manchester.ac.uk if you agree to my publishing this review.
Edit: The authors agreed to the publication of this review.
This review is licensed under a Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/
This paper presents Arkivo, an ontology to describe physical documents and artifacts in a library/museum archive, together with their relationships to, and mentions of, people and events in world history. It extends existing work on bibliographic information to include archival aspects (e.g. collections and fonds), and aims to make it easier to query for related items or to add brief historical annotations.
The paper is well presented, explains the problem space clearly, and describes existing metadata approaches from the Digital Library community. The ontology is given a concise overview, backed by details of how it was designed.
The paper goes into a bit too much detail on its case study; Section 3 could be shortened by condensing its first two paragraphs.
Figure 2 is difficult to understand, which is not helped by the use of the "arkivo:" prefix both for the example data and for the ontology itself. Common practice is to use ":" or "ex:" for example data, e.g. ":LetterToComradesInLondon". Grouping statements about the same subject would also improve readability. Figure 2 says it uses "N3 notation", but for plain triples N3 has been superseded by the W3C standard Turtle (https://www.w3.org/TR/turtle/).
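Following the two presentation suggestions above (a separate default prefix for example data, and grouping statements by subject), a minimal Python sketch that emits Turtle with one block per subject, joining its predicate-object pairs with ` ;`. The `:` namespace http://example.org/archive/ and the triples themselves (apart from the name :LetterToComradesInLondon) are hypothetical and do not reproduce Figure 2; `to_turtle` is likewise a hypothetical helper.

```python
from collections import defaultdict

# Prefix "" is a hypothetical namespace for example data, kept apart from ontology terms.
PREFIXES = {
    "": "http://example.org/archive/",
    "bibo": "http://purl.org/ontology/bibo/",
    "dct": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}


def to_turtle(triples):
    """Serialize (subject, predicate, object) prefixed names as Turtle, grouped by subject."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    by_subject = defaultdict(list)  # dicts keep insertion order, so subjects stay in input order
    for subject, predicate, obj in triples:
        by_subject[subject].append(f"{predicate} {obj}")
    for subject, pairs in by_subject.items():
        lines.append("")
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines)


# Hypothetical example data; only the letter's name is taken from the review.
EXAMPLE = [
    (":LetterToComradesInLondon", "a", "bibo:Letter"),
    (":LetterToComradesInLondon", "dct:creator", ":SomeAuthor"),
    (":LetterToComradesInLondon", "schema:mentions", ":London"),
    (":London", "a", "schema:Place"),
]

if __name__ == "__main__":
    print(to_turtle(EXAMPLE))
```

Printing `EXAMPLE` yields one block for :LetterToComradesInLondon with its three predicate-object pairs separated by ` ;`, followed by a one-line block for :London.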
The Arkivo ontology is designed to be quite pragmatic and easy to follow, with a few quirks (see Resource Design Quality).
The authors need to fix the deployment of the ontology so that it can actually be reused from OWL tools like Protege; this should be fairly straightforward, but IMHO must be sorted out before acceptance.
Additional comments

Confidential remarks for the program committee

Reviewer 1 (Stian Soiland-Reyes)

Note that I said "Weak Accept" because I think the ontology is nicely described and probably a useful resource if you do archiving, but I don't think we should accept it before the deployment issues have been sorted out, at least to the extent that it is easy to reuse from tools like Protege.
I agree with the other reviewers that the scientific contribution here is rather limited; it is more of a "We made this ontology" paper, with few lessons to be learnt for anyone outside the archival community.
I would now add:

Little mention was made of the difficulties with provenance and overlapping identifiers when digitising old documents ("who scanned it" vs "who wrote it" vs "who said it"). Although this paper focuses on physical artifacts on the shelf, those can often be described in relation to other digital resources, e.g. using FaBiO's FRBR model with its Work/Expression/Manifestation/Item hierarchy, or PROV specializations and derivations.

Rebuttal

Response to authors

Edit: Thanks to the authors for fixing the deployment. I have raised the Reusability score, but kept it low as the ontology still feels a bit unreliable for others to depend on.