mjsuhonos/thoughts.md

Records, Documents, & Graphs (II)

On the need for running code and rough consensus.


If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.
Manu Sporny, "JSON-LD and Why I Hate the Semantic Web" -- 21 Jan 2014

Context

This document is an attempt to bring together several parallel discussions that revolve around the lack of a shared understanding on how to represent RDF resources in relation to the (pre-existing) concepts of records, documents, objects, and graphs.  In particular:

https://gist.github.com/no-reply/6a635c7ced661c65aeea
ActiveTriples/ActiveTriples#117
https://github.com/mtrudel/pragmatic_context/blob/master/README.md

Preamble (or, a Frustrated Ramble)

Ask anyone involved in the Semantic Web, and it would seem that Linked Data has the greatest potential for the advancement of online data interoperability since the invention of the Web itself.  Yet adoption of RDF and associated technologies has been inordinately stunted -- as Sporny clearly describes -- very often by the very proponents of the Semantic Web.  I have seen firsthand how intelligent, extremely knowledgeable people caustically dismiss and discourage otherwise practical and useful ideas because they contain a small imperfection or insignificant misunderstanding.  My only thought in response to these people is to quote "The Dude" Lebowski and say, "You're not wrong, you're just an asshole."
In the vast majority of cases, being completely, academically correct is actually a barrier to implementation; this is exactly the difference between science and engineering.  If we care more about "being right" than "building stuff" then being right doesn't matter, because nothing will ever get built.  We can look at JSON-LD as a canonical example: few would argue (and they would argue) that JSON-LD hasn't made Linked Data far more accessible to many, and yet it seems to upset both pragmatists and RDF purists.  It is a necessary compromise, and from that comes its utility.
Records and Documents and Graphs and Objects and Triples

We see much of the same dogmatism, particularly in the information science community, when it comes to the conceptualization of records and documents (or objects; I will use the term 'objects' henceforth).  For all the shortcomings of the FRBR data model, its main value comes in moving from the concept of a fixed, flat record as the unit of information, to an object-oriented model.  This at least brings the domain model closer to technology practice, where object-relational (or object-document, or entity-relation) theory has been well established over several decades.  One might reasonably assume, then, that RDF proponents would want to build upon that massive existing body of experience.
And yet, we see people dogmatically clinging to graphs: "RDF is a graph model, you see. Objects are too simple. You have to understand graphs to use RDF."  Except that graphs contain objects, so really, you don't.  Or we see people dogmatically talking about triples: "The Semantic Web is a triple model, you see. Objects are too complex. You have to understand triples to use RDF."  Except that triples link objects, so really, you don't.  And then on the Web the primitive is the resource (incidentally, the 'R' in URL and RDF, and the key abstraction in REST).  We seem to (again, mostly) agree that resources are the fundamental element of the Web, but that quickly falls apart when we ask what a resource -- specifically, a metadata resource -- represents: is it a document? An object? A graph?
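
As a rough illustration of why the object view loses nothing, the sketch below groups a handful of triples by subject and then flattens them back. The triples are plain Python tuples and the `ex:`, `dc:` and `foaf:` identifiers are made up for this example; the function names are likewise hypothetical, not part of any RDF library.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object) tuples.
TRIPLES = {
    ("ex:book1", "dc:title", "Moby-Dick"),
    ("ex:book1", "dc:creator", "ex:melville"),
    ("ex:melville", "foaf:name", "Herman Melville"),
}

def to_objects(triples):
    """Group triples by subject: each subject becomes one object."""
    objects = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        objects[s][p].add(o)
    return objects

def to_triples(objects):
    """Flatten objects back into a set of triples."""
    return {
        (s, p, o)
        for s, props in objects.items()
        for p, values in props.items()
        for o in values
    }

objects = to_objects(TRIPLES)

# The round trip is lossless: the objects and the triples carry the same data.
assert to_triples(objects) == TRIPLES
```

Here `ex:book1` and `ex:melville` each become an object with properties, and the `dc:creator` value is simply a link from one object to the other.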
The Web is a Graph

In 2007, Tim Berners-Lee proposed abandoning the term "Semantic Web" in favour of the phrase "Giant Global Graph", a move that certainly upset RDF purists in exchange for a new alliteration symmetric with the original W3.  The important acknowledgement here was that both the Semantic Web and the regular World Wide Web are already graphs that link resources.
It would follow, then, that if the G3 is a graph that links resources, and an RDF graph describes the G3, and an RDF graph links objects, that resources in the G3 are objects.  This seems to be consistent with most canonical examples of Linked Data resources being used to describe instances of RDF classes, i.e. objects.  Except for those cases where even the experts can't seem to agree, such as the infamous httpRange-14 issue.  Yet, even in OWL, a class is a collection of objects, and an instance of a class is an object.
Resources are Objects

What, then, are the constraints if we assume G3 resources are objects?

a resource requires its own IRI (to be linked to other resources)
a resource can only contain RDF statements with itself as the subject
every object in an RDF graph must exist as a G3 resource in order to be linked (using IRIs)
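
The three constraints above can be sketched as a simple check. This is one possible reading, under stated assumptions: a resource is modelled as a set of triples keyed by its IRI, IRIs are marked with a small hypothetical `IRI` class, plain strings are treated as literals, and "object" in the third constraint is taken to mean any IRI in the object position. The `example.org` IRIs are invented for illustration.

```python
class IRI(str):
    """Marks a string as an IRI; plain str values are treated as literals."""

def check_resources(resources):
    """resources: dict mapping a resource identifier to the set of triples it contains.
    Returns a list of human-readable constraint violations."""
    problems = []
    for iri, triples in resources.items():
        # 1. A resource requires its own IRI.
        if not isinstance(iri, IRI):
            problems.append(f"{iri!r} is not identified by an IRI")
        for s, p, o in triples:
            # 2. A resource only contains statements with itself as the subject.
            if s != iri:
                problems.append(f"{iri} contains a statement about {s}")
            # 3. Every linked object must itself exist as a resource.
            if isinstance(o, IRI) and o not in resources:
                problems.append(f"{iri} links to {o}, which is not a resource")
    return problems

book = IRI("http://example.org/book1")
author = IRI("http://example.org/melville")

resources = {
    book: {
        (book, IRI("http://purl.org/dc/terms/title"), "Moby-Dick"),
        (book, IRI("http://purl.org/dc/terms/creator"), author),
    },
}
print(check_resources(resources))  # one violation: the author is not yet a resource

resources[author] = {(author, IRI("http://xmlns.com/foaf/0.1/name"), "Herman Melville")}
print(check_resources(resources))  # no violations
```

The check deliberately ignores predicates and does not dereference anything; it only looks at whether the linked objects are present in the same collection.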

Graph Persistence and the Open World

Much debate has transpired around the notion of "persisting a graph" (more accurately subgraphs or graph fragments), usually as a G3 resource.  Often this comes from requirements around mutability or blank nodes, and the need to preserve some concept of isomorphism or graph equality.  I would argue that this is fundamentally a closed world perspective: once you make an object identifiable with a URI, you allow it to become a resource within the G3 and thus part of a graph that is uncontrolled by design.
You can make statements about the objects in your subgraph, and you can link them to other objects, but the act of publishing them openly -- and that's the whole point, remember -- is likely to mutate them.  Blank nodes are fine as placeholders in subgraphs that aren't published, but the value in retaining them for the sake of persistence or immutability is questionable when it prevents them from being published as resources within the G3.  To this point, I'm still not convinced that Skolemization is an entirely necessary workaround for blank nodes.
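
For readers unfamiliar with the term, Skolemization replaces each blank node with a freshly minted IRI so that it can be referenced from outside the subgraph. A minimal sketch, assuming triples are plain tuples and blank nodes are strings starting with `_:`; the `skolemize` function, the `example.org` base and the prefixed names are hypothetical, while the `/.well-known/genid/` path follows the recommendation in RDF 1.1 Concepts.

```python
import uuid

def skolemize(triples, base="http://example.org"):
    """Replace each blank node label ('_:x') with a freshly minted IRI.

    Simplification: any string starting with '_:' is treated as a blank
    node, so a literal with that prefix would be misread.
    """
    minted = {}

    def swap(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in minted:
                minted[term] = f"{base}/.well-known/genid/{uuid.uuid4().hex}"
            return minted[term]
        return term

    return {(swap(s), p, swap(o)) for s, p, o in triples}

graph = {
    ("http://example.org/book1", "dc:creator", "_:b0"),
    ("_:b0", "foaf:name", "Herman Melville"),
}
skolemized = skolemize(graph)
```

Both occurrences of `_:b0` map to the same minted IRI, so the structure of the subgraph is kept while the node becomes linkable.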
At the same time, "retrieving a graph" in the same context is equally troublesome.  Once a subgraph is woven into the G3, its edges disappear; preserving the boundaries of the "originally persisted" subgraph with the aim of ensuring isomorphism rapidly becomes futile.  Unless perhaps it doesn't link to any other G3 resources outside the subgraph, in which case its usefulness is questionable (I suppose this might be Linked Closed Data).
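
A toy illustration of the boundary problem, assuming the G3 behaves like a plain set union of triples with no provenance attached. The data and the `statements_about` helper are invented for this sketch.

```python
def statements_about(graph, subjects):
    """All triples in `graph` whose subject is one of `subjects`."""
    return {t for t in graph if t[0] in subjects}

# The subgraph as originally persisted.
published = {
    ("ex:book1", "dc:title", "Moby-Dick"),
    ("ex:book1", "dc:creator", "ex:melville"),
}

# Statements made elsewhere about the same resources.
elsewhere = {
    ("ex:book1", "ex:reviewedBy", "ex:alice"),
    ("ex:melville", "foaf:name", "Herman Melville"),
}

g3 = published | elsewhere

# Asking the merged graph about the original subjects returns more than was persisted.
retrieved = statements_about(g3, {s for s, _, _ in published})
assert published < retrieved
```

Once the two sets are merged, nothing in `g3` records which statements came from the original subgraph, so "retrieving the graph" returns whatever the wider graph now says about `ex:book1`.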
A (Final?) Caveat

This document represents current thinking based on my best understanding and an over-arching attempt to be pragmatic, even if that means making simplifications or omitting corner cases.  There may be things that I don't fully understand, about which I am happy to be enlightened so long as it helps move towards the goal of more accessible implementation.