I reviewed a sample of the Linked Data published by a respected cultural heritage (CH) institution, which will remain unnamed. Below are my findings.
- go to http://www.culturaitalia.it/oaiProviderCI/listRecords.html (note: http://www.culturaitalia.it/oaiProviderCI/OAIHandler is the OAI endpoint for automated/complete downloading)
- specify metadataPrefix: cidoc_crm
- save the first RDF record as ex1.rdf
- convert to turtle:
riot --formatted=turtle ex1.rdf 1>ex1.ttl
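For the automated route through the OAI endpoint mentioned in the first step, the request can be sketched as below. This is only a sketch: the `verb`, `metadataPrefix` and `resumptionToken` parameter names come from the OAI-PMH protocol, the helper name `list_records_url` is made up for illustration, and I have not checked how this particular server pages its results.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint quoted in the steps above.
OAI_ENDPOINT = "http://www.culturaitalia.it/oaiProviderCI/OAIHandler"


def list_records_url(metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return OAI_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URL; fetching and saving the response is left out.
    print(list_records_url("cidoc_crm"))
```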
- The file begins with
<rdf:RDF xsi:schemaLocation="http://www.w3.org/2000/01/rdf-schema# http://erlangen-crm.org/120111/" >
This causes error
[line: 1, col: 102] The prefix "rdf" for element "rdf:RDF" is not bound.
Fix like so:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:crm="http://erlangen-crm.org/120111/" >
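The unbound-prefix problem can be reproduced without Jena. The sketch below uses Python's standard XML parser, which is namespace-aware, on the two headers shown above. The closing `</rdf:RDF>` tag is added only so the strings are complete documents, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Header as delivered: neither the rdf: nor the xsi: prefix is declared.
BROKEN = (
    '<rdf:RDF xsi:schemaLocation="http://www.w3.org/2000/01/rdf-schema# '
    'http://erlangen-crm.org/120111/" ></rdf:RDF>'
)

# Header after the fix: both prefixes that are used are declared.
FIXED = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:crm="http://erlangen-crm.org/120111/" ></rdf:RDF>'
)


def parses_with_namespaces(xml_text: str) -> bool:
    """Return True if a namespace-aware XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("broken header parses:", parses_with_namespaces(BROKEN))
    print("fixed header parses:", parses_with_namespaces(FIXED))
```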
- using a dated version of ontology URLs is a bad idea. This means your data can interop ONLY with people who use the same exact version (i.e. nobody)
- special chars in URLs are not XML-escaped. This causes the error
[line: 6, col: 97] The reference to entity "case" must end with the ';' delimiter
Change & to &amp;
But notice that there is a badly escaped URL in crm:E18_Physical_Thing: &-id=oai%3aculturaitalia-it
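On the producer side, this kind of error is avoided by XML-escaping every URL before it is written into an attribute. A minimal sketch, using the item URL quoted further down in this review; the element name `r` and attribute name `about` are placeholders, not what the real export uses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

URL = ("http://www.culturaitalia.it/opencms/it/temi/viewItem.jsp"
       "?language=it&id=oai%3Aculturaitalia.it%3Amuseiditalia-work_34345")


def as_attribute(url: str) -> str:
    """Serialize a URL as a quoted, XML-escaped attribute on a dummy element."""
    return "<r about=" + quoteattr(url) + "/>"


def parses(xml_text: str) -> bool:
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    raw = '<r about="' + URL + '"/>'   # bare & inside the attribute
    print("raw parses:", parses(raw))
    print("escaped form:", as_attribute(URL))
    # After parsing, the attribute value is the original URL again.
    print("round trip ok:", ET.fromstring(as_attribute(URL)).get("about") == URL)
```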
- there is some HTML markup inside a literal, which just won't do
<rdf:value>
<a href="viewItem.jsp?language=en&id=oai%3Aculturaitalia.it%3Amuseiditalia-coll_306">View parent resource</a>
</rdf:value>
Causes these errors:
{W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
{W136} Relative URIs are not permitted in RDF: specifically <a>
{W136} Relative URIs are not permitted in RDF: specifically <href>
{W102} Unqualified property attributes are not allowed. Property treated as a relative URI.
{E202} Expecting XML start or end element(s). String data "View parent resource" not allowed
I commented out this element.
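Rather than dropping the element, a producer could split the HTML anchor into its link target and its label and publish those as separate values. The sketch below does this with Python's standard HTML parser; the class and function names are made up, and what the two parts should map to in CRM is a modelling decision that this sketch does not make.

```python
from html.parser import HTMLParser

FRAGMENT = ('<a href="viewItem.jsp?language=en&id=oai%3Aculturaitalia.it'
            '%3Amuseiditalia-coll_306">View parent resource</a>')


class AnchorSplitter(HTMLParser):
    """Collect the href of the first <a> element and all text content."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        self.text_parts.append(data)


def split_anchor(fragment: str):
    """Return (href, label) for an HTML fragment containing one link."""
    parser = AnchorSplitter()
    parser.feed(fragment)
    parser.close()
    return parser.href, "".join(parser.text_parts).strip()


if __name__ == "__main__":
    href, label = split_anchor(FRAGMENT)
    print(href)
    print(label)
```

Note that the extracted href is still a relative URL, so it would have to be resolved against the site's real base before being used as an identifier.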
Now that we've fixed the syntactic errors, we can start looking at the RDF data. It tries (but fails) to describe this object, which has nice structured data: http://www.culturaitalia.it/opencms/it/temi/viewItem.jsp?language=it&id=oai%3Aculturaitalia.it%3Amuseiditalia-work_34345
- uses rdf:value throughout, but CRM doesn't define any such property, and nobody else uses it (use crm:P3_has_note or rdfs:label instead)
- maybe the weirdest construct is below. crm:E62_String does not exist, just use the literal directly!
crm:P3_has_note [ a crm:E62_String ;
rdf:value "Testa di Atena con elmo adorno di grigfone"
] ;
- language tags are not provided anywhere
- includes relative URLs, which get resolved to the local directory, not good: file:///C:/my/Onto/culture/CulturaItalia/fedora/objects/work:34345/datastreams/export/content
- includes IP addresses in URLs, which doesn't satisfy the requirements for permanent URLs http://194.242.241.163/fedora/objects/work:34345/datastreams/MM105015/content
- refers to a thesaurus that does not resolve http://culturaitalia.it/pico/thesaurus/4.1#monete_e_medaglie
- Using a hash (#) in the URL means that resolving any single term fetches the complete thesaurus. What if the thesaurus has 5k or 50k entries?
- none of the referenced resources resolve, so this is completely Unlinked Data, eg http://culturaitalia.it/resource/material/coniata-in-argento
- A few dimensions, but none has structured data, only text.
- Dimension URLs are global but should be per-object, eg http://culturaitalia.it/resource/dimension/diametro-cm-2-21
- A few Places, but no structured info: eg this doesn't say the place is in Italy or in Molise
<http://culturaitalia.it/resource/place/paslazzo-mazzarotta-cb-molise-italia-inv-45829-09-2011->
a crm:E53_Place ;
rdf:value "paslazzo Mazzarotta (CB), Molise - Italia, inv. 45829 (09/2011)" .
- including inventory numbers in places is wrong, that should be a crm:E42_Identifier
- lots of blank-node types, rather than using some thesaurus. Eg:
<http://194.242.241.163/fedora/objects/work:34345/datastreams/MM105015/content>
a crm:E36_Visual_Item ;
crm:P2_has_type [ a crm:E55_Type ;
rdf:value "preview"
] .
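Several of the URL problems listed above (IP-address hosts, local file: IRIs, hash URIs that pull in a whole document) can be checked mechanically. A rough sketch using only the Python standard library; the function names and the problem labels are my own, and the check is nowhere near a full "cool URIs" audit.

```python
import ipaddress
from urllib.parse import urldefrag, urlsplit


def host_is_ip(iri: str) -> bool:
    """True if the host part of the IRI is a literal IP address."""
    host = urlsplit(iri).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def audit(iri: str) -> list:
    """Return a list of problem labels for one IRI (empty if none found)."""
    problems = []
    if urlsplit(iri).scheme == "file":
        problems.append("file-scheme")   # a relative URL resolved locally
    if host_is_ip(iri):
        problems.append("ip-host")       # not a stable, named host
    return problems


def document_fetched(iri: str) -> str:
    """The URL a client actually requests: the fragment is never sent."""
    return urldefrag(iri).url


if __name__ == "__main__":
    for iri in [
        "http://194.242.241.163/fedora/objects/work:34345/datastreams/MM105015/content",
        "file:///C:/my/Onto/culture/CulturaItalia/fedora/objects/work:34345/datastreams/export/content",
        "http://culturaitalia.it/pico/thesaurus/4.1#monete_e_medaglie",
    ]:
        print(audit(iri), document_fetched(iri))
```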
Several mistakes in this part:
<file:///C:/my/Onto/culture/CulturaItalia/fedora/objects/work:34345/datastreams/export/content>
a crm:E89_Propositional_Object ;
crm:P1_is_identified_by [ a crm:E42_Identifier ;
rdf:value "fedora/objects/work:34345/datastreams/export/content" ;
crm:P2_has_type [ a crm:E55_Type ;
rdf:value "URL"
]
] .
- relative URL is used
- crm:E89_Propositional_Object is wrong. We can't tell what that "content" is as there is no info about that resource (eg its MIME type). If you check http://194.242.241.163/fedora/objects/work:34345/datastreams/export/content, you'll see it is in fact VRA Core XML, so indeed a Document
- blank-node type is used
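The file: IRI above is what you get when a relative reference is resolved against wherever the record happens to be saved. The sketch below shows the same resolution with `urljoin`, assuming the record was saved as ex1.rdf in the directory seen in the file: IRI, and compares it with resolving against the server's address. Declaring a base in the RDF/XML (for example with xml:base) is one way a publisher could pin this down, though using absolute URLs is simpler.

```python
from urllib.parse import urljoin

# Relative reference as it appears in the record.
RELATIVE = "fedora/objects/work:34345/datastreams/export/content"

# Assumed location of the downloaded file (the directory is taken from the
# file: IRI above; the file name is the one chosen in the first steps).
LOCAL_BASE = "file:///C:/my/Onto/culture/CulturaItalia/ex1.rdf"

# Server address seen in the other URLs of the same record.
SERVER_BASE = "http://194.242.241.163/"


def resolve(base: str, reference: str) -> str:
    """Resolve a relative reference against a base URL."""
    return urljoin(base, reference)


if __name__ == "__main__":
    print(resolve(LOCAL_BASE, RELATIVE))    # what the parser produced locally
    print(resolve(SERVER_BASE, RELATIVE))   # what the publisher presumably meant
```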
Totally ugly, non-permanent, and wrong URL (notice the incomplete/unescaped XML entity &-id).
crm:P46i_forms_part_of
<http://culturaitalia.it/resource/thing/-a-href=-viewitem-jsp?language=en&-id=oai%3aculturaitalia-it%3amuseiditalia-c-$5$G0BTMMeY$sgniH4PL39nLT4IHvU2jjVILwGGaI.cYDrmDZXWeK9A>
- and then, there is no useful info about that resource:
<http://culturaitalia.it/resource/thing/-a-href=-viewitem-jsp?language=en&-id=oai%3aculturaitalia-it%3amuseiditalia-c-$5$G0BTMMeY$sgniH4PL39nLT4IHvU2jjVILwGGaI.cYDrmDZXWeK9A>
a crm:E18_Physical_Thing .
- "Scheda ICCD NU" means "ICCD Card Number", so again that's a Document not crm:E89_Propositional_Object
<http://culturaitalia.it/resource/object/scheda-iccd-nu-14-00085092>
a crm:E89_Propositional_Object ;
rdf:value "Scheda ICCD NU: 14-00085092" .