@briesenberg07
Last active August 9, 2024 21:19
Reporting on ONS resources with rdf:type values which are classes that don't exist

opaquenamespace.org resources typed with classes that don't exist

See OregonDigital / ControlledVocabularyManager #624 - fix broken rdf:type values

Below are counts, by vocabulary, of the resources in ONS that are typed using classes that don't exist.

Also included is a short script that can be run to reproduce the results. It uses Python rdflib to parse the published N-Triples data for each vocabulary listed at opaquenamespace.org into a graph, then counts the resources typed as each of five classes that are widely used in the data but do not exist.

Because the nonexistent classes use the SKOS namespace, the SKOS reference is also linked below.
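As a quick sanity check, class URIs can be compared against the classes that SKOS does define. The sketch below hard-codes the four classes in the SKOS core namespace (`Concept`, `ConceptScheme`, `Collection`, `OrderedCollection`); the SKOS reference remains the authority, and the helper name `is_defined_skos_class` is hypothetical.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# The four classes defined in the SKOS core namespace (hard-coded here).
SKOS_CLASSES = {
    SKOS_NS + name
    for name in ("Concept", "ConceptScheme", "Collection", "OrderedCollection")
}


def is_defined_skos_class(uri: str) -> bool:
    """Return True if the URI is one of the hard-coded SKOS core classes."""
    return uri in SKOS_CLASSES


# Class URIs found in ONS data, plus one real SKOS class for contrast.
candidates = [
    SKOS_NS + "CorporateName",
    SKOS_NS + "PersonalName",
    SKOS_NS + "Geographic",
    SKOS_NS + "Title",
    SKOS_NS + "Topic",
    SKOS_NS + "Concept",
]

undefined = [uri for uri in candidates if not is_defined_skos_class(uri)]
for uri in undefined:
    print(f"Not a SKOS class: {uri}")
```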

SKOS reference

Access Restrictions controlled vocabulary

  • No results

Categories for the Description of Works of Art (CDWA)

  • No results

class controlled vocabulary

commonNames controlled vocabulary

Creator controlled vocabulary

Culture controlled vocabulary

  • No results

Department of Land Conservation and Development Subjects

  • No results

genus controlled vocabulary

GeoRSS

  • No results

Local Collection Name

Oregon Digital Rights Statements

  • No results

Oregon State University Academic Units

Oregon State University Buildings

  • No results

Oregon State University Degree Fields

People

phylum controlled vocabulary

publisher controlled vocabulary

Repository controlled vocabulary

Series Name

species controlled vocabulary

stylePeriod controlled vocabulary

Subject and Event controlled vocabularies

Technique controlled vocabulary

  • No results

Here is a sample label

TFDD International River Basins

  • No results

workType controlled vocabulary

  • No results
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"ons_vocabs = [\n",
" \"https://opaquenamespace.org/ns/accessRestrictions\",\n",
" \"https://opaquenamespace.org/ns/cdwa\",\n",
" \"https://opaquenamespace.org/ns/class\",\n",
" \"https://opaquenamespace.org/ns/commonNames\",\n",
" \"https://opaquenamespace.org/ns/creator\",\n",
" \"https://opaquenamespace.org/ns/culture\",\n",
" \"https://opaquenamespace.org/ns/DLCDsubject\",\n",
" \"https://opaquenamespace.org/ns/genus\",\n",
" \"https://opaquenamespace.org/ns/georss\",\n",
" \"https://opaquenamespace.org/ns/localCollectionName\",\n",
" \"https://opaquenamespace.org/ns/rights\",\n",
" \"https://opaquenamespace.org/ns/osuAcademicUnits\",\n",
" \"https://opaquenamespace.org/ns/osuBuildings\",\n",
" \"https://opaquenamespace.org/ns/osuDegreeFields\",\n",
" \"https://opaquenamespace.org/ns/people\",\n",
" \"https://opaquenamespace.org/ns/phylum\",\n",
" \"https://opaquenamespace.org/ns/publisher\",\n",
" \"https://opaquenamespace.org/ns/repository\",\n",
" \"https://opaquenamespace.org/ns/seriesName\",\n",
" \"https://opaquenamespace.org/ns/species\",\n",
" \"https://opaquenamespace.org/ns/stylePeriod\",\n",
" \"https://opaquenamespace.org/ns/subject\",\n",
" \"https://opaquenamespace.org/ns/technique\",\n",
" \"https://opaquenamespace.org/ns/TestVocabulary\",\n",
" \"https://opaquenamespace.org/ns/TFDDbasins\",\n",
" \"https://opaquenamespace.org/ns/workType\"\n",
"]\n",
"\n",
"not_classes = [\n",
" \"http://www.w3.org/2004/02/skos/core#CorporateName\",\n",
" \"http://www.w3.org/2004/02/skos/core#PersonalName\",\n",
" \"http://www.w3.org/2004/02/skos/core#Geographic\",\n",
" \"http://www.w3.org/2004/02/skos/core#Title\",\n",
" \"http://www.w3.org/2004/02/skos/core#Topic\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from rdflib import Graph, RDF, RDFS, URIRef\n",
"from rdflib.namespace import DCTERMS, DCAM\n",
"# https://rdflib.readthedocs.io/en/stable/index.html"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# create ONS.org turtle files for reference if desired\n",
"# note local filepath for storage\n",
"\n",
"# for vocab in ons_vocabs:\n",
"#     vocab_name = vocab.split('/').pop(-1)\n",
"#     g = Graph()\n",
"#     g.parse(f\"{vocab}.nt\")\n",
"#     g.serialize(f\"ons_turtle/{vocab_name}.ttl\", format = \"turtle\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# v2: report, per vocabulary, how many resources are typed with each nonexistent class\n",
"markdown = \"\"\n",
"for vocab in ons_vocabs:\n",
"    report = \"\"\n",
"    g = Graph().parse(f\"{vocab}.nt\")\n",
"    for subject in g.subjects(RDF.type, DCAM.VocabularyEncodingScheme):  # all vocabs typed as DCAM VESs\n",
"        # most vocabs have dct:title, but some have rdfs:label instead\n",
"        vocab_name = g.value(subject, DCTERMS.title) or g.value(subject, RDFS.label) or \"\"\n",
"        report += f\"# {vocab_name}\\n\"\n",
"    results = False\n",
"    results_report = \"\"\n",
"    for entry in not_classes:\n",
"        # count resources whose rdf:type is this nonexistent class\n",
"        count = sum(1 for _ in g.subjects(RDF.type, URIRef(entry)))\n",
"        if count > 0:\n",
"            results = True\n",
"            results_report += f\"- Resources with rdf:type {entry}: {count}\\n\"\n",
"    if results:\n",
"        report += results_report\n",
"    else:\n",
"        report += \"- No results\\n\"\n",
"    markdown += report\n",
"with open(\"2_202407_ons_results_report.md\", \"w+\") as file:\n",
"    file.write(markdown)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}