Created
November 4, 2013 22:13
Qualified DC to N-Triples
require 'date'      # DateTime is used below when parsing date-valued elements
require 'nokogiri'
require 'rdf'
require 'chronic'
xmldoc = Nokogiri::XML('<metadata xmlns="http://example.org/myapp/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd">
  <dc:title>
    UKOLN
  </dc:title>
  <dcterms:alternative>
    UK Office for Library and Information Networking
  </dcterms:alternative>
  <dc:subject>
    national centre, network information support, library
    community, awareness, research, information services,public
    library networking, bibliographic management, distributed
    library systems, metadata, resource discovery,
    conferences,lectures, workshops
  </dc:subject>
  <dc:subject xsi:type="dcterms:DDC">
    062
  </dc:subject>
  <dc:subject xsi:type="dcterms:UDC">
    061(410)
  </dc:subject>
  <dc:description>
    UKOLN is a national focus of expertise in digital information
    management. It provides policy, research and awareness services
    to the UK library, information and cultural heritage communities.
    UKOLN is based at the University of Bath.
  </dc:description>
  <dc:publisher>
    UKOLN, University of Bath
  </dc:publisher>
  <dcterms:isPartOf xsi:type="dcterms:URI">
    http://www.bath.ac.uk/
  </dcterms:isPartOf>
  <dc:identifier xsi:type="dcterms:URI">
    http://www.ukoln.ac.uk/
  </dc:identifier>
  <dcterms:modified xsi:type="dcterms:W3CDTF">
    2001-07-18
  </dcterms:modified>
  <dc:format xsi:type="dcterms:IMT">
    text/html
  </dc:format>
  <dcterms:extent>
    14 Kbytes
  </dcterms:extent>
</metadata>
')
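# Element local name => RDF predicate.
# RDF::DC is the DCMI Terms namespace (http://purl.org/dc/terms/), so both
# dc: and dcterms: elements are mapped onto dcterms properties here.
# Newer RDF.rb releases ship this vocabulary in the rdf-vocab gem as
# RDF::Vocab::DC.
map = {
  'alternative' => RDF::DC.alternative,
  'description' => RDF::DC.description,
  'extent'      => RDF::DC.extent,
  'format'      => RDF::DC.format,
  'identifier'  => RDF::DC.identifier,
  'isPartOf'    => RDF::DC.isPartOf,
  'modified'    => RDF::DC.modified,
  'publisher'   => RDF::DC.publisher,
  'subject'     => RDF::DC.subject,
  'title'       => RDF::DC.title
}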
# Local names of elements whose values should be parsed as dates.
date_localnames = [
  'modified',
  'created',
  'date',
  'dateAccepted',
  'dateCopyrighted',
  'dateSubmitted'
]
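g = RDF::Graph.new
id = RDF::URI.new('http://localhost/myobject')

# Visit every child of the root element and emit one triple per element.
xmldoc.xpath('/*/*').each do |e|
  ln = e.name                   # local name, e.g. "title" for <dc:title>
  predicate = map[ln]
  next if predicate.nil?        # skip elements that have no mapping
  v = e.text.strip
  if date_localnames.include?(ln)
    # Chronic.parse returns a Time, or nil when it cannot parse the value;
    # in the nil case the original string is kept.
    t = Chronic.parse(v)
    v = DateTime.parse(t.to_s) if t
  else
    v = v.split.join(' ')       # collapse runs of whitespace and newlines
  end
  g << [id, predicate, v]
end

puts g.dump(:ntriples)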
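The script above emits every value as a plain literal and ignores the xsi:type qualifiers in the sample record. The following is a minimal sketch of one way those qualifiers could shape the N-Triples object; it is not part of the gist. It uses only the Ruby standard library and builds the N-Triples strings by hand, so it runs without the rdf gem. The helper names (dcterms, escape_literal, ntriples_object) are hypothetical. Two choices are assumptions rather than anything the source states: values typed dcterms:URI become IRI nodes, and values typed dcterms:W3CDTF keep their lexical form with the scheme IRI as datatype. The sketch also assumes the prefix in xsi:type is literally "dcterms", and it does not validate or escape IRIs.

```ruby
# Hypothetical helpers, not part of the gist.

# Build an IRI in the DCMI Terms namespace.
def dcterms(local)
  "http://purl.org/dc/terms/#{local}"
end

# Escape backslashes and double quotes for an N-Triples literal.
# Newlines never reach this point because whitespace is collapsed first.
def escape_literal(text)
  text.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
end

# Return the object part of an N-Triples statement for one element value.
def ntriples_object(value, xsi_type = nil)
  text = value.split.join(' ')  # same whitespace normalisation as the script
  case xsi_type
  when 'dcterms:URI'
    # Assumption: URI-typed values are emitted as IRI nodes, not literals.
    "<#{text}>"
  when 'dcterms:W3CDTF'
    # Assumption: keep the lexical form and attach the scheme as datatype.
    %("#{escape_literal(text)}"^^<#{dcterms('W3CDTF')}>)
  else
    # Everything else (including DDC, UDC, IMT) stays a plain literal.
    %("#{escape_literal(text)}")
  end
end

subject = '<http://localhost/myobject>'
rows = [
  ['identifier', "\n http://www.ukoln.ac.uk/ \n", 'dcterms:URI'],
  ['modified',   "\n 2001-07-18 \n",              'dcterms:W3CDTF'],
  ['extent',     "\n 14 Kbytes \n",               nil]
]
rows.each do |local, value, type|
  puts "#{subject} <#{dcterms(local)}> #{ntriples_object(value, type)} ."
end
```

Unlike the Chronic-based conversion in the script, this leaves date strings untouched, which avoids inventing a time of day for a value such as 2001-07-18.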