@masao
Created October 3, 2021 07:04
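#!/usr/bin/env ruby
# Reads a tab-separated file (path given as the first argument) and, for each
# row whose manifestation_identifier is an iss.ndl.go.jp URL, fetches the
# record's RDF/XML and prints the creator and publisher names.
require "open-uri"
require "csv"
require "nokogiri"

# Options are passed as keyword arguments; on Ruby 3 a positional hash is
# no longer treated as the options for CSV.foreach.
CSV.foreach(ARGV[0], headers: true, col_sep: "\t") do |row|
  iss_url = row["manifestation_identifier"]
  manifestation_id = row["manifestation_id"]
  creators = []
  publishers = []
  xml = nil
  # Dots are escaped so that only the literal host name matches.
  if iss_url && iss_url.match?(%r{\Ahttp://iss\.ndl\.go\.jp/})
    iss_rdf_url = iss_url + ".rdf"
    begin
      # The block form closes the underlying handle after reading.
      xml = URI.open(iss_rdf_url, &:read)
    rescue OpenURI::HTTPError => e
      # Log the failure; the row is still printed with empty name columns.
      STDERR.puts [ "WARN", iss_rdf_url, e.message ].join("\t")
    end
    if xml
      doc = Nokogiri::XML(xml)
      # Prefixes resolve against the namespaces declared on the root element.
      doc.xpath('//dcterms:creator/foaf:Agent').each do |creator|
        name = creator.at('./foaf:name')
        creators << name.content if name # skip agents without a foaf:name
      end
      doc.xpath('//dcterms:publisher/foaf:Agent').each do |publisher|
        name = publisher.at('./foaf:name')
        publishers << name.content if name
      end
    end
  end
  # One output row per input row; multiple names are joined with "//".
  puts [ manifestation_id, iss_url,
         creators.join("//"), publishers.join("//") ].join("\t")
end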