A script that cleans up an exported Avalon Fedora object so it can be used as a fixture
require 'nokogiri'
require 'open-uri'
require 'base64'
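filename = ARGV[0]
abort "Usage: ruby #{$PROGRAM_NAME} <exported-foxml-file>" if filename.nil?
# The block form closes the file even if parsing raises
doc = File.open(filename) { |f| Nokogiri::XML::Document.parse(f) }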
#Get rid of Audit trail
doc.xpath("//foxml:datastream[@ID='AUDIT']").each {|n| n.remove()}
#Keep only the last version of a datastream
doc.xpath("//foxml:datastreamVersion[position() < last()]").each {|n| n.remove()}
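doc.xpath("//foxml:datastreamVersion").each do |n|
  n.remove_attribute("CREATED")
  n.remove_attribute("SIZE")
  # Reset the trailing version suffix to .0 (e.g. DC.10 -> DC.0).
  # \d is needed here: [1-9] skips zeros and would turn DC.10 into DC.00.
  n["ID"] = n["ID"].sub(/\.\d+\z/, ".0")
end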
#set all datastreams versionable false
doc.xpath("//foxml:datastream").each {|n| n["VERSIONABLE"] = "false"}
#change M datastreams to X for type xml
doc.xpath("//foxml:datastreamVersion[@MIMETYPE='text/xml']/..").each {|ds| ds["CONTROL_GROUP"] = "X"}
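#Decode binaryContent
nodeset = doc.xpath("//foxml:datastreamVersion[@MIMETYPE='text/xml']/foxml:binaryContent/..")
nodeset.each do |node|
  # The Base64 payload is line-wrapped in the export, so strip whitespace before decoding
  childdoc = Nokogiri::XML::DocumentFragment.parse Base64.decode64(node.content.gsub(/\s+/,''))
  # NOTE: this element comes out as lowercase xmlcontent, which Fedora rejects;
  # the name is corrected when the document is written out at the end of the script
  node.children = '<xmlContent>'
  node.child.children = childdoc
  # Drop namespaces from the decoded XML
  node.child.child.traverse {|n| n.namespace = nil}
end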
#HACK mods datastreams don't have xsi namespace declared
doc.xpath("//mods").each {|ds| ds["xmlns:xsi"] = "http://www.w3.org/2001/XMLSchema-instance"}
ext = File.extname(filename)
newfilename = "#{File.dirname(filename)}/#{File.basename(filename, ext)}.edit#{ext}"
#HACK restore the xmlContent element name and strip the blank lines nokogiri leaves behind
File.write(newfilename, doc.to_s.gsub('xmlcontent','xmlContent').gsub(/^\s*$\n/, ''))
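
The string handling in the script (resetting version IDs, decoding the wrapped Base64 payload, and deriving the output filename) can be tried in isolation, without Nokogiri or a real export. The following is a minimal sketch rather than part of the original script: the helper names `normalize_version_id`, `decode_wrapped_base64`, and `edited_filename` are hypothetical, and the ID pattern is anchored to the end of the string on the assumption that every datastream version ID ends in a numeric suffix such as `.3`.

```ruby
require 'base64'

# Hypothetical helper: reset the trailing version suffix of a datastream
# version ID to ".0". Assumes the ID ends in ".<digits>".
def normalize_version_id(id)
  id.sub(/\.\d+\z/, '.0')
end

# Hypothetical helper: decode a Base64 payload that has been wrapped
# across several lines, as binaryContent is in an export.
def decode_wrapped_base64(text)
  Base64.decode64(text.gsub(/\s+/, ''))
end

# Hypothetical helper: insert ".edit" before the file extension,
# keeping the file in its original directory.
def edited_filename(path)
  ext = File.extname(path)
  "#{File.dirname(path)}/#{File.basename(path, ext)}.edit#{ext}"
end

puts normalize_version_id('descMetadata.12')
puts edited_filename('fixtures/avalon_1.xml')
```

Keeping these steps as small pure functions makes it easy to check edge cases, such as a version number containing a zero, before running the cleanup against a real exported object.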