@cjcolvar
Created October 31, 2013 14:23
A script that cleans up an exported Avalon Fedora object (FOXML) so it can be used as a fixture
require 'nokogiri'
require 'open-uri'
require 'base64'
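# Path to the exported FOXML file, given as the first command-line argument
filename = ARGV[0]
abort "Usage: ruby #{$PROGRAM_NAME} <exported_object.xml>" if filename.nil?
# The block form closes the file even if parsing raises
doc = File.open(filename) { |f| Nokogiri::XML::Document.parse(f) }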
# Get rid of the audit trail
doc.xpath("//foxml:datastream[@ID='AUDIT']").each(&:remove)
# Keep only the last version of each datastream (position() is evaluated per parent datastream)
doc.xpath("//foxml:datastreamVersion[position() < last()]").each(&:remove)
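doc.xpath("//foxml:datastreamVersion").each do |n|
  n.remove_attribute("CREATED")
  n.remove_attribute("SIZE")
  # Reset the version suffix to .0 (e.g. DC.10 -> DC.0); [1-9]+ skipped the digit 0 and produced DC.00
  n["ID"] = n["ID"].sub(/\.\d+\z/, ".0")
end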
# Mark every datastream as non-versionable
doc.xpath("//foxml:datastream").each {|n| n["VERSIONABLE"] = "false"}
# Switch text/xml datastreams from control group M (managed) to X (inline XML)
doc.xpath("//foxml:datastreamVersion[@MIMETYPE='text/xml']/..").each {|ds| ds["CONTROL_GROUP"] = "X"}
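# Decode the Base64 binaryContent of text/xml datastreams into inline xmlContent
nodeset = doc.xpath("//foxml:datastreamVersion[@MIMETYPE='text/xml']/foxml:binaryContent/..")
nodeset.each do |node|
  # Strip line breaks and other whitespace from the Base64 payload before decoding
  childdoc = Nokogiri::XML::DocumentFragment.parse Base64.decode64(node.content.gsub(/\s+/,''))
  # This element ends up as lowercase xmlcontent in the output, which Fedora rejects;
  # the casing is restored when the file is written
  node.children = '<xmlContent>'
  node.child.children = childdoc
  # Drop namespaces from the decoded content
  node.child.child.traverse {|n| n.namespace = nil}
end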
# HACK: MODS datastreams don't declare the xsi namespace, so add it to each un-namespaced mods element
doc.xpath("//mods").each {|ds| ds["xmlns:xsi"] = "http://www.w3.org/2001/XMLSchema-instance"}
# Write the result next to the input as <name>.edit<ext>
ext = File.extname(filename)
newfilename = "#{File.dirname(filename)}/#{File.basename(filename, ext)}.edit#{ext}"
# HACK: restore the xmlContent casing and remove the blank lines Nokogiri leaves behind
File.open(newfilename, 'w') do |f2|
  f2.write doc.to_s.gsub('xmlcontent','xmlContent').gsub(/^\s*$\n/, '')
end
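
The three string-level steps in the script (resetting a datastream version ID, decoding a whitespace-wrapped Base64 payload, and deriving the `.edit` output name) can be tried without Nokogiri or a Fedora export. The following is a minimal sketch using only the Ruby standard library. The helper names `edited_filename`, `normalize_version_id`, and `decode_binary_content` are hypothetical and do not appear in the script, and the sample path and ID are made up. The `\.\d+\z` pattern assumes version IDs end in a dot followed by a numeric counter.

```ruby
require 'base64'

# Hypothetical helper: "dir/object.xml" -> "dir/object.edit.xml"
def edited_filename(path)
  ext = File.extname(path)
  File.join(File.dirname(path), "#{File.basename(path, ext)}.edit#{ext}")
end

# Hypothetical helper: reset the trailing version counter of an ID to 0.
# Assumes the ID ends in ".<digits>"; IDs without that suffix are returned unchanged.
def normalize_version_id(id)
  id.sub(/\.\d+\z/, '.0')
end

# Hypothetical helper: remove all whitespace from a Base64 payload, then decode it.
def decode_binary_content(text)
  Base64.decode64(text.gsub(/\s+/, ''))
end

# Made-up sample values, for illustration only
sample_xml = '<mods><titleInfo><title>Example</title></titleInfo></mods>'
wrapped = Base64.encode64(sample_xml) # encode64 inserts line breaks, like an export does

puts edited_filename('fixtures/avalon_1.xml')
puts normalize_version_id('descMetadata.10')
puts decode_binary_content(wrapped)
```

Keeping these steps as small pure functions makes it possible to check them in isolation before running the full script against a real export.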