Open Refine template for exporting tabular data to DRI-ready Dublin Core XML
<qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:marcrel="http://www.loc.gov/marc.relators" xsi:schemaLocation="http://www.loc.gov/marc.relators http://imlsdcc2.grainger.illinois.edu/registry/marcrel.xsd" xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
{{forNonBlank(cells["id"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Title"], v, "<dc:title>"+v.value+"</dc:title>", "")}}
{{forNonBlank(cells["Creator"], v, "<dc:creator>"+v.value+"</dc:creator>", "")}}
{{forNonBlank(cells["Date"], v, "<dc:date>"+v.value+"</dc:date>", "")}}
{{forNonBlank(cells["Description"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Description2"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Rights"], v, "<dc:rights>"+v.value+"</dc:rights>", "")}}
{{forNonBlank(cells["Type"], v, "<dc:type>"+v.value+"</dc:type>", "")}}
{{forNonBlank(cells["Language"], v, "<dc:language>"+v.value+"</dc:language>", "")}}
{{forNonBlank(cells["Contributor"], v, "<dc:contributor>"+v.value+"</dc:contributor>", "")}}
{{forNonBlank(cells["Relation"], v, "<dc:relation>"+v.value+"</dc:relation>", "")}}
{{forNonBlank(cells["Source"], v, "<dc:source>"+v.value+"</dc:source>", "")}}
{{forNonBlank(cells["Coverage"], v, "<dc:coverage>"+v.value+"</dc:coverage>", "")}}
{{forNonBlank(cells["Subject"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject2"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject3"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject4"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subjects (Temporal)"], v, "<dcterms:temporal>"+v.value+"</dcterms:temporal>", "")}}
{{forNonBlank(cells["Subjects (Places)"], v, "<dcterms:spatial>"+v.value+"</dcterms:spatial>", "")}}
{{forNonBlank(cells["Identifier"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Format"], v, "<dc:format>"+v.value+"</dc:format>", "")}}
{{forNonBlank(cells["Format (width x height)"], v, "<dc:format>"+v.value+"</dc:format>", "")}}
{{forNonBlank(cells["Publisher"], v, "<dc:publisher>"+v.value+"</dc:publisher>", "")}}
</qualifieddc>
@ettorerizza

Author comment:

"If you are using Open Refine to create XML metadata for ingest into DRI, you will want to start by importing your Excel or CSV file. Open Refine makes it easy to clean your data, so you may want to review the data after import and make any changes you need. Once you're happy with the data, go to the "Export - Templating" menu option on the top right.

You will see an interface where you can specify a Prefix, Row Template, Row Separator and Suffix. Each column that you want to export to XML is mapped to one line of the Row Template, so you can leave Prefix, Row Separator and Suffix empty and work only with the Row Template.

Fill in the "Row Template" field with a template based on the sample above, modified so that the column names match the headers of your data. The first and last lines are the opening and closing tags of the root XML element. Each of the other lines, starting with "{{forNonBlank(cells[...]", maps one of your columns to a valid Dublin Core XML element and outputs nothing when the cell is blank. The column heading is given within the square brackets, and the Dublin Core element is given after it.
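
Since every mapping line follows the same pattern, the template can also be generated from a list of column/element pairs rather than typed by hand. The following is only a minimal Python sketch under that assumption: the function name `row_template` and its parameters are hypothetical, not part of Open Refine, and the output still has to be pasted into the "Row Template" field.

```python
def row_template(mapping, root_open, root_close="</qualifieddc>"):
    """Build a Row Template string from (column heading, DC element) pairs.

    `mapping`, `root_open` and `root_close` are illustrative names:
    root_open is the full opening <qualifieddc ...> tag from the sample.
    """
    lines = [root_open]
    for column, element in mapping:
        # Same shape as the sample lines: emit the element only for non-blank cells.
        lines.append(
            '{{forNonBlank(cells["%s"], v, "<%s>"+v.value+"</%s>", "")}}'
            % (column, element, element)
        )
    lines.append(root_close)
    return "\n".join(lines)
```

For example, `row_template([("Title", "dc:title"), ("Subjects (Places)", "dcterms:spatial")], "<qualifieddc ...>")` returns the opening tag, one forNonBlank line per pair, and the closing tag, each on its own line.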

The output will be a single file containing every record, so you will need to split it into one XML file per record. One way to do that is to add a line with the filename you want to use to your row template, before the opening XML tag, and then split the export with another tool (e.g. awk on a Unix system). The command below expects each of those lines to start with FILENAME followed by a space and the name, and it writes each record to that name with ".xml" appended.

awk '/^FILENAME/{close(output_file_name); output_file_name=$2".xml"; getline } { print $0 >> ( output_file_name ) }' filename_of_Open_Refine_export.txt
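
Where awk is not available, the same split can be sketched in Python. This is a rough equivalent rather than the method described above: `split_export` and `write_records` are hypothetical names, and the script assumes each record is preceded by a line such as `FILENAME rec1`. Unlike the awk command, it overwrites existing files instead of appending to them, and it stops if it finds content before the first FILENAME line.

```python
from pathlib import Path

MARKER = "FILENAME"  # same marker word the awk command matches


def split_export(text):
    """Group export lines by the name that follows each FILENAME marker line."""
    records = {}
    current = None
    for line in text.splitlines():
        if line.startswith(MARKER):
            fields = line.split()
            if len(fields) < 2:
                raise ValueError(f"marker line has no name: {line!r}")
            current = fields[1] + ".xml"  # like $2".xml" in awk
            records.setdefault(current, [])
            continue  # the marker line itself is not written out
        if current is None:
            if line.strip():
                raise ValueError("content found before the first FILENAME line")
            continue
        records[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in records.items()}


def write_records(records, out_dir="."):
    """Write each record to its own file inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, body in records.items():
        (out / name).write_text(body, encoding="utf-8")
```

A typical call would read the export with `Path("filename_of_Open_Refine_export.txt").read_text(encoding="utf-8")`, pass the text to `split_export`, and hand the result to `write_records`.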

"
