@hellbunnie
Last active November 28, 2018 15:01
Open Refine template for exporting tabular data to DRI-ready Dublin Core XML
{{forNonBlank(cells["Identifier"], v, "FILENAME "+v.value, "")}}
<qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:marcrel="http://www.loc.gov/marc.relators/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/marc.relators/ http://imlsdcc2.grainger.illinois.edu/registry/marcrel.xsd" xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
{{forNonBlank(cells["id"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Title"], v, "<dc:title>"+v.value+"</dc:title>", "")}}
{{forNonBlank(cells["Creator"], v, "<dc:creator>"+v.value+"</dc:creator>", "")}}
{{forNonBlank(cells["Date"], v, "<dc:date>"+v.value+"</dc:date>", "")}}
{{forNonBlank(cells["Description"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Description2"], v, "<dc:description>"+v.value+"</dc:description>", "")}}
{{forNonBlank(cells["Rights"], v, "<dc:rights>"+v.value+"</dc:rights>", "")}}
{{forNonBlank(cells["Type"], v, "<dc:type>"+v.value+"</dc:type>", "")}}
{{forNonBlank(cells["Language"], v, "<dc:language>"+v.value+"</dc:language>", "")}}
{{forNonBlank(cells["Contributor"], v, "<dc:contributor>"+v.value+"</dc:contributor>", "")}}
{{forNonBlank(cells["Relation"], v, "<dc:relation>"+v.value+"</dc:relation>", "")}}
{{forNonBlank(cells["Source"], v, "<dc:source>"+v.value+"</dc:source>", "")}}
{{forNonBlank(cells["Coverage"], v, "<dc:coverage>"+v.value+"</dc:coverage>", "")}}
{{forNonBlank(cells["Subject"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject2"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject3"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subject4"], v, "<dc:subject>"+v.value+"</dc:subject>", "")}}
{{forNonBlank(cells["Subjects (Temporal)"], v, "<dcterms:temporal>"+v.value+"</dcterms:temporal>", "")}}
{{forNonBlank(cells["Subjects (Places)"], v, "<dcterms:spatial>"+v.value+"</dcterms:spatial>", "")}}
{{forNonBlank(cells["Identifier"], v, "<dc:identifier>"+v.value+"</dc:identifier>", "")}}
{{forNonBlank(cells["Format"], v, "<dc:format>"+v.value+"</dc:format>", "")}}
{{forNonBlank(cells["Format (width x height)"], v, "<dc:format>"+v.value+"</dc:format>", "")}}
{{forNonBlank(cells["Publisher"], v, "<dc:publisher>"+v.value+"</dc:publisher>", "")}}
</qualifieddc>

hellbunnie commented Jun 12, 2018

If you are using Open Refine to create XML metadata for ingest into DRI, start by importing your Excel or CSV file. Open Refine makes it easy to clean your data, so you may want to review the data after import and make any changes you need. Once you're happy with the data, go to the "Export - Templating" menu option on the top right.

You will see an interface where you can specify a Prefix, Row Template, Row Separator and Suffix. Each column that you want to export to XML is mapped to one line of the Row Template, so you don't need the Prefix, Row Separator, or Suffix. If those three fields are prefilled, clear them so that only the XML ends up in the export. We will just use the Row Template.

Fill in the "Row Template" field with a template based on the sample above. You will need to modify it to match the headers of your data.

The second line and the last line are the opening and closing tags of the XML root element; you can leave these lines as they are.

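The first line writes a marker line ahead of each record: the word FILENAME followed by the value of the Identifier column. That value becomes the name of the final exported file for the record, so to use the exported files for batch ingest it should match the filename of your asset file. The marker needs to start on a new line for the splitting step below to find it.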

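The other lines, starting with "{{forNonBlank(cells[...]", set up the mapping from each of your columns to a valid Dublin Core XML tag. The column heading is given within the square brackets, and the Dublin Core element it maps to is given after it. forNonBlank only writes the element when the cell has a value, so blank cells do not produce empty tags.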
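To make the mapping concrete, here is a minimal Python sketch of what the template does for one row. It is an illustration only, not part of Open Refine: the names COLUMN_TO_ELEMENT, is_blank and render_elements are hypothetical, the mapping is a small subset of the template, and the blank test is a simplification. Unlike the template, which inserts v.value as-is, the sketch also escapes characters such as & and <, which would otherwise make the XML invalid.

```python
from xml.sax.saxutils import escape

# Hypothetical subset of the column-to-element mapping used in the template.
COLUMN_TO_ELEMENT = [
    ("Title", "dc:title"),
    ("Creator", "dc:creator"),
    ("Subject", "dc:subject"),
    ("Subject2", "dc:subject"),
    ("Subjects (Places)", "dcterms:spatial"),
]


def is_blank(value):
    """Treat missing cells and empty strings as blank (a simplification)."""
    return value is None or str(value) == ""


def render_elements(row):
    """Return one XML element string per non-blank cell, in mapping order."""
    lines = []
    for column, element in COLUMN_TO_ELEMENT:
        value = row.get(column)
        if is_blank(value):
            continue  # blank cells produce no element at all
        # escape() replaces &, < and > with XML entities.
        lines.append(f"<{element}>{escape(str(value))}</{element}>")
    return lines
```

For a row with a Title, an empty Creator and a Subject, render_elements returns a dc:title line and a dc:subject line and skips dc:creator entirely, which mirrors the role of the empty string passed as the last argument of forNonBlank.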

The export is a single file containing every record, so you will need to split it into one file per record. If you have access to a Unix-based machine, you can do this easily with the awk tool:

awk '/^FILENAME/{close(output_file_name); output_file_name=$2".xml"; getline } { print $0 >> ( output_file_name ) }' filename_of_Open_Refine_export.txt
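If awk is not available, the same split can be sketched in Python. This is a rough equivalent under stated assumptions, not a drop-in replacement: the function name split_export and its out_dir parameter are hypothetical, and it differs from the awk command in three ways. It uses everything after the FILENAME marker as the file name (awk's $2 stops at the first space), it overwrites existing files rather than appending to them, and it ignores any lines that appear before the first marker.

```python
from pathlib import Path

MARKER = "FILENAME"


def split_export(export_path, out_dir="."):
    """Write one .xml file per FILENAME marker line; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current = None  # handle of the record file currently being written
    try:
        with open(export_path, encoding="utf-8") as export:
            for line in export:
                if line.startswith(MARKER):
                    name = line[len(MARKER):].strip()
                    if not name:
                        raise ValueError("FILENAME marker without an identifier")
                    if current is not None:
                        current.close()
                    target = out_dir / f"{name}.xml"
                    current = open(target, "w", encoding="utf-8")
                    written.append(target)
                    continue  # the marker line itself is not part of the XML
                if current is not None:
                    current.write(line)
    finally:
        if current is not None:
            current.close()
    return written
```

Calling split_export("filename_of_Open_Refine_export.txt", "xml_out") would place the per-record files in a separate xml_out folder, which keeps them apart from the original export.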
