
@anayram
Last active March 28, 2023 19:15
EAD to CSV conversion

EAD to CSV for AtoM batch ingest

This script transforms EAD records for AtoM bulk ingest.

Request from UofA Archives

The University of Alberta Archives currently uses AtoM as its archival access database. When the Archives moved to AtoM, accessions data was migrated from Mimsy into AtoM as archival descriptions rather than as accession records.

To have all accession records in AtoM, consider the following workflow: export all accession descriptions from AtoM as EAD XML files, map and transform them into AtoM CSV, clean up and update the CSV, and then import it into AtoM as accession records.

This script is a test transformation to prepare archival institution records for CSV import (from EAD into AtoM CSV batch ingest), as outlined at https://www.accesstomemory.org/en/docs/2.4/user-manual/import-export/csv-import/

Script

A small XSLT script was put together to process EAD XML metadata and export it as CSV for AtoM ingest. The CSV headers and output values are annotated throughout.
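
Several fields in the script are quoted by hand with `concat('&quot;', ..., '&quot;')`, while others are written unquoted, so a value containing a comma or a double quote could break the CSV row. The stylesheet below is a minimal sketch of one way to centralize that logic, assuming an XSLT 2.0 processor (the script already relies on XSLT 2.0 features). The function name `local:csv-field` and the namespace URI `urn:example:local` are hypothetical and are not part of the original script; the two fields emitted are only there to make the sketch runnable against a single EAD file.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">
  <xsl:output method="text"/>

  <!-- Hypothetical helper (not in the original script):
       joins the items it receives, collapses whitespace,
       doubles embedded double quotes, and wraps the result in quotes. -->
  <xsl:function name="local:csv-field" as="xs:string">
    <xsl:param name="value" as="item()*"/>
    <xsl:variable name="text"
        select="normalize-space(string-join(for $v in $value return string($v), ' '))"/>
    <xsl:sequence
        select="concat('&quot;', replace($text, '&quot;', '&quot;&quot;'), '&quot;')"/>
  </xsl:function>

  <!-- Illustration only: emit two quoted fields for one un-namespaced EAD file. -->
  <xsl:template match="/ead">
    <xsl:value-of select="local:csv-field(archdesc/did/unittitle)"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="local:csv-field(archdesc/scopecontent/p)"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:transform>
```

With a helper like this, each `xsl:value-of` in the main script could wrap its expression in `local:csv-field(...)`, so every column is quoted the same way and empty values come out as `""`.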

Please contact metadata@ualberta.ca for assistance.

Sample Metadata

See the sample EAD record (XML file uaa-1968-001.xml) below, after the XSLT script.

<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:text>accessionNumber,</xsl:text>
<xsl:text>alternativeIdentifiers,</xsl:text>
<xsl:text>alternativeIdentifierTypes,</xsl:text>
<xsl:text>alternativeIdentifierNotes,</xsl:text>
<xsl:text>acquisitionDate,</xsl:text>
<xsl:text>sourceOfAcquisition,</xsl:text>
<xsl:text>locationInformation,</xsl:text>
<xsl:text>acquisitionType,</xsl:text>
<xsl:text>resourceType,</xsl:text>
<xsl:text>title,</xsl:text>
<xsl:text>archivalHistory,</xsl:text>
<xsl:text>scopeAndContent,</xsl:text>
<xsl:text>appraisal,</xsl:text>
<xsl:text>physicalCondition,</xsl:text>
<xsl:text>receivedExtentUnits,</xsl:text>
<xsl:text>processingStatus,</xsl:text>
<xsl:text>processingPriority,</xsl:text>
<xsl:text>processingNotes,</xsl:text>
<xsl:text>physicalObjectName,</xsl:text>
<xsl:text>physicalObjectLocation,</xsl:text>
<xsl:text>physicalObjectType,</xsl:text>
<xsl:text>donorName,</xsl:text>
<xsl:text>donorStreetAddress,</xsl:text>
<xsl:text>donorCity,</xsl:text>
<xsl:text>donorRegion,</xsl:text>
<xsl:text>donorCountry,</xsl:text>
<xsl:text>donorPostalCode,</xsl:text>
<xsl:text>donorTelephone,</xsl:text>
<xsl:text>donorFax,</xsl:text>
<xsl:text>donorEmail,</xsl:text>
<xsl:text>donorNote,</xsl:text>
<xsl:text>donorContactPerson,</xsl:text>
<xsl:text>creators,</xsl:text>
<xsl:text>eventTypes,</xsl:text>
<xsl:text>eventDates,</xsl:text>
<xsl:text>eventStartDates,</xsl:text>
<xsl:text>eventEndDates,</xsl:text>
<xsl:text>culture&#xa;</xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="/*">
<xsl:for-each select="collection('samples-100/?select=*.xml;recurse=yes')//*:ead">
<xsl:value-of select="//filedesc/titlestmt/titleproper/text()"/> <!-- accessionNumber -->
<xsl:text>,</xsl:text>
<!-- alternativeIdentifiers -->
<xsl:text>,</xsl:text>
<!-- alternativeIdentifierTypes -->
<xsl:text>,</xsl:text>
<!-- alternativeIdentifierNotes -->
<xsl:text>,</xsl:text>
<xsl:value-of select="//filedesc/publicationstmt/date/text()"/> <!-- acquisitionDate -->
<xsl:text>,</xsl:text>
<xsl:if test="archdesc/did/origination/child::*">
<xsl:text>&quot;</xsl:text>
<xsl:for-each select="archdesc/did/origination/child::*">
<xsl:value-of select="concat('Transfer from ',.)"/>
<xsl:text>. </xsl:text>
</xsl:for-each> <!-- sourceOfAcquisition --> <!-- check -->
<xsl:text>&quot;</xsl:text>
</xsl:if>
<xsl:text>,</xsl:text>
<xsl:value-of select="concat('&quot;',archdesc/originalsloc/p,'&quot;')"/> <!-- locationInformation -->
<xsl:text>,</xsl:text>
<xsl:text>Transfer</xsl:text> <!-- acquisitionType --> <!-- to be updated, possibly to be mapped from the legacy csv -->
<xsl:text>,</xsl:text>
<xsl:if test="archdesc/controlaccess/genreform">
<xsl:text>&quot;</xsl:text>
<xsl:for-each select="archdesc/controlaccess/genreform">
<xsl:value-of select="."/>
<xsl:text>. </xsl:text>
</xsl:for-each> <!-- resourceType -->
<xsl:text>&quot;</xsl:text>
</xsl:if>
<xsl:text>,</xsl:text>
<xsl:value-of select="//filedesc/titlestmt/titleproper/text()"/> <!-- title -->
<xsl:text>,</xsl:text>
<!-- archivalHistory -->
<xsl:text>,</xsl:text>
<xsl:value-of select="concat('&quot;',replace(normalize-space(string-join(archdesc/scopecontent/p,' ')),'&quot;','&quot;&quot;'),'&quot;')"/> <!-- scopeAndContent: paragraphs joined, whitespace collapsed, embedded quotes doubled -->
<xsl:text>,</xsl:text>
<!-- appraisal -->
<xsl:text>,</xsl:text>
<xsl:value-of select="archdesc/phystech/p"/> <!-- physicalCondition -->
<xsl:text>,</xsl:text>
<xsl:for-each select="archdesc/did/physdesc">
<xsl:value-of select="concat('&quot;',replace(normalize-space(),'\n', ' '),'&quot;')"/>
</xsl:for-each> <!-- receivedExtentUnits -->
<xsl:text>,</xsl:text>
<!-- processingStatus -->
<xsl:text>,</xsl:text>
<!-- processingPriority -->
<xsl:text>,</xsl:text>
<xsl:value-of select="archdesc/otherfindaid/p"/> <!-- processingNotes -->
<xsl:text>,</xsl:text>
<!-- physicalObjectName -->
<xsl:text>,</xsl:text>
<!-- physicalObjectLocation -->
<xsl:text>,</xsl:text>
<!-- physicalObjectType -->
<xsl:text>,</xsl:text>
<!-- donorName -->
<xsl:text>,</xsl:text>
<!-- donorStreetAddress -->
<xsl:text>,</xsl:text>
<!-- donorCity -->
<xsl:text>,</xsl:text>
<!-- donorRegion -->
<xsl:text>,</xsl:text>
<!-- donorCountry -->
<xsl:text>,</xsl:text>
<!-- donorPostalCode -->
<xsl:text>,</xsl:text>
<!-- donorTelephone -->
<xsl:text>,</xsl:text>
<!-- donorFax -->
<xsl:text>,</xsl:text>
<!-- donorEmail -->
<xsl:text>,</xsl:text>
<!-- donorNote -->
<xsl:text>,</xsl:text>
<!-- donorContactPerson -->
<xsl:text>,</xsl:text>
<xsl:if test="archdesc/did/origination/child::*">
<xsl:text>&quot;</xsl:text>
<xsl:for-each select="archdesc/did/origination/child::*">
<xsl:value-of select="concat(.,'. ')"/>
</xsl:for-each> <!-- check -->
<xsl:text>&quot;</xsl:text>
</xsl:if> <!-- creators -->
<xsl:text>,</xsl:text>
<xsl:text>Creation</xsl:text> <!-- eventTypes-->
<xsl:text>,</xsl:text>
<xsl:value-of select="archdesc/did/unitdate"/> <!-- eventDates -->
<xsl:text>,</xsl:text>
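<xsl:for-each select="archdesc/did/unitdate">
<!-- a single date (no hyphen) is used as both start and end; multiple unitdate values are pipe-separated -->
<xsl:value-of select="normalize-space(if (contains(.,'-')) then substring-before(.,'-') else .)"/>
<xsl:if test="position() != last()"><xsl:text>|</xsl:text></xsl:if>
</xsl:for-each> <!-- eventStartDates -->
<xsl:text>,</xsl:text>
<xsl:for-each select="archdesc/did/unitdate">
<xsl:value-of select="normalize-space(if (contains(.,'-')) then substring-after(.,'-') else .)"/>
<xsl:if test="position() != last()"><xsl:text>|</xsl:text></xsl:if>
</xsl:for-each> <!-- eventEndDates -->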
<xsl:text>,</xsl:text> <!-- culture-->
<xsl:text>&#xa;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:transform>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd">
<ead>
<eadheader langencoding="iso639-2b" countryencoding="iso3166-1" dateencoding="iso8601"
repositoryencoding="iso15511" scriptencoding="iso15924" relatedencoding="DC">
<!-- unclear: acquisitionType, sourceOfAcquisition, eventTypes -->
<eadid identifier="University" countrycode="CA"
url="https://ualberta-edit.accesstomemory.org/uaa-1968-001" encodinganalog="identifier"
>UAA-1968-001</eadid>
<filedesc>
<!-- title -->
<titlestmt>
<!-- normalize space for names and titles -->
<titleproper encodinganalog="title">UAA-1968-001</titleproper>
</titlestmt>
<publicationstmt>
<publisher encodinganalog="publisher">University of Alberta Archives</publisher>
<address>
<addressline>Research &amp; Collections Resource Facility (RCRF)<lb/>6304-115A Street
NW</addressline>
<addressline>Edmonton</addressline>
<addressline>Alberta</addressline>
<addressline>Canada</addressline>
<addressline>T6G 2E1</addressline>
</address>
<!-- acquisitionDate ????? see below -->
<date normal="2017-11-28" encodinganalog="date">2017-11-28</date>
</publicationstmt>
</filedesc>
<profiledesc>
<creation> Generated by Access to Memory (AtoM) 2.7.0-rc2 <date normal="2023-02-14">2023-02-14
19:55 UTC</date>
</creation>
<langusage>
<language langcode="eng">English</language>
</langusage>
</profiledesc>
</eadheader>
<archdesc otherlevel="accession" level="otherlevel" relatedencoding="RAD">
<did>
<!-- accessionNumber -->
<unittitle encodinganalog="1.1B">UAA-1968-001</unittitle>
<unitid encodinganalog="1.8B11" countrycode="CA">UAA-1968-001</unitid>
<!-- eventDates -->
<!-- eventStartDates -->
<!-- eventEndDates -->
<!-- change event date to include both single dates and date ranges (use single for start and end) -->
<unitdate id="atom_278974_event" encodinganalog="1.4B2">1928-1961</unitdate>
<!-- receivedExtentUnits ? --> <!-- cleanup case, spacing -->
<physdesc encodinganalog="1.5B1">32.50 m of textual records </physdesc>
<repository>
<corpname>University of Alberta Archives</corpname>
<address>
<addressline>Research &amp; Collections Resource Facility (RCRF)<lb/>6304-115A Street
NW</addressline>
<addressline>Edmonton</addressline>
<addressline>Alberta</addressline>
<addressline>Canada</addressline>
<addressline>T6G 2E1</addressline>
</address>
</repository>
<langmaterial encodinganalog="1.8B9a">
<language langcode="eng">English</language>
</langmaterial>
<note type="generalNote" encodinganalog="1.8B21">
<p>Accession also contains materials for the Board of Governors RG.</p>
</note>
<!-- sourceOfAcquisition -->
<!-- creators -->
<!-- If it comes from University, donor is creator. Private donations - donor and creator would be different --> <!-- derive name from origination family too -->
<origination encodinganalog="1.4D">
<corpname id="atom_278974_actor">Office of the President and Vice-Chancellor</corpname>
</origination>
</did>
<bioghist id="md5-2fd634b02197fb0bf5e2b70025dc6add" encodinganalog="1.7B">
<note>
<p>The President is the chief executive officer of the University and the Vice-Chancellor.
The Board of Governors appoints the President and prescribes his tenure of office. The
President has the general supervision over and direction of the operation of the
University, including its academic work, the instructional and ancillary staff (including
the deans of the faculties, the registrar and the librarians), and its business affairs.
The Board has the authority to assign any other powers, duties and functions to the
President for the welfare of the University. The President may delegate any of his powers,
duties or functions as he sees fit and prescribe conditions governing the exercise of any
delegated power, duty or function, including the power of subdelegation. The President
reports annually to the Board and the Senate on the academic work of the University and
its progress and requirements, and makes any recommendations thereon he considers
necessary (Alberta. Universities Act, RSA 1980).<lb/><lb/>The office of the President is
organized hierarchically with the President and Vice Chancellor at the top, followed by
the Chief of Staff, University Secretary, and General Counsel. The Chief of Staff oversees
the Director of Office of the Senate and Director. The Director manages the Information
Coordinator, Events and Projects Specialist, and the Senior Executive
Coordinator.<lb/><lb/>The President is appointed by the Board of Governors and broadly:
provides the university with strategic leadership; oversees the university's operational
and change management processes; represents and upholds academic credibility; and leads
the university's external relations locally, nationally, and
internationally.<lb/><lb/>Presidents: 1908-1928 Henry Marshall Tory; 1928-1936 Robert
Charles Wallace; 1936-1941 William Alexander Robb Kerr; 1941-1942 Robert Newton (Acting);
1942-1949 Robert Newton; 1949-1959 Andrew Stewart; 1959-1969 Walter Hugh Johns; 1969-1974
Max Wyman; 1974- 1979 Harry E. Gunning; 1979-1989 Myer Horowitz; 1989- Paul Davenport.</p>
</note>
</bioghist>
<odd type="publicationStatus">
<p>Published</p>
</odd>
<odd type="descriptionIdentifier">
<p>University</p>
</odd>
<odd type="institutionIdentifier">
<p>AEU</p>
</odd>
<!-- scopeAndContent -->
<scopecontent encodinganalog="1.7D">
<p>Presidents' Papers: R.C. Wallace, W.A.R. Kerr, Robert Newton, Andrew Stewart, and Walter
Johns.</p>
</scopecontent>
<!-- resourceType -->
<controlaccess>
<genreform source="rad" encodinganalog="1.1C">Textual record</genreform>
</controlaccess>
<!-- physicalCondition -->
<phystech encodinganalog="1.8B9a">
<p>good</p>
</phystech>
<accruals encodinganalog="1.8B19">
<p>9.3.1968</p>
</accruals>
<processinfo>
<p>
<date>MMCFARLANE 2.15.2008</date>
</p>
</processinfo>
<!-- locationInformation -->
<!-- make sure HTML is preserved -->
<originalsloc encodinganalog="1.8B15a">
<p>* Main<lb/>* Small Accessions<lb/>* map cabinet</p>
</originalsloc>
<!-- flag: access information -->
<accessrestrict encodinganalog="1.8B16a">
<p>open</p>
</accessrestrict>
<!-- processingNotes -->
<otherfindaid encodinganalog="1.8B17">
<p>file inventory; index</p>
</otherfindaid>
<dsc type="combined"> </dsc>
</archdesc>
</ead>