Bria Parker (brialparker)

University of Maryland Libraries
brialparker / Beast_to_ArchivesSpace.md
Last active February 9, 2017 15:50
A description of the process to transform our EAD for ArchivesSpace import

Getting UMD Finding Aids into ArchivesSpace was an iterative process:

Once we had laid out all the changes we thought needed to be made, our Systems Librarian developed a transformation script/application in Python to handle many of them: inserting Handle URIs as an eadid attribute, cleaning up our parent/child container situation, stripping unnecessary sections and empty element tags, and otherwise tidying up and removing local practices that are not necessary in ArchivesSpace. The result was a much cleaner set of EAD to work with.
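Two of those changes, adding a Handle URI to eadid and stripping empty elements, can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not the Systems Librarian's actual script: it assumes namespaced EAD 2002 (`urn:isbn:1-931666-22-9`) and assumes the Handle goes in eadid's `url` attribute. `clean_ead` and `strip_empty` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: files use the EAD 2002 namespace; non-namespaced EAD would need plain tag names.
EAD_NS = "urn:isbn:1-931666-22-9"
ET.register_namespace("", EAD_NS)  # serialize EAD as the default namespace


def q(tag: str) -> str:
    """Qualify a bare EAD tag name with the EAD namespace."""
    return f"{{{EAD_NS}}}{tag}"


def is_empty(elem: ET.Element) -> bool:
    """True if the element has no attributes, no children, and only whitespace text."""
    return not elem.attrib and len(elem) == 0 and not (elem.text or "").strip()


def strip_empty(parent: ET.Element) -> None:
    """Remove empty descendants bottom-up, so parents emptied by the cleanup go too."""
    for child in list(parent):
        strip_empty(child)
        if is_empty(child):
            # Keep any mixed-content text that followed the removed element.
            if child.tail and child.tail.strip():
                siblings = list(parent)
                idx = siblings.index(child)
                if idx > 0:
                    prev = siblings[idx - 1]
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            parent.remove(child)


def clean_ead(xml_text: str, handle_uri: str) -> str:
    """Add the Handle URI to eadid (hypothetically as @url), then strip empty elements."""
    root = ET.fromstring(xml_text)
    eadid = root.find(f".//{q('eadid')}")
    if eadid is not None:
        eadid.set("url", handle_uri)
    strip_empty(root)
    return ET.tostring(root, encoding="unicode")
```

Elements that carry attributes (for example `<archdesc level="collection">`) are deliberately kept even when they have no content, since the attributes may still matter to ArchivesSpace.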

I (Metadata Librarian) then used Dallas Pillen's (Bentley) fabulous date cleanup scripts to extract dates and add normal attributes containing normalized dates, using the workflow (and OpenRefine!) as outlined in the [Bentle
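The core idea of adding `normal` attributes can be illustrated with a simplified Python sketch. This is not Bentley's script; it is a hypothetical, deliberately conservative rule (a single year becomes `YYYY`, several years become `earliest/latest`) applied to `unitdate` elements in the EAD 2002 namespace, which is an assumption. Anything it cannot handle, such as "1920s" or "undated", is returned for manual review (for example in OpenRefine).

```python
import re
import xml.etree.ElementTree as ET

# Four-digit years from 1500 to 2099; "1920s" deliberately does not match.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def normalize_date(text: str):
    """Return a normal attribute value ('YYYY' or 'YYYY/YYYY'), or None if no year is found."""
    years = sorted(int(y) for y in YEAR.findall(text))
    if not years:
        return None
    if years[0] == years[-1]:
        return str(years[0])
    return f"{years[0]}/{years[-1]}"


def add_normal_attributes(root: ET.Element, ns: str = "urn:isbn:1-931666-22-9") -> list:
    """Set @normal on unitdates that lack it; return the texts that need manual review."""
    flagged = []
    for unitdate in root.iter(f"{{{ns}}}unitdate"):
        if unitdate.get("normal"):
            continue  # already normalized, leave it alone
        text = "".join(unitdate.itertext())
        normal = normalize_date(text)
        if normal:
            unitdate.set("normal", normal)
        else:
            flagged.append(text)
    return flagged
```

For example, `normalize_date("1941, 1938-1940")` gives `"1938/1941"`, and `normalize_date("circa 1995")` gives `"1995"`.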

brialparker / UMD_HT_metadata.md
Last active January 14, 2016 12:47
Metadata clean-up for HathiTrust

HathiTrust requires metadata in MARC-XML format with UTF-8 encoding. Here's how we get there.

##### Step 1: Request the MARC records from Aleph

Because the records for Hebraica are suppressed, we cannot use Z39.50 against our Aleph catalog to retrieve them (that's something to think about, though, for future projects). Instead, we have to submit an Aleph RX request for them. The parameters CLAS needs to complete the request are:

Sublibrary: CPMCK

Collection: CAT
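Since HathiTrust needs MARC-XML in UTF-8, a quick check before submission can catch encoding problems early. The following Python sketch is a hypothetical helper, not part of the workflow described here: it confirms the bytes decode as UTF-8, rejects a non-UTF-8 XML declaration, and counts `record` elements in the standard MARCXML namespace (`http://www.loc.gov/MARC21/slim`). It does not validate against the MARCXML schema.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Matches the encoding named in an XML declaration, if there is one.
DECL = re.compile(rb"^\s*<\?xml[^>]*?encoding=[\"']([\w.-]+)")


def check_marcxml(data: bytes) -> int:
    """Return the number of MARC records; raise ValueError if not UTF-8 MARC-XML."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8: {exc}") from exc
    decl = DECL.match(data)
    if decl and decl.group(1).decode("ascii").lower() not in ("utf-8", "utf8"):
        raise ValueError(f"XML declaration says {decl.group(1).decode('ascii')}, not UTF-8")
    root = ET.fromstring(data)  # raises ET.ParseError if the XML is malformed
    record_tag = f"{{{MARC_NS}}}record"
    records = [root] if root.tag == record_tag else root.findall(f".//{record_tag}")
    if not records:
        raise ValueError("no MARC-XML <record> elements found")
    return len(records)
```

Usage would be along the lines of `check_marcxml(open("hebraica.xml", "rb").read())`, where the file name is only an example.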

brialparker / UMD_HT_prc.md
Last active October 18, 2019 17:05
Processing Files for HathiTrust Ingest at UMD

The following applies to the scripts located here.

Once DCMR has finished QC/QA of all the files, they will inform MSD (or Robin, who will then inform MSD?) that their work is complete. At this point it is time to process the files according to the [HathiTrust Cloud Validation and ingest requirements](https://docs.google.com/document/d/1OQ0SKAiOH8Xi0HVVxg4TryBrPUPtdv4qA70d8ghRltU/edit).

##### Step 1: Remove all unnecessary files

The only files that should exist in each directory as this work begins are .tif and .txt files (the scanned images and the OCR text). To remove all of the .jpg files, navigate in the terminal to the directory that contains ALL of the subdirectories (that is, the directory that contains a folder for each barcode) and run the following. The first command only lists the matching files so you can check them; the second deletes them:

```shell
find . -name "*.jpg" -type f
find . -name "*.jpg" -type f -delete
```
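Because only .tif and .txt files should remain, it can help to confirm that nothing else (leftover .jpg files, `.DS_Store`, `Thumbs.db`) is still present before validation. This Python sketch is a hypothetical check, not one of the scripts referenced above; run it against the same top-level directory that holds the barcode folders.

```python
from pathlib import Path

ALLOWED = {".tif", ".txt"}  # the only extensions expected in each barcode folder


def unexpected_files(top) -> list:
    """Return paths (relative to top) of files whose extension is not .tif or .txt."""
    top = Path(top)
    return sorted(
        p.relative_to(top).as_posix()
        for p in top.rglob("*")
        # Hidden files such as .DS_Store have an empty suffix, so they are reported too.
        if p.is_file() and p.suffix.lower() not in ALLOWED
    )
```

Running `print(unexpected_files("."))` from the top-level directory should print an empty list once cleanup is complete.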

In UMDM, we use covPlace to indicate the geographical location associated with the production of the content of a resource (not the geographical subject), though I posit that this has not been applied consistently...

Anywho...

```xml
<covPlace>
  <geogName type="continent">North America</geogName>
  <geogName type="country">United States of America</geogName>
  <geogName type="region">Maryland</geogName>
  <geogName>College Park</geogName>
</covPlace>
```
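Since covPlace may not have been applied consistently, one way to audit it is to pull out every covPlace as an ordered list of (type, name) pairs. This Python sketch assumes UMDM elements carry no XML namespace, and the `record` wrapper in the example is hypothetical; `cov_places` is an illustrative name, not an existing tool.

```python
import xml.etree.ElementTree as ET


def cov_places(umdm_xml: str) -> list:
    """Return each covPlace as a list of (type, name) pairs, broadest to narrowest."""
    root = ET.fromstring(umdm_xml)
    places = []
    # root.iter() includes the root itself, so a bare <covPlace> document also works.
    for cov in root.iter("covPlace"):
        places.append(
            [(g.get("type"), (g.text or "").strip()) for g in cov.findall("geogName")]
        )
    return places
```

A `None` type in the output marks a geogName with no type attribute, which is one easy thing to look for when auditing.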

Within the agent element in the UMDM schema, there are two attributes that indicate agent role and agent type. Not all agents have both attributes, but at minimum each agent should have a type. There are three types: creator, contributor, and provider. The UMDM schema differentiates between personal and corporate names via the persName or corpName element within agent.

A simple example of a person creator:

```xml
<agent type="creator">
  <persName>Olson, Mancur</persName>
</agent>
```
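The rules above (every agent should have one of the three types, and names go in persName or corpName) lend themselves to a simple check. This is a hypothetical Python sketch that assumes non-namespaced UMDM XML; it does not check the role attribute, since its name is not given here.

```python
import xml.etree.ElementTree as ET

AGENT_TYPES = {"creator", "contributor", "provider"}


def agent_problems(umdm_xml: str) -> list:
    """List agents missing a valid type or lacking a persName/corpName child."""
    root = ET.fromstring(umdm_xml)
    problems = []
    for agent in root.iter("agent"):
        # Explicit None checks: an Element with no children is falsy, so avoid `or`.
        name_el = agent.find("persName")
        if name_el is None:
            name_el = agent.find("corpName")
        label = (name_el.text or "").strip() if name_el is not None else "(no name)"
        if agent.get("type") not in AGENT_TYPES:
            problems.append(f"{label}: missing or unknown type {agent.get('type')!r}")
        if name_el is None:
            problems.append(f"{label}: no persName or corpName")
    return problems
```

An agent like the Olson example produces no problems; an agent with no type attribute is reported as `missing or unknown type None`.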

Our situation will be slightly different since we use a local schema (UMDM) and not MODS. Our title element is very simple.

Current:

```xml
<title type="main">WMUC flyer, circa 1995</title>
<title type="alternate">WMUC for the future flyer</title>
```

Proposed: