@jenyoung
Last active December 14, 2018 23:05
ALMA to HathiTrust Metadata Instructions

Metadata Records into HathiTrust

General HathiTrust metadata submission guide https://goo.gl/FCbQBS

Step 1: Create an itemized set of physical items (NUL uses barcodes)

  • Create a spreadsheet with the header Barcode. Format the column as text so that long numeric strings are not converted to scientific notation.
  • Upload record set into ALMA: Admin > Manage Sets > Add set > itemized
  • Set content type = Physical items
  • Upload file, then Save
  • To see an error file: Administration > Manage Jobs > Monitor Jobs
  • Confirm count of set members matches count on spreadsheet of barcodes. Actions > Members
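
The barcode list can also be checked before it is uploaded, so that the count you compare against the ALMA set is trustworthy. The following is a minimal sketch, assuming the spreadsheet has been saved as a one-column CSV with the header Barcode; the function name and the sample barcodes are made up for illustration.

```python
import csv
import io
import re

# Matches values a spreadsheet has rewritten in scientific notation, e.g. 3.5556E+13
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?[eE]\+?\d+$")


def check_barcodes(csv_text):
    """Return (barcodes, problems) for a one-column CSV with the header 'Barcode'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    barcodes, problems, seen = [], [], set()
    for row in reader:
        value = (row.get("Barcode") or "").strip()
        if not value:
            continue  # skip blank rows
        if SCI_NOTATION.match(value):
            problems.append(f"scientific notation: {value}")
        elif value in seen:
            problems.append(f"duplicate: {value}")
        else:
            seen.add(value)
            barcodes.append(value)
    return barcodes, problems


# Hypothetical sample data: one good barcode, one mangled value, one duplicate
sample = "Barcode\n35556012345678\n3.5556E+13\n35556012345678\n"
barcodes, problems = check_barcodes(sample)
print(len(barcodes), "usable barcodes")
for problem in problems:
    print("problem:", problem)
```

The number of usable barcodes is the figure to compare against Actions > Members once the set has been created.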

Step 2: Export from ALMA using a publishing profile (need correct ALMA permissions to do this)

  • ALMA menu > Resources > Publishing Profile > Add Profile
  • Profile Details
  • Select set name
  • Publishing Mode: Full
  • FTP – [your local ftp server]
  • Physical format: Binary // number of records in file: one
  • Data Enrichment
  • Add holding information > checked
    • 852 $b $c $h $i
  • Add items information > checked
    • Include item information > 955 $b (barcode) $v (description) $d (permanent location) $e (call number)
  • Run publishing profile: Actions > Run

Step 3: Check publishing report and download file

  • Use FileZilla to download your exported file
  • [login to your local ftp server]
  • drag the file over to My Documents
  • Open with MARCEdit > MARC Tools
  • convert file to .mrk format (MARC Breaker function)

Step 4: Fields to remove/add -- Note: this could also be done with a normalization rule

  • Open the .mrk file created in MARCEdit
  • Remove 9XX fields other than 955 (948, 949, 938, 994).
  • Remove any 035 $9. Only the 035 with the OCLC number should remain. Also remove 019 and 035 $z.
  • Required elements: LDR (000), 001, 008, 035 $a (OCoLC), 040 $c, 245, 300 $a
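
If you prefer a script to hand edits or a normalization rule, the removals above can be sketched in Python against the .mrk text. This is only an illustration, not part of the original workflow. It assumes the MarcEdit mnemonic layout (each field on its own line starting with = and the tag) and reads "Remove 035 $9" as both dropping any 035 that lacks an (OCoLC) $a and stripping $9 and $z subfields from the 035 that stays. The function name and the sample record are hypothetical.

```python
import re

KEEP_9XX = {"955"}  # the only 9XX field that should survive


def clean_record(mrk_record):
    """Apply the Step 4 removals to one record in MarcEdit mnemonic (.mrk) form."""
    kept = []
    for line in mrk_record.splitlines():
        if not line.startswith("="):
            kept.append(line)  # not a field line: leave untouched
            continue
        tag = line[1:4]
        if tag.startswith("9") and tag not in KEEP_9XX:
            continue  # drop 938, 948, 949, 994 and any other 9XX
        if tag == "019":
            continue
        if tag == "035":
            if "$a(OCoLC)" not in line:
                continue  # keep only the 035 that carries the OCLC number
            # strip $z and $9 subfields from the 035 that remains
            line = re.sub(r"\$[z9][^$]*", "", line)
        kept.append(line)
    return "\n".join(kept)


# Hypothetical record, written with raw strings because .mrk uses backslashes
sample = "\n".join([
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  991234",
    r"=019  \\$a111",
    r"=035  \\$a(OCoLC)12345$z(OCoLC)111",
    r"=035  \\$a(EXLNZ)999$9ExL",
    r"=245  10$aTitle.",
    r"=948  \\$aX",
    r"=955  \\$b35556012345678",
])
cleaned = clean_record(sample)
print(cleaned)
```

Run it on a copy of the file and compare the result in MARCEdit before relying on it.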

Step 5: Record checks using MARCEdit

  • Make sure the counts of 000/001/008/035/245/955 are all equal (one bib per item record)
  • Check MARCEdit MARC validation report
  • Confirm that all records have only one OCLC number
  • Validate headings – correct and establish headings as needed
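
The field-count check can be approximated with a short script as a cross-check on the MARCEdit reports. The sketch below assumes the .mrk file separates records with a blank line and writes the leader as =LDR, as MarcEdit's mnemonic format does. It flags any record that does not have exactly one each of LDR, 001, 008, 035, 245, and 955. Function names and sample records are hypothetical.

```python
from collections import Counter

REQUIRED = ["LDR", "001", "008", "035", "245", "955"]


def split_records(mrk_text):
    """Split .mrk text into records, assuming a blank line between records."""
    blocks = mrk_text.replace("\r\n", "\n").split("\n\n")
    return [block for block in blocks if block.strip()]


def tag_counts(record):
    """Count how often each tag occurs in one record."""
    return Counter(line[1:4] for line in record.splitlines() if line.startswith("="))


def find_problems(mrk_text):
    """Return (record number, tag, count) for every tag that does not occur exactly once."""
    problems = []
    for number, record in enumerate(split_records(mrk_text), start=1):
        counts = tag_counts(record)
        for tag in REQUIRED:
            if counts[tag] != 1:
                problems.append((number, tag, counts[tag]))
    return problems


# Hypothetical data: record 1 is complete, record 2 has two 035s and no 955
good = "\n".join([
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  991",
    r"=008  180101s2018    ilu           000 0 eng d",
    r"=035  \\$a(OCoLC)12345",
    r"=245  10$aFirst title.",
    r"=955  \\$b35556000000001",
])
bad = "\n".join([
    r"=LDR  00000nam a2200000 a 4500",
    r"=001  992",
    r"=008  180101s2018    ilu           000 0 eng d",
    r"=035  \\$a(OCoLC)67890",
    r"=035  \\$a(OCoLC)67891",
    r"=245  10$aSecond title.",
])
problems = find_problems(good + "\n\n" + bad + "\n")
for number, tag, count in problems:
    print(f"record {number}: tag {tag} occurs {count} time(s)")
```

A second 035 in the output also points to a record that still carries more than one OCLC number.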

Step 6: Convert final file to XML

  • In MARCEditor, compile the .mrk file into MARC: File menu > Compile file into MARC. This creates a .mrc file, which is also UTF encoded. Close the file.

  • Use MARCEdit tools to convert final file to MARC21XML

  • Click the MARC Tools Icon

  • Supply the input and output file names and make sure to select MARC->MARC21XML.

  • Execute

Step 7: Uploading to Zephir

  • [HathiTrust gives you a naming convention]

  • This naming convention also ensures that your files are not run through your configuration for any Google-scanned materials.

  • Use CoreFTP to upload file

  • Upload file to ftps.cdlib.org/submissions

Step 8: Send notification email to CDL

To: cdl-zphr-l@ucop.edu

file name=<file name>

file size=<file size in bytes>

record count=<number of records>

notification email=<email address to which you would like your run notification sent>
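
The file size and record count for this email can be read from the XML file itself instead of being copied by hand. The following is a rough sketch, assuming the file uses the standard MARC21 slim namespace (http://www.loc.gov/MARC21/slim) for its record elements. The function name, the file name example_upload.xml, and the email address are placeholders, not the HathiTrust naming convention.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def notification_body(xml_path, notify_address):
    """Build the four lines of the notification email for one MARCXML file."""
    size = os.path.getsize(xml_path)  # size in bytes
    # Count closing <record> elements; works with or without a namespace prefix
    count = sum(
        1 for _event, elem in ET.iterparse(xml_path) if elem.tag == MARC_NS + "record"
    )
    return "\n".join([
        f"file name={os.path.basename(xml_path)}",
        f"file size={size}",
        f"record count={count}",
        f"notification email={notify_address}",
    ])


# Hypothetical two-record file, written to a temporary folder for the demo
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">\n'
    b"  <record><leader>00000nam a2200000 a 4500</leader></record>\n"
    b"  <record><leader>00000nam a2200000 a 4500</leader></record>\n"
    b"</collection>\n"
)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example_upload.xml")
    with open(path, "wb") as handle:
        handle.write(sample)
    body = notification_body(path, "cataloger@example.edu")
print(body)
```

Paste the printed lines into the message body and check the record count against the count from Step 5.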

Step 9: Review run reports and error files

Run reports for contributors' files are posted to their FTPS space, in the subdirectory ftps.cdlib.org/runreports.

You are responsible for retrieving your own error files via FTPS from ftps.cdlib.org/errfiles. Error files will remain in the FTPS location for 60 days.

Error files will be provided as MARCXML files using the file naming convention: original_file_name_error.xml

Correcting your records and re-submitting them

After correcting the errors, you may re-submit the records to ftps.cdlib.org/submissions, following the same guidelines you followed for the initial submission.

Important: The loader script applies updates to records based on a change in the date associated with each one. The filename used for the corrected records must therefore include the date of re-submission, NOT the date in the name of the file as initially submitted.
