@CJHArch
CJHArch / Html2Ca
Created January 12, 2015 21:24
XQuery for extracting CollectiveAccess data from HTML scraped from the Buffalo library's XTF-hosted finding aids. The data was fetched with curl into one large XML file and imported into BaseX with namespaces stripped.
xquery version "3.0";
<results>
{
for $findingaid in /records/html
let $title := $findingaid/meta[@name="dc.title"]/@content
let $creator := $findingaid/meta[@name="dc.author"]/@content
let $subject := $findingaid/meta[@name="dc.subject"]/@content
let $dates := $findingaid//h2[@class="tp_titleproper"]
@CJHArch
CJHArch / Ead2Ca
Created January 12, 2015 21:26
XQuery for extracting CollectiveAccess data from EADs exported from the Hartford Jewish Historical Society's instance of Archivists' Toolkit (AT). The EADs were imported into BaseX with namespaces stripped.
xquery version "3.0";
<results>
{
for $ead in /ead
let $titleproper := $ead/eadheader/filedesc/titlestmt/titleproper[1]/text()
let $creator := $ead/archdesc/did/unittitle
let $dates := $ead/archdesc/did/unitdate
let $publisher := $ead/eadheader/filedesc/publicationstmt/publisher
@CJHArch
CJHArch / Daos2Ead
Created January 12, 2015 21:30
Automatically adding and filling <dao> fields when the ingest PIDs are in the same order as the container list
<?xml version="1.0" encoding="UTF-8"?>
<!-- This stylesheet will add daos to a finding aid, using a list of PIDs. -->
<!-- One needs to select one of the two templates below, depending on whether the daos are to be added to all c0x level=file elements or just to a certain subset -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<!-- Call PID file -->
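The order-based pairing this stylesheet describes can also be sketched in Python. This is a minimal illustration under stated assumptions, not the stylesheet's actual logic: the function name `add_daos`, the sample EAD, placing each `<dao>` inside the component's `<did>`, and using the bare PID as the `href` value are all hypothetical choices, and namespaces are assumed to be absent.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: two file-level components inside one series.
EAD = """<ead><archdesc><dsc>
<c01 level="series"><did><unittitle>Series I</unittitle></did>
<c02 level="file"><did><unittitle>Folder 1</unittitle></did></c02>
<c02 level="file"><did><unittitle>Folder 2</unittitle></did></c02>
</c01></dsc></archdesc></ead>"""


def add_daos(ead_xml: str, pids: list[str]) -> str:
    """Attach one <dao> per file-level c0x element, pairing by position."""
    root = ET.fromstring(ead_xml)
    # root.iter() walks in document order, which is what the pairing relies on.
    files = [
        el for el in root.iter()
        if re.fullmatch(r"c0\d", el.tag) and el.get("level") == "file"
    ]
    # Pairing by position is only safe when the two lists line up exactly.
    if len(files) != len(pids):
        raise ValueError(f"{len(files)} file components but {len(pids)} PIDs")
    for component, pid in zip(files, pids):
        did = component.find("did")
        parent = did if did is not None else component
        # Assumption: the bare PID stands in for whatever href the real daos use.
        ET.SubElement(parent, "dao", {"href": pid})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_daos(EAD, ["111", "222"]))
```

The length check matters because a single missing or extra PID would silently shift every later dao onto the wrong folder.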
@CJHArch
CJHArch / Macro-LBI-5digit
Created January 13, 2015 19:29
Prepares a CSV for ingesting a directory of digital files into DigiTool. Run it on a spreadsheet with the list of files in column A and their extensions in column B. Created for LBI collections with 5-digit AR numbers.
Sub Digitool_KDP_based_ingest_LBI_AR5()
'
' This macro takes a basic Karen's Directory Printer output and prepares a template for DigiTool CSV ingest. 100214 KS, with updates by LL (2014-11-21) for LBI Creekside collections with five digits after the AR.
'
'
' RenameSheet Macro
'
'
@CJHArch
CJHArch / gist:3f9a5fb0a9d04df10270
Created January 16, 2015 15:56
This XQuery grabs PIDs and access rights metadata records from DigiTool digital entities imported into BaseX. Namespaces were NOT stripped, so the namespace declaration on line two is necessary.
xquery version "3.0";
declare namespace xb="http://com/exlibris/digitool/repository/api/xmlbeans";
<data>
{
for $Record in /xb:digital_entity_call
let $PID := $Record/xb:digital_entity/pid
let $arrecord := $Record/xb:digital_entity/mds/md[type[contains (., 'rights_md')]]
return
@CJHArch
CJHArch / OHsOutViaPid
Last active August 29, 2015 14:14
XQuery to get a list of records from the OAI feed based on an XML list of PIDs
xquery version "3.0";
<results>
{
for $PIDlist in doc('OH_PIDS_XML.xml')/data/pid/text()
let $OAIRecord := repository/record[header/identifier/substring-after(., "oai:digital.cjh.org:") = $PIDlist]
return
@CJHArch
CJHArch / gist:225cfa99afda9fe73619
Created February 9, 2015 00:47
OAI reporting - proof of concept
#!/bin/bash
# This script will pull the oai feed, assign it a name based on the time and date, pull out requested data, and
# Compute the date once so every step uses the same directory name
today=$(date +%Y%m%d)
mkdir -p "$today"
cd "$today" || exit 1
echo "Files found in $today"
# Run pyoaiharvest to get the feed
python /home/kevin/test/pyoaiharvest.py -l http://digital.cjh.org/OAI-PUB -o LBI_periodicals$(date +%Y%m%d).xml -m marc21 -s LBI_periodicals
# Use sed to remove the marc namespace prefix; consider sed alternation ('or') to clean up other stuff in one pass
@CJHArch
CJHArch / gist:3b21aa3c826ef8e4e305
Created February 9, 2015 22:00
XSLT templates to grab various specific records from the OAI feed
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- List of PIDs -->
<xsl:template match="/">
<xsl:for-each select="results/record/header/identifier">
<xsl:copy-of select="./text()"/>,
</xsl:for-each>
</xsl:template>
@CJHArch
CJHArch / OAIsOutviaPIDList
Created February 9, 2015 22:01
This XQuery reads a list of PIDs and pulls the associated records out of the OAI feed. There is probably a more efficient way to do this, but it works.
xquery version "3.0";
<results>
{
for $PIDlist in doc('OH_PIDS_XML.xml')/data/pid/text()
let $OAIRecord := repository/record[header/identifier/substring-after(., "oai:digital.cjh.org:") = $PIDlist]
return
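One possible shape for the "more efficient way" mentioned in the description is to load the PIDs into a set and walk the feed once, instead of re-scanning every record for each PID. The Python sketch below is only an illustration under stated assumptions: the function name `select_records` and the sample documents are hypothetical, namespaces are assumed to be stripped, and, unlike the query, it returns records in feed order rather than PID-list order.

```python
import xml.etree.ElementTree as ET

OAI_PREFIX = "oai:digital.cjh.org:"

# Hypothetical sample data shaped like the paths used in the query.
REPOSITORY = """<repository>
  <record><header><identifier>oai:digital.cjh.org:1001</identifier></header></record>
  <record><header><identifier>oai:digital.cjh.org:1002</identifier></header></record>
  <record><header><identifier>oai:digital.cjh.org:1003</identifier></header></record>
</repository>"""
PIDS = "<data><pid>1003</pid><pid>1001</pid></data>"


def select_records(repository_xml: str, pid_xml: str) -> ET.Element:
    """Return a <results> element holding the records whose PID is listed."""
    # A set gives constant-time membership checks for each record.
    wanted = {
        p.text.strip()
        for p in ET.fromstring(pid_xml).iterfind("pid")
        if p.text
    }
    results = ET.Element("results")
    for record in ET.fromstring(repository_xml).iterfind("record"):
        identifier = record.findtext("header/identifier", default="")
        # Mirrors substring-after(): the text following the prefix, if present.
        _, found, pid = identifier.partition(OAI_PREFIX)
        if found and pid in wanted:
            results.append(record)
    return results


if __name__ == "__main__":
    for rec in select_records(REPOSITORY, PIDS):
        print(rec.findtext("header/identifier"))
```

For a feed of n records and m PIDs this does roughly n + m work rather than n × m, at the cost of holding the PID set in memory.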
@CJHArch
CJHArch / Pull655s
Created February 27, 2015 20:25
This XSLT takes a list of PIDs, turns them into file names, and then reads those EAD files and pulls out the genreforms.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:template match="/">
<xsl:for-each select="document('PIDsforEADs.xml')/record/pid">
<record>
<xsl:variable name="PID" select="."/>