@kschlottmann
kschlottmann / ead_lanq.xq
Created July 13, 2023 18:05
Collection info from EAD based on language
<data>
{
for $Record in /ead
where $Record/archdesc/did/langmaterial/language/@langcode[not(contains(., 'eng'))]
let $id := $Record/archdesc/did/unitid[1]/text()
let $title := $Record/archdesc/did/unittitle
let $repo := $Record/eadheader/eadid/@mainagencycode
let $lang := $Record/archdesc/did/langmaterial/language/text()
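For readers without an XQuery processor, the same filter can be sketched in Python with the standard library. This is only an illustration, assuming un-namespaced EAD shaped like the paths in the query above; the function name `non_english_collections`, the dictionary keys, and the sample record are hypothetical and not part of the gist.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, shaped like the paths used in the query above.
SAMPLE = """<ead>
  <eadheader><eadid mainagencycode="US-XX">sample</eadid></eadheader>
  <archdesc>
    <did>
      <unitid>MS#0001</unitid>
      <unittitle>Sample papers</unittitle>
      <langmaterial><language langcode="rus">Russian</language></langmaterial>
    </did>
  </archdesc>
</ead>"""


def non_english_collections(root):
    """Return one dict per <ead> record with a language whose langcode lacks 'eng'."""
    records = [root] if root.tag == "ead" else root.findall(".//ead")
    rows = []
    for record in records:
        languages = record.findall("archdesc/did/langmaterial/language")
        codes = [lang.get("langcode") for lang in languages]
        # Mirrors @langcode[not(contains(., 'eng'))]: a <language> without
        # a langcode attribute does not count as a match.
        if not any(code is not None and "eng" not in code for code in codes):
            continue
        title = record.find("archdesc/did/unittitle")
        eadid = record.find("eadheader/eadid")
        rows.append({
            "id": (record.findtext("archdesc/did/unitid") or "").strip(),
            "title": "".join(title.itertext()).strip() if title is not None else "",
            "repo": eadid.get("mainagencycode") if eadid is not None else None,
            "languages": [(lang.text or "").strip() for lang in languages],
        })
    return rows


rows = non_english_collections(ET.fromstring(SAMPLE))
```

As in the XQuery, a collection that lists English alongside another language still matches, because one non-`eng` langcode is enough.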
@kschlottmann
kschlottmann / title_element.xq
Created May 25, 2023 14:35
Finds <title> elements without a render attribute in EAD
<data>
{
for $Record in /ead
where $Record/archdesc/dsc//title[not(@render)]
let $id := $Record/archdesc/did/unitid[1]/text()
let $title := $Record/archdesc/did/unittitle
let $repo := $Record/eadheader/eadid/@mainagencycode
@kschlottmann
kschlottmann / fix_file_directory_names.ps1
Last active February 28, 2024 18:28
PowerShell one-liners for cleaning up file and directory names
(Get-ChildItem -Recurse -File) |
Where-Object { $_.Name -match '[^a-zA-Z0-9.-]' } |
Rename-Item -NewName { $_.Name -replace '[^a-zA-Z0-9.-]+', '_' } -WhatIf
# As written, this previews a list of changes in which each run of characters other than ASCII letters, digits, periods, and hyphens is replaced by an underscore
# Remove -WhatIf from the end of line 3 and re-run to actually make the changes
# To act on directories: in line 1, replace -File with -Directory, and replace the regex with [^a-zA-Z0-9] so that periods are also captured (this also captures hyphens)
# In lines 2 and 3, replace the regex with any other character or regex desired, e.g. ' ' to replace only spaces
# Be sure to use the correct regex for files and for directories: the file regex leaves periods alone so that hidden files starting with a period (e.g. .DS_Store) are not renamed, since they are otherwise hard to find programmatically
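The substitution and the preview step can also be sketched in Python, which may help when testing the regex before touching a real directory tree. This is a minimal sketch, not a port of the one-liner; `cleaned_name` and `plan_renames` are hypothetical names, and the code only builds a list of proposed renames without changing anything on disk.

```python
import re
from pathlib import Path

# Same character class as the file regex above: anything other than
# ASCII letters, digits, periods, and hyphens.
DISALLOWED = re.compile(r"[^a-zA-Z0-9.-]+")


def cleaned_name(name):
    """Replace each run of disallowed characters with a single underscore."""
    return DISALLOWED.sub("_", name)


def plan_renames(root):
    """List (old_path, new_path) pairs for files under root; nothing is renamed."""
    plan = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            new_name = cleaned_name(path.name)
            if new_name != path.name:
                plan.append((path, path.with_name(new_name)))
    return plan
```

Calling `plan_renames(".")` and printing the pairs plays the role of `-WhatIf`. Note that an underscore is itself outside the allowed set, so a name such as `.DS_Store` matches the regex but maps to itself and is skipped by the `new_name != path.name` check.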
@kschlottmann
kschlottmann / marc_word_list.xq
Created May 12, 2021 21:46
This xquery will match all records containing an arbitrary term, from the raw ArchivesSpace MARC output from the OAI feed
xquery version "3.0";
declare namespace marc="http://www.loc.gov/MARC21/slim";
(: This xquery will match all records containing an arbitrary term, from the raw ArchivesSpace MARC output from the OAI feed :)
<results>
{
for $MarcRecord in /repository/record/metadata/collection/record
for $word in ("alien", "Alien", "Aliens", "aliens")
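The nested loop over records and words can be sketched in Python as follows. The preview above is cut off before it shows which fields are searched, so this sketch assumes a case-sensitive substring match against every subfield; `records_matching` and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
WORDS = ("alien", "Alien", "Aliens", "aliens")

# Hypothetical two-record sample; real OAI output wraps records in more layers.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Enemy Alien hearings</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">2</controlfield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Garden club minutes</subfield></datafield>
  </record>
</collection>"""


def records_matching(root, words=WORDS):
    """Yield (record, word) for each record with a subfield containing word.

    Assumption: every subfield is searched, case-sensitively.
    """
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        texts = [sf.text or "" for sf in record.findall(".//marc:subfield", MARC_NS)]
        for word in words:
            if any(word in text for text in texts):
                yield record, word


hits = [
    (record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS), word)
    for record, word in records_matching(ET.fromstring(SAMPLE))
]
```

Like the two `for` clauses in the query, this yields a record once per matching word, so a record containing both "alien" and "aliens" would appear more than once.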
@kschlottmann
kschlottmann / publish_status.xsl
Created June 9, 2020 15:11
Get box/folder/title/date and publish status from an AS EAD (from a given series, only c-level data)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ead="urn:isbn:1-931666-22-9"
exclude-result-prefixes="xs " version="2.0">
<xsl:template match="c">
<xsl:text>Title|Date|Box|Folder|Publish Status &#10;</xsl:text>
<xsl:apply-templates select="//c[@level = 'file']"/>
@kschlottmann
kschlottmann / allRbml.py
Created May 29, 2020 00:25
This script will iterate over the entire CLIO corpus and return all RBML records as individual marcxml records with bib id as filename
import xmltodict, json
from timeit import default_timer as timer
import os
import sys
import datetime
#this script will iterate over the entire CLIO corpus and return all RBML records as individual marcxml records with bib id as filename
#this function wraps the json in a dict with a record key, and casts it to an individual marcxml record
def write_marcxml_record(record):
@kschlottmann
kschlottmann / getnamesfromnumbers.py
Last active May 28, 2020 16:51
Get authorized name from LC JSON based on authority file number
import requests, csv, json, urllib, time
startTime = time.time()
baseURLexact = 'http://id.loc.gov/authorities/names/'
#http://id.loc.gov/authorities/names/nr2002027244.json
with open('input_numbers.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        number = str(row[0])
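Since the preview stops inside the loop, here is one way the remaining steps might look, with the HTTP request itself left out. The URL pattern comes from the comment above; everything else is an assumption: the helper names are hypothetical, and `authorized_label` assumes the response is a JSON-LD list of nodes in which the node for the authority carries a MADS/RDF `authoritativeLabel` value. Check a real response before relying on that shape.

```python
import csv
import io

BASE_URL = "http://id.loc.gov/authorities/names/"
# Assumed key for the authorized form of the name in the JSON-LD response.
LABEL_KEY = "http://www.loc.gov/mads/rdf/v1#authoritativeLabel"


def name_url(number):
    """Build the .json URL for one authority file number."""
    return f"{BASE_URL}{number.strip()}.json"


def urls_from_csv(handle):
    """Read authority numbers from the first CSV column, skipping blank rows."""
    reader = csv.reader(handle, delimiter=",", quotechar='"')
    return [name_url(row[0]) for row in reader if row and row[0].strip()]


def authorized_label(nodes, number):
    """Return the label from already-parsed JSON, or None if it is not found."""
    subject = f"{BASE_URL}{number}"
    for node in nodes:
        if node.get("@id") == subject and LABEL_KEY in node:
            return node[LABEL_KEY][0].get("@value")
    return None


# Hypothetical parsed response, for illustration only.
sample_nodes = [
    {"@id": BASE_URL + "n00000000", LABEL_KEY: [{"@value": "Example, Name"}]},
]
urls = urls_from_csv(io.StringIO("nr2002027244\n\n"))
label = authorized_label(sample_nodes, "n00000000")
```

Keeping URL construction and label extraction separate from the network call makes both easy to check offline.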
@kschlottmann
kschlottmann / ead_mrs_top-level-scope.xq
Last active September 11, 2020 20:49
Get description with 'Mrs' from EAD
<data>
{
for $Record in /ead
where $Record/archdesc/scopecontent/p[contains(., 'Mrs')]
let $id := $Record/archdesc/did/unitid[1]/text()
let $title := $Record/archdesc/did/unittitle
let $repo := $Record/eadheader/eadid/@mainagencycode
let $scopeMrs := $Record/archdesc/scopecontent/p[contains(., 'Mrs')]
@kschlottmann
kschlottmann / leader_search.py
Last active May 24, 2020 12:54
Iterates over many large MARCXML collection files, and pulls out certain fields using xmltodict streaming, based on matching certain holdings
import xmltodict, json
from timeit import default_timer as timer
import os
import sys
import datetime
#this script will iterate over the entire CLIO corpus and return the leaders, 245$a s, and 035s for all RBML records
#this function returns a record
def handle_record(_, record):
@kschlottmann
kschlottmann / ead_series_titledates.xq
Created May 4, 2020 21:55
xquery to retrieve series level titles with the string '19' (proxy for selecting finding aids with dates in unittitles)
<data>
{
for $Record in /ead
where $Record/archdesc/dsc//c[@level='series']/did/unittitle[contains(., '19')]
let $id := $Record/archdesc/did/unitid[1]/text()
let $title := $Record/archdesc/did/unittitle
let $repo := $Record/eadheader/eadid/@mainagencycode