@saverkamp
saverkamp / lc_vocab_term_harvester.py
Created January 24, 2020 22:45
Get lists of LC vocabulary terms and URIs from id.loc.gov
"""This script traverses all narrower terms of a http://id.loc.gov/ thesaurus
(or all terms of a term list) starting at a given term within the tree (replace
seedterm in the main code block with your URI of choice) and adds the URI and
label to a list. It writes the list as CSV and JSON, and also as JSONL patterns
for rule-based NER with the NLP library spaCy.
(More info at: https://spacy.io/usage/rule-based-matching#entityruler)
NOTE the 5-second rate limit courtesy to the LC servers working hard for your
controlled vocabulary needs (see queryTerms() function). You might get away with
less, but don't be a jerk about it.
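The traversal and rate limit described above can be sketched as follows. This is a minimal illustration, not the gist's actual code: `harvest_narrower`, `to_entityruler_patterns`, and the `fetch_term` callable (which stands in for a request to id.loc.gov and returns a label plus a list of narrower-term URIs) are hypothetical names. The only detail taken from the description is the 5-second pause between requests.

```python
import json
import time
from collections import deque


def harvest_narrower(seed_uri, fetch_term, delay=5.0, sleep=time.sleep):
    """Breadth-first walk over narrower terms, starting at seed_uri.

    fetch_term is a hypothetical callable: given a URI it returns
    (label, list_of_narrower_uris). In a real run it would query the
    vocabulary service; here it is injected so the walk can be tested offline.
    """
    seen = {seed_uri}
    queue = deque([seed_uri])
    terms = []
    while queue:
        uri = queue.popleft()
        label, narrower = fetch_term(uri)
        terms.append({"uri": uri, "label": label})
        for child in narrower:
            if child not in seen:  # a term can sit under several parents
                seen.add(child)
                queue.append(child)
        if queue:
            sleep(delay)  # courtesy pause before the next request
    return terms


def to_entityruler_patterns(terms, entity_label):
    """Return JSONL text: one spaCy EntityRuler phrase pattern per line."""
    return "\n".join(
        json.dumps({"label": entity_label, "pattern": t["label"], "id": t["uri"]})
        for t in terms
    )
```

Passing the fetch function in as an argument keeps the walk independent of any particular HTTP client, and tracking visited URIs prevents a term with several broader terms from being requested twice.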
saverkamp / CatcherUploadExample.csv
Created February 24, 2014 22:13
Sample CONTENTdm Catcher upload script. Uses Catcher class from: https://gist.github.com/saverkamp/9197945
Alias,CDM_page_id,CDM_field,Value
byington,3160,transc,"Thursday October 25th 1900 Did two churnings this morning and finished putting the tomato pickle away. I put in all the afternoon doing mending Will busy about the place and Mrs Evans helped Leonard with the corn the new man went back to town tonight. Friday October 26th 1900 Will took the butter to town. I was busy with the work till noon I did a lot of baking. I cut out shirts and sewed all afternoon. Saturday October 27th 1900 I sewed a little and got the dinner be a little after eleven. Will and I went to town in the afternoon to have my teeth finished. It has been a warm week, today was like summer. I had a letter from Mother. Sunday October 28th 1900 I was busy about the house most of the forenoon Leonard and his wife were away all day. It rained some in the forenoon. We were up to Stevens in the afternoon. I spent the evening reading. Monday October 29th 1900 It rained in the morning so Leonard could not husk corn. Will was bus
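The Value column holds free text that contains commas, so each value has to be wrapped in a closed pair of double quotes for the file to parse. Below is a minimal sketch of reading this four-column layout with Python's standard `csv` module. The helper name `read_catcher_rows` and the sample text in the usage note are illustrative assumptions, not part of the upload script.

```python
import csv
import io


def read_catcher_rows(csv_text):
    """Parse Alias,CDM_page_id,CDM_field,Value rows into a list of dicts.

    csv.DictReader takes the first row as the header and keeps commas
    (and line breaks) that appear inside a quoted Value field.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [dict(row) for row in reader]
```

For example, `read_catcher_rows('Alias,CDM_page_id,CDM_field,Value\nbyington,3160,transc,"a warm week, like summer"\n')` yields one row whose `Value` keeps the embedded comma.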
saverkamp / catcher.py
Last active October 10, 2022 19:32
Python class to overlay metadata in CONTENTdm via Catcher (edit only). Catcher docs at: http://contentdm.org/help6/addons/catcher.asp Sample script using this class at: https://gist.github.com/saverkamp/9198310
from suds.client import Client


class Catcher(object):
    """A CONTENTdm Catcher session."""

    # Server URL and credentials are required arguments.
    def __init__(self, url, user, password, license):
        self.transactions = []
        # SOAP client built from the Catcher service WSDL
        self.client = Client('https://worldcat.org/webservices/contentdm/catcher/6.0/CatcherService.wsdl')
        self.url = url
        self.user = user
        self.password = password
saverkamp / gist:6957798
Last active December 25, 2015 09:59
openrefine-demo_names
Erdrich, Louise Louise Erdrich
Eugenides, Jeffrey Jeffrey Eugenides
Farrakhan, Louis Louis Farrakhan
Fatunde, Tunde Tunde Fatunde
Ames, Jonathan Jonathan Ames
Anshaw, Carol, 1946- Carol Anshaw
Julavits, Heidi Heidi Julavits
Mailer, Norman Norman Mailer
Nissen, Thisbe, 1972- Thisbe Nissen
Solnit, Rebecca Rebecca Solnit
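Each row above pairs an inverted name, sometimes with trailing life dates, with its direct-order form. The gist is OpenRefine demo data and does not say how the second column was produced. The following Python sketch is only one possible way to derive it; the function name `direct_order` and the date pattern are assumptions.

```python
import re

# Trailing life dates such as ", 1946-" or ", 1900-1980" (assumed pattern).
_DATES = re.compile(r",\s*\d{4}-(\d{4})?\s*$")


def direct_order(inverted):
    """Turn 'Surname, Forename[, dates]' into 'Forename Surname'."""
    name = _DATES.sub("", inverted.strip())
    surname, sep, forenames = name.partition(",")
    if not sep:  # no comma: nothing to invert
        return name
    return f"{forenames.strip()} {surname.strip()}"
```

With the rows above, `direct_order("Anshaw, Carol, 1946-")` gives `"Carol Anshaw"` and `direct_order("Erdrich, Louise")` gives `"Louise Erdrich"`.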
saverkamp / sample-cdm2scripto.py
Created February 7, 2013 17:53
Sample script to harvest metadata through CONTENTdm v6 API and format as csv for upload into ui-libraries fork of Omeka/Scripto. See ui-libraries/plugin-Scripto for documentation. Uses pycdm, a python library for working with the CONTENTdm v6 API (saverkamp/pycdm).
import codecs
import csv
import datetime
import pycdm
from HTMLParser import HTMLParser
# Get input: collection alias and the items to retrieve
alias = raw_input('collection alias: ')
items = raw_input('item identifiers (separate by commas): ')