@jeffgerhard
jeffgerhard / grab_IA_file_info.py
Created August 1, 2016 19:52
utility to quickly pull information for one IA item (size, checksums, metadata, scan info, dates, etc)
from internetarchive import get_item
import pycurl
import json
from io import BytesIO
def getFileInfo(x):
    info = 'n/a'
    files = get_item(x).files
    for z in files:
jeffgerhard / cleanupdirs.py
Created June 21, 2016 19:36
workflow (in progress) for clearing out directories of cr2s that have been processed into archival jp2s and can be safely deleted from storage
import os
from tkinter.filedialog import askdirectory
from tkinter import messagebox
from sys import exit
from shutil import rmtree
def get_immediate_subdirectories(a_dir):
    # stackoverflow.com/questions/800197/how-to-get-all-of-the-immediate-subdirectories-in-python#800201
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
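A rough sketch of how the "safe to delete" check for this workflow might look. The preview above only shows the subdirectory helper, so everything else here is an assumption: the function names `stems_with_extension` and `dirs_safe_to_clear` are hypothetical, and so is the layout (one subdirectory per item under a raw root, with a same-named subdirectory of jp2s under a separate archive root). It only reports candidates and deletes nothing.

```python
import os


def get_immediate_subdirectories(a_dir):
    # same helper as in the gist preview
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]


def stems_with_extension(directory, ext):
    # hypothetical helper: filenames (minus extension) ending in ext, case-insensitive
    return {os.path.splitext(f)[0] for f in os.listdir(directory)
            if f.lower().endswith(ext)}


def dirs_safe_to_clear(raw_root, jp2_root):
    # hypothetical check: a raw subdirectory is a candidate only when every
    # .cr2 in it has a same-named .jp2 under jp2_root/<same subdirectory>
    safe = []
    for name in sorted(get_immediate_subdirectories(raw_root)):
        raw_dir = os.path.join(raw_root, name)
        jp2_dir = os.path.join(jp2_root, name)
        cr2s = stems_with_extension(raw_dir, ".cr2")
        if not cr2s or not os.path.isdir(jp2_dir):
            continue  # nothing to clear, or nothing archived yet
        if cr2s <= stems_with_extension(jp2_dir, ".jp2"):
            safe.append(name)
    return safe
```

Keeping the check separate from the actual `rmtree` call leaves room for a confirmation step (the gist imports `messagebox`, presumably for that purpose).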
jeffgerhard / ia_download_and_bag.py
Last active August 2, 2016 15:44
first step towards a local mirroring of internet archive content [in progress]
# for python 3.5
# takes a list of identifiers and exports a csv containing various metadata
# and status info from archive.org
#
# for use with IA lists scraped from catalogs of the users and the TT-Scribe
# (can also work with lists sent to Jye if we want to see how many are still
# not scanned)
#
# in the future maybe I can figure out how to take the results and auto-update the Access DB
# could clean up/simplify the code for sure
jeffgerhard / preparePDFupload.py
Created May 27, 2016 16:41
prepare PDFs and matching MARC records for uploading to the Internet Archive
# ##################################################
#
# python3 program to prepare for Internet Archive upload of PDFs and MARC records
# this will do the following:
# 1. create a list of identifiers from a folder of PDFs;
# 2. make sure MARC records exist with the same identifiers (stopping on mismatches)
# 3. pull MARC records into the same directory and rename them to IA convention
# 4. generate a spreadsheet called upload.csv to use with IA command-line tool
#
# TO DO:
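The four steps listed in the header comment can be sketched roughly as below. This is not the gist's own code (the preview cuts off before any of it), just a minimal standard-library illustration. Assumptions: the function name `prepare_upload` is hypothetical, source MARC files are named `<identifier>.mrc`, the `_meta.mrc` suffix stands in for the "IA convention" mentioned above, and `upload.csv` uses `identifier` and `file` columns.

```python
import csv
import shutil
from pathlib import Path

MARC_SUFFIX = "_meta.mrc"  # assumption: stand-in for the IA naming convention


def prepare_upload(pdf_dir, marc_dir, marc_suffix=MARC_SUFFIX):
    pdf_dir, marc_dir = Path(pdf_dir), Path(marc_dir)

    # 1. identifiers come from the PDF filenames
    identifiers = sorted(p.stem for p in pdf_dir.glob("*.pdf"))

    # 2. stop on any identifier without a matching MARC record
    missing = [i for i in identifiers if not (marc_dir / f"{i}.mrc").is_file()]
    if missing:
        raise FileNotFoundError("no MARC record for: " + ", ".join(missing))

    # 3. copy the MARC records next to the PDFs under their new names
    for ident in identifiers:
        shutil.copy2(marc_dir / f"{ident}.mrc", pdf_dir / f"{ident}{marc_suffix}")

    # 4. write upload.csv with one row per file to upload
    with open(pdf_dir / "upload.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["identifier", "file"])
        for ident in identifiers:
            writer.writerow([ident, f"{ident}.pdf"])
            writer.writerow([ident, f"{ident}{marc_suffix}"])
    return identifiers
```

Checking for mismatches before copying anything means a missing record leaves the PDF folder untouched.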
jeffgerhard / z3950_from_spreadsheet.py
Last active May 24, 2016 19:03
batch run z39.50 query and rename results per spreadsheet values
# adapted from https://github.com/asl2/PyZ3950/blob/master/example/zmarc_example.py
# for python2 [I couldn't get PyZ3950 to work in 3]
# this is a basic script to:
# - read bib record numbers from a csv spreadsheet 'worksheet.csv';
# - run z39.50 query and save MARC records locally; and
# - rename them according to the second column in spreadsheet
#
# PyZ3950 found at https://github.com/asl2/PyZ3950
# and requires PLY found at http://www.dabeaz.com/ply/
#
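The spreadsheet side of this (reading 'worksheet.csv' and renaming by the second column) might look something like the sketch below. It is written for python 3 and leaves the z39.50 query out entirely, since that part depends on PyZ3950. Assumptions: the function names are hypothetical, the csv has no header row, and each record was saved as `<bib number>.mrc` before renaming.

```python
import csv
import os


def load_rename_map(csv_path):
    # column 1 = bib record number, column 2 = new name (assumed: no header row)
    mapping = {}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) < 2 or not row[0].strip():
                continue  # skip blank or incomplete rows
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_records(record_dir, mapping, ext=".mrc"):
    # rename <bib><ext> to <new name><ext>; never overwrite an existing file
    renamed = []
    for bib, new_name in mapping.items():
        src = os.path.join(record_dir, bib + ext)
        dst = os.path.join(record_dir, new_name + ext)
        if os.path.isfile(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((bib, new_name))
    return renamed
```

Rows whose record file is missing are simply skipped, so the returned list shows which bib numbers actually came back from the query.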
jeffgerhard / delete599s.py
Last active August 2, 2017 20:02
delete 599 and 910 fields from mrc records via pymarc
# delete 599 and 910 fields from the current directory full of individual mrc files
# (python 3 script)
# pymarc: https://github.com/edsu/pymarc
# help found at https://gist.github.com/symac/0133cae8846c134e849c and https://gist.github.com/EG5h/0ac510047f39e0cf930a
import os
import glob
import pymarc
errorlist = ""
path = os.path.dirname(os.path.abspath(__file__)) + os.sep