
@jeffgerhard
jeffgerhard / delete599s.py
Last active August 2, 2017 20:02
delete 599 and 910 fields from mrc records via pymarc
# delete 599 and 910 fields from every individual .mrc file in the current directory (the folder this script lives in)
# (python 3 script)
# pymarc: https://github.com/edsu/pymarc
# help found at https://gist.github.com/symac/0133cae8846c134e849c and https://gist.github.com/EG5h/0ac510047f39e0cf930a
import os
import glob
import pymarc
errorlist = ""
path = os.path.dirname(os.path.abspath(__file__)) + '\\'
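The gist does the field deletion with pymarc. As a rough, dependency-free illustration of what removing a field means at the byte level, here is a minimal sketch. It assumes MARC 21 binary (ISO 2709) records with the standard 24-byte leader and 12-byte directory entries. `parse_fields`, `build_record`, and `drop_tags` are hypothetical names and are not part of pymarc or the gist.

```python
# Minimal stdlib sketch: drop MARC tags (599 and 910 by default) from one
# ISO 2709 / MARC 21 binary record. Assumes a well-formed record with a
# 24-byte leader and 12-byte directory entries (3 tag + 4 length + 5 offset).
from typing import Iterable, List, Tuple

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def parse_fields(record: bytes) -> List[Tuple[str, bytes]]:
    """Split a record into (tag, raw field bytes incl. terminator) pairs."""
    base = int(record[12:17])          # leader 12-16: base address of data
    directory = record[24:base - 1]    # skip the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])       # offset relative to the base address
        fields.append((tag, record[base + start:base + start + length]))
    return fields


def build_record(leader: bytes, fields: Iterable[Tuple[str, bytes]]) -> bytes:
    """Reassemble a record, recomputing directory entries, lengths and offsets."""
    directory = b""
    data = b""
    for tag, field in fields:
        directory += tag.encode("ascii") + b"%04d" % len(field) + b"%05d" % len(data)
        data += field
    base = 24 + len(directory) + 1     # leader + directory + its terminator
    total = base + len(data) + 1       # + record terminator
    new_leader = b"%05d" % total + leader[5:12] + b"%05d" % base + leader[17:24]
    return new_leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def drop_tags(record: bytes, tags: Iterable[str] = ("599", "910")) -> bytes:
    """Return a copy of the record without any field whose tag is in `tags`."""
    unwanted = set(tags)
    kept = [(tag, field) for tag, field in parse_fields(record) if tag not in unwanted]
    return build_record(record[:24], kept)
```

For a folder of one-record files, as the comment above describes, each file could be rewritten with `path.write_bytes(drop_tags(path.read_bytes()))`. That overwrites in place, so run it on copies. Fields over 9,999 bytes and records over 99,999 bytes are outside what this sketch handles.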
jeffgerhard / z3950_from_spreadsheet.py
Last active May 24, 2016 19:03
batch-run Z39.50 queries and rename the results per spreadsheet values
# adapted from https://github.com/asl2/PyZ3950/blob/master/example/zmarc_example.py
# for Python 2 (I couldn't get PyZ3950 to work in Python 3)
# this is basic code to:
# - read bib record numbers from a CSV spreadsheet 'worksheet.csv';
# - run a Z39.50 query and save the MARC records locally; and
# - rename them according to the second column of the spreadsheet
#
# PyZ3950 found at https://github.com/asl2/PyZ3950
# and requires PLY found at http://www.dabeaz.com/ply/
#
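The Z39.50 query itself needs PyZ3950 under Python 2, but the spreadsheet-driven renaming step can be sketched with the standard library alone. This Python 3 sketch assumes `worksheet.csv` has no header row, holds the bib record number in column 1 and the new name in column 2, and that each downloaded record was saved as `<bibnumber>.mrc`. `load_rename_map` and `rename_results` are hypothetical helpers.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_rename_map(csv_path: Path) -> Dict[str, str]:
    """Map bib record number (column 1) to the new file name (column 2)."""
    mapping = {}
    with csv_path.open(newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) >= 2 and row[0].strip():
                mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_results(folder: Path, mapping: Dict[str, str], suffix: str = ".mrc") -> List[str]:
    """Rename <bibnumber><suffix> to <new name><suffix>; return bib numbers not found."""
    missing = []
    for bib, new_name in mapping.items():
        src = folder / (bib + suffix)
        if src.exists():
            src.rename(folder / (new_name + suffix))
        else:
            missing.append(bib)
    return missing
```

Returning the missing bib numbers, rather than failing, makes it easy to see which queries came back empty.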
jeffgerhard / preparePDFupload.py
Created May 27, 2016 16:41
prepare PDFs and matching MARC records for uploading to the Internet Archive
# ##################################################
#
# Python 3 program to prepare PDFs and MARC records for upload to the Internet Archive
# this will do the following:
# 1. create a list of identifiers from a folder of PDFs;
# 2. make sure MARC records exist with the same identifiers (stopping on mismatches);
# 3. pull MARC records into the same directory and rename them to the IA naming convention; and
# 4. generate a spreadsheet called upload.csv to use with the IA command-line tool
#
# TO DO:
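Steps 1 through 4 can be sketched roughly as follows. This assumes PDFs and binary MARC files share a base name (`book1.pdf`, `book1.mrc`), that the IA name for a binary MARC file is `<identifier>_meta.mrc`, and that the `ia` tool's spreadsheet upload accepts `identifier` and `file` columns; check all three against current Internet Archive documentation. `prepare_upload` is a hypothetical name.

```python
import csv
import shutil
from pathlib import Path


def prepare_upload(pdf_dir: Path, marc_dir: Path) -> Path:
    """Pair PDFs with MARC records, copy/rename the MARC, write upload.csv."""
    # 1. identifiers from the PDF file names
    ids = sorted(p.stem for p in pdf_dir.glob("*.pdf"))
    # 2. stop if any identifier lacks a MARC record with the same base name
    marc = {p.stem: p for p in marc_dir.glob("*.mrc")}
    missing = [ident for ident in ids if ident not in marc]
    if missing:
        raise ValueError("no MARC record for: " + ", ".join(missing))
    csv_path = pdf_dir / "upload.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["identifier", "file"])
        for ident in ids:
            # 3. copy the MARC next to the PDF under the assumed IA name
            target = pdf_dir / (ident + "_meta.mrc")
            shutil.copy2(str(marc[ident]), str(target))
            # 4. one spreadsheet row per file to upload
            writer.writerow([ident, ident + ".pdf"])
            writer.writerow([ident, target.name])
    return csv_path
```

The mismatch check runs before anything is copied or written, so a missing record leaves the folder untouched.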
jeffgerhard / ia_download_and_bag.py
Last active August 2, 2016 15:44
first step toward locally mirroring Internet Archive content (in progress)
# for python 3.5
# takes a list of identifiers and exports a csv containing various metadata
# and status info from archive.org
#
# for use with IA lists scraped from catalogs of the users and the TT-Scribe
# (can also work with lists sent to Jye if we want to see how many are still
# not scanned)
#
# in the future maybe I can figure out how to take the results and auto-update the Access DB
# could clean up/simplify the code for sure
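A stdlib sketch of the identifier-list-to-CSV step, assuming the public archive.org metadata endpoint (`https://archive.org/metadata/<identifier>`), which returns a JSON object with `metadata` and `files` keys, or an empty object for an unknown identifier. The column choice and the `fetch` parameter (which lets a test swap in canned data) are illustrative assumptions, not the gist's actual output.

```python
import csv
import json
from urllib.request import urlopen

# Public archive.org metadata endpoint (assumed to return {} for unknown identifiers).
METADATA_URL = "https://archive.org/metadata/{}"
COLUMNS = ["identifier", "exists", "title", "mediatype", "file_count"]


def fetch_metadata(identifier):
    """Download the item's metadata JSON from archive.org."""
    with urlopen(METADATA_URL.format(identifier), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(identifier, record):
    """Flatten one metadata JSON object into a CSV row (illustrative columns)."""
    meta = record.get("metadata", {})
    return {
        "identifier": identifier,
        "exists": bool(record),
        "title": meta.get("title", ""),
        "mediatype": meta.get("mediatype", ""),
        "file_count": len(record.get("files", [])),
    }


def export_csv(identifiers, out_path, fetch=fetch_metadata):
    """Write one row per identifier; `fetch` can be swapped out for testing."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        for ident in identifiers:
            writer.writerow(summarize(ident, fetch(ident)))
```

Keeping the network call behind `fetch` also makes it simple to add retries or a delay between requests later.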
jeffgerhard / cleanupdirs.py
Created June 21, 2016 19:36
workflow (in progress) for clearing out directories of CR2 files that have already been processed into archival JP2s and can safely be deleted from storage
import os
from tkinter.filedialog import askdirectory
from tkinter import messagebox
from sys import exit
from shutil import rmtree
def get_immediate_subdirectories(a_dir):
    # stackoverflow.com/questions/800197/how-to-get-all-of-the-immediate-subdirectories-in-python#800201
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
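The deletion rule itself is not visible in the preview. One cautious version of it, sketched here under an assumed layout, treats a CR2 subdirectory as deletable only when every `.cr2` file has a `.jp2` with the same base name in a matching subdirectory of an archival JP2 folder. The folder layout and the helper names are hypothetical. The sketch only reports candidates and deletes nothing, leaving any `rmtree` call behind a confirmation such as the gist's `messagebox`.

```python
import os
from pathlib import Path
from typing import List, Set


def immediate_subdirectories(a_dir: str) -> List[str]:
    """Names of the direct child directories of a_dir (no recursion)."""
    with os.scandir(a_dir) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def stems(folder: Path, suffix: str) -> Set[str]:
    """Lower-cased base names of files under folder with the given suffix."""
    if not folder.is_dir():
        return set()
    return {p.stem.lower() for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() == suffix}


def deletable_subdirectories(raw_root: Path, jp2_root: Path) -> List[str]:
    """Subdirectories of raw_root whose CR2s all have JP2 counterparts.

    Assumed (hypothetical) layout: raw_root/<name>/*.cr2 pairs with
    jp2_root/<name>/*.jp2 by base name. Nothing is deleted here.
    """
    ready = []
    for name in immediate_subdirectories(str(raw_root)):
        cr2 = stems(raw_root / name, ".cr2")
        if cr2 and cr2 <= stems(jp2_root / name, ".jp2"):
            ready.append(name)
    return ready
```

The subset test (`<=`) means a folder with even one unconverted CR2 stays off the list.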
jeffgerhard / grab_IA_file_info.py
Created August 1, 2016 19:52
utility to quickly pull information for one IA item (size, checksums, metadata, scan info, dates, etc)
from internetarchive import get_item
import pycurl
import json
from io import BytesIO
def getFileInfo(x):
    info = 'n/a'
    files = get_item(x).files
    for z in files:
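Once the file list is in hand, its checksums can be used to verify a local copy. This sketch assumes each entry in the item's file list is a dict carrying string `size`, `md5`, and `sha1` values, as in the archive.org metadata JSON. `file_digests` and `matches_ia_entry` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path) -> dict:
    """Size, MD5 and SHA-1 of a local file, as strings like IA reports them."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"size": str(path.stat().st_size),
            "md5": md5.hexdigest(),
            "sha1": sha1.hexdigest()}


def matches_ia_entry(path: Path, entry: dict) -> bool:
    """True if every size/md5/sha1 value present in the IA entry matches."""
    local = file_digests(path)
    checked = [key for key in ("size", "md5", "sha1") if key in entry]
    return bool(checked) and all(str(entry[key]) == local[key] for key in checked)
```

Paired with the loop over `get_item(x).files`, each downloaded file could be checked against its own entry; an entry with none of the three keys counts as unverified rather than as a match.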
jeffgerhard / bagit_script_for_current_directory.bat
Created August 2, 2016 15:41
batch script to bag-in-place all the subdirectories of the current directory, and list the file ids
REM first make a list of the directories
for /d %%A in (*) do echo %%~A>>temp.txt
REM then send each directory to bagit to baginplace
forfiles /C "cmd /c if @isdir==TRUE cd /d C:\bagit-4.9.0\bin\ & bag baginplace @path --payloadmanifestalgorithm sha1 --tagmanifestalgorithm sha1 --version 0.97 --verbose & move /-y @path Z:\archival_master_files\books\general_collection"
REM oh and then edit the files list to drop the "master" part
REM this find/replace code via https://social.technet.microsoft.com/Forums/scriptcenter/en-US/57bd676c-e5c3-4829-bdbf-6addea238bf0/find-and-replace-string-in-a-file-using-batch-script?forum=ITCG
@echo off
set "textfile=temp.txt"
set "newfile=filelist.txt"
set Scr="%temp%\TempVBS.vbs"
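For comparison, here is roughly what bag-in-place does with these options, written as a stdlib Python sketch rather than a call to the bagit-java `bag` tool: the directory's contents move into a `data/` payload folder, and SHA-1 payload and tag manifests are written alongside a BagIt 0.97 declaration. It omits the optional `bag-info.txt`, assumes a fresh directory that is not already a bag, and `bag_in_place` and the `data.tmp` staging name are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hex SHA-1 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_lines(path: Path, lines) -> None:
    """Write LF-terminated UTF-8 lines regardless of platform."""
    path.write_bytes("".join(line + "\n" for line in lines).encode("utf-8"))


def bag_in_place(bag_dir: Path) -> None:
    # Move everything into a staging folder, then rename it to data/
    # (this also copes with an existing top-level item named "data").
    staging = bag_dir / "data.tmp"              # hypothetical staging name
    staging.mkdir()
    for entry in list(bag_dir.iterdir()):
        if entry != staging:
            shutil.move(str(entry), str(staging / entry.name))
    payload = bag_dir / "data"
    staging.rename(payload)

    write_lines(bag_dir / "bagit.txt",
                ["BagIt-Version: 0.97", "Tag-File-Character-Encoding: UTF-8"])

    # Payload manifest: "<sha1>  data/<relative path>" with forward slashes.
    files = sorted((p.relative_to(bag_dir).as_posix(), p)
                   for p in payload.rglob("*") if p.is_file())
    write_lines(bag_dir / "manifest-sha1.txt",
                [sha1_of(p) + "  " + rel for rel, p in files])

    # Tag manifest covers the tag files written above.
    write_lines(bag_dir / "tagmanifest-sha1.txt",
                [sha1_of(bag_dir / name) + "  " + name
                 for name in ("bagit.txt", "manifest-sha1.txt")])
```

Writing the tag manifest last matters: it has to hash the finished `manifest-sha1.txt`.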
# cleans up the downloaded, bagged masters
# and moves to archival storage
#
# could combine this with the IA download script?
#
# leaves behind a log file listing the IDs to enter into the Access DB
import os
import shutil
from tkinter.filedialog import askdirectory
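The cleanup-and-move step described in these comments might look like this sketch. It assumes a folder counts as a bag when it contains `bagit.txt` (a required BagIt tag file), that the folder name is the ID to log, and that existing folders in archival storage must never be overwritten. `move_bags` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import List


def move_bags(source: Path, destination: Path, log_path: Path) -> List[str]:
    """Move finished bags from source to destination; log and return their IDs."""
    moved = []
    for folder in sorted(p for p in source.iterdir() if p.is_dir()):
        if not (folder / "bagit.txt").is_file():
            continue                      # not a bag yet; leave it in place
        target = destination / folder.name
        if target.exists():
            continue                      # never overwrite archival storage
        shutil.move(str(folder), str(target))
        moved.append(folder.name)
    # Append so repeated runs accumulate one ID per line.
    with log_path.open("a", encoding="utf-8") as log:
        for ident in moved:
            log.write(ident + "\n")
    return moved
```

Skipping collisions instead of overwriting leaves any duplicate in the source folder for a person to inspect.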
jeffgerhard / compare_and_replace.py
Last active March 30, 2017 14:05
compare two directories and optionally replace one's contents with the other's (e.g., from a backup)
'''
compare two directories recursively and optionally
replace one directory's contents with the other's
use case: replacing files from an old backup for
a cloned hard drive that has unpredictably corrupt files
'''
import os
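A minimal sketch of the recursive comparison, using `filecmp.cmp(..., shallow=False)` so that equal-size files are compared byte by byte rather than trusted on size and modification time alone. That matters when corruption leaves those unchanged. It only checks files present in both trees. `differing_files` and `replace_from_backup` are hypothetical names, and the replace step overwrites the clone's copies, so it is kept separate from the comparison.

```python
import filecmp
import os
import shutil
from typing import List


def differing_files(target: str, backup: str) -> List[str]:
    """Relative paths present in both trees whose contents differ."""
    diffs = []
    for dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, target)
            other = os.path.join(backup, rel)
            # shallow=False forces a content comparison of equal-size files
            if os.path.isfile(other) and not filecmp.cmp(path, other, shallow=False):
                diffs.append(rel)
    return sorted(diffs)


def replace_from_backup(target: str, backup: str, rels: List[str]) -> None:
    """Overwrite each listed file in target with the backup copy."""
    for rel in rels:
        shutil.copy2(os.path.join(backup, rel), os.path.join(target, rel))
```

Reviewing the list returned by `differing_files` before calling `replace_from_backup` gives the "optionally" in the description a natural checkpoint.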
jeffgerhard / fd2002_nooverlap_curved.png
Last active April 26, 2017 20:29
images for fd project