Jon Stroop jpstroop

  • Princeton University Library
  • Princeton, NJ
@jpstroop
jpstroop / orient_image.sh
Last active May 22, 2018
Guess the orientation of an image using OCR and Spellcheck
View orient_image.sh
#!/bin/bash
# Script takes a single argument, which is the path to an image file.
# NOTE: this file will be replaced with the version that this script deems to be 'correct'
file=$1
TMP="/tmp/pulfa/img_harvester/rotation-calc"
# Clean up if there are files from the last run
# (leaving them around is handy for debugging)
if [ -d $TMP ]; then
View kill_double_spaces.sh
#!/bin/bash
for md_file in $(find . -name "*.md"); do
    # [A-Za-z] rather than [A-z]: the latter also matches [ \ ] ^ _ and `
    gsed -r -i -e 's/([A-Za-z])\.\s{2,3}([A-Z])/\1. \2/g' "$md_file"
done
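To show what the substitution above does to a line of text, here is a minimal Python sketch of the same pattern. The function name kill_double_spaces is made up for illustration, and the letter class is written as [A-Za-z]; this is not part of the original script.

```python
import re

# Same shape as the sed expression: a letter, a period, two or three
# whitespace characters, then a capital letter.
DOUBLE_SPACE = re.compile(r'([A-Za-z])\.\s{2,3}([A-Z])')


def kill_double_spaces(text):
    """Collapse the 2-3 whitespace characters after a sentence-ending period."""
    return DOUBLE_SPACE.sub(r'\1. \2', text)


print(kill_double_spaces("One sentence.  Another one.   A third."))
# One sentence. Another one. A third.
```

Because a capital letter must follow the gap, runs of spaces elsewhere (for example inside indented text) are left alone.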
jpstroop / map.json
Last active Oct 12, 2017
IIIF canvasMap proposal
View map.json
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"id": "https://plum.princeton.edu/concern/scanned_resources/pt722jw092/manifest",
"type": "sc:Manifest",
"label": [
"Reports of the Princeton University expeditions to Patagonia, 1896-1899 : J. B. Hatcher in charge"
],
"viewingHint": "paged",
"viewingDirection": "left-to-right",
"rendering": {
jpstroop / unwatch_all.py
Last active Nov 4, 2019
Unwatch all GitHub repositories.
View unwatch_all.py
#!/usr/bin/env python
#
# Unwatch from all GitHub repositories. Note that it will only work with up to
# 100 repos at a time (pagination is not implemented), so you may need to run
# more than once.
#
# Depends:
# requests : http://docs.python-requests.org/en/master/
#
# Output (to stdout):
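The header comment above notes that pagination is not implemented. One way it could be handled is sketched below, assuming the responses come from requests, whose Response objects expose the parsed HTTP Link header as .links. The helper name iter_items and its get parameter are hypothetical and do not come from the original script.

```python
def iter_items(get, url):
    """Yield every item from a paginated JSON list endpoint.

    `get` is any callable that takes a URL and returns a response object
    with `.json()` and `.links`, as `requests.Response` provides.
    """
    while url:
        response = get(url)
        for item in response.json():
            yield item
        # The 'next' entry is absent on the last page, which ends the loop.
        url = response.links.get('next', {}).get('url')
```

With an authenticated requests.Session, a call might look like iter_items(session.get, 'https://api.github.com/user/subscriptions?per_page=100'). Whether the original script uses that endpoint is an assumption, since only its header comment is visible here.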
View beach_image_workflow_proposal.md

Deprecated. Moved here: https://docs.google.com/document/d/1GY9_CfvFb5WCFOoSQ54DwzoMKPXmh5UrBgBuN5QXjUg/edit#

Proposed Flow

  1. Talk to RBSC cat.
  2. Enhance EAD w/ item-level data from TEI, including pudl0123 IDs temporarily
  3. Use TEI to EAD mapping to reorg. images to match new EAD (item level) components
  4. Remove temporary pudl0123 IDs from new EAD components
  5. Generate new PULFA METS and load EAD
  6. Migrate new PULFA METS to Plum; make sure the EAD component ID is in dc:replaces
  7. Map Plum manifest URIs back to TEI
jpstroop / iiif_agg_disc_import.md
Created Nov 13, 2016
IIIF Discovery Strawperson
View iiif_agg_disc_import.md

IIIF Aggregation, Discovery, and Import

Introduction

Audience and Scope

  • Carefully acknowledge somehow that this extends the scope of IIIF, but is at least controlled mission creep (i.e. similar to a controlled burn :-))

  • Use cases:

    • thematic registries and portals
    • ad-hoc reuse/remixing/mashups
jpstroop / get_exif.py
Created Oct 23, 2016
Get EXIF, IPTC Metadata, etc. with Python & Pillow
View get_exif.py
from PIL import Image
from PIL.ExifTags import GPSTAGS
from PIL.ExifTags import TAGS
# Keys are listed here:
# https://github.com/python-pillow/Pillow/blob/master/PIL/ExifTags.py
def _map_key(k):
    try:
        return TAGS[k]
jpstroop / kakadu_vs_opj_reduce.txt
Created Oct 15, 2016
Proof that OpenJPEG and Kakadu both round up when discarding resolution levels
View kakadu_vs_opj_reduce.txt
$ kdu_expand -quiet -i 0001.jp2 -o 0001.bmp -reduce 0; identify 0001.bmp
0001.bmp BMP3 5906x7200 5906x7200+0+0 8-bit sRGB 127.6MB 0.120u 0:00.129
$ opj_decompress -i 0001.jp2 -o 0001.bmp -r 0; identify 0001.bmp
0001.bmp BMP3 5906x7200 5906x7200+0+0 8-bit sRGB 127.6MB 0.120u 0:00.129
$ kdu_expand -quiet -i 0001.jp2 -o 0001.bmp -reduce 1; identify 0001.bmp
0001.bmp BMP3 2953x3600 2953x3600+0+0 8-bit sRGB 31.9MB 0.030u 0:00.040
$ opj_decompress -i 0001.jp2 -o 0001.bmp -r 1; identify 0001.bmp
0001.bmp BMP3 2953x3600 2953x3600+0+0 8-bit sRGB 31.9MB 0.030u 0:00.040
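The rounding rule named in the description can be written out as a small calculation. This is a sketch under the assumption that each dimension at reduction level r is ceil(dimension / 2^r); the function name reduced_size is made up for illustration. At the two levels shown above the division is exact, so the values for deeper levels below are computed from the rule and are not copied from tool output.

```python
def reduced_size(width, height, levels):
    """Return (width, height) after discarding `levels` resolution levels."""
    scale = 1 << levels  # 2 ** levels
    # Ceiling division without floats: -(-a // b) == ceil(a / b)
    return (-(-width // scale), -(-height // scale))


for r in range(4):
    print(r, reduced_size(5906, 7200, r))
# 0 (5906, 7200)
# 1 (2953, 3600)
# 2 (1477, 1800)
# 3 (739, 900)
```

Level 2 is the first place where rounding matters for this image: 5906 / 4 is 1476.5, which rounds up to 1477.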
View jp2_dims.py
#!/usr/bin/env python3
from io import open
from collections import deque
IHDR = b'\x69\x68\x64\x72' # See I.5.3.1 Image Header box
with open('tests/fixtures/images/color.jp2', 'rb') as jp2:
    window = deque([], 4)
    while bytes(b''.join(window)) != IHDR:
        window.append(jp2.read(1))
    # The image header box is the only place in the spec where height comes
    # first. Go figure.
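The preview stops at the comment about field order. As a rough sketch of the remaining step, the box type is followed by two big-endian unsigned 32-bit integers, height and then width. The function name jp2_dimensions is hypothetical, and searching a byte string for the first 'ihdr' is a simplification; this is not the rest of the original script.

```python
import struct

IHDR = b'ihdr'  # the same four bytes as b'\x69\x68\x64\x72'


def jp2_dimensions(data):
    """Return (width, height) read from the Image Header box in `data`."""
    pos = data.find(IHDR)
    if pos < 0 or len(data) < pos + 12:
        raise ValueError('no complete Image Header box found')
    # Height comes first, then width, both as big-endian uint32.
    height, width = struct.unpack_from('>II', data, pos + 4)
    return width, height
```

It could be called on the first chunk of a file, for example jp2_dimensions(open(path, 'rb').read(1024)), on the assumption that the header box falls within that chunk.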
View collections_hash.xq
xquery version "1.0";
declare namespace ead="urn:isbn:1-931666-22-9";
declare option saxon:output 'omit-xml-declaration=yes';
declare variable $collections := collection('/home/jstroop/eclipse_workspace/pudl-data-sparse/mdata/collections?select=*.ead');
declare function local:normalize($str) {
let $norm := replace(normalize-space($str), '"', '\\"')
return $norm
};