


Ed Summers (edsu)

@edsu
edsu / bagit.sh
Last active April 18, 2024 19:57
Remembering the original spirit of BagIt. https://twitter.com/justin_littman/status/778561421428793344
#!/bin/bash
#
# The simplest way to create a valid BagIt bag?
#
# Usage: bagit.sh <dir_to_bag> <bag_dir>
#
# Note: you'll need to have md5deep installed:
# brew install md5deep
# apt-get install md5deep
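
A minimal sketch of the bag layout such a script aims to produce, written in Python with hashlib standing in for md5deep; the BagIt version string and two-space manifest separator are assumptions based on the common 0.97 layout, and this is an illustration rather than the gist's actual code:

#!/usr/bin/env python3
# Sketch: build a minimal BagIt bag (bagit.txt, data/ payload, manifest-md5.txt).
import hashlib
import shutil
import sys
from pathlib import Path

src, bag = Path(sys.argv[1]), Path(sys.argv[2])
data_dir = bag / "data"
shutil.copytree(src, data_dir)  # payload goes under data/

# bagit.txt declares the version and tag file encoding
(bag / "bagit.txt").write_text("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

# manifest-md5.txt lists "<md5>  data/<relative path>" for every payload file
with (bag / "manifest-md5.txt").open("w") as manifest:
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            md5 = hashlib.md5(path.read_bytes()).hexdigest()  # fine for small files
            manifest.write(f"{md5}  data/{path.relative_to(data_dir)}\n")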
@edsu
edsu / en.wav
Last active March 29, 2024 12:22
This audio file seems to cause Whisper to segfault on my MacBook Pro (2.4 GHz 8-Core Intel Core i9), macOS Sonoma 14.4.1, Python 3.12.0
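
One hypothetical way to reproduce this, assuming the transcription was done with the openai-whisper Python package (the gist itself only holds the audio, so the model size and invocation here are guesses):

# Repro sketch, not from the gist: requires `pip install openai-whisper` and ffmpeg.
import whisper

model = whisper.load_model("base")   # model size is an assumption
result = model.transcribe("en.wav")
print(result["text"])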
@edsu
edsu / response.json
Last active March 20, 2024 16:54
Looking at the HTTP request that happens when you click on a citation link in a PDF when using Google Scholar's PDF extension for Chrome. You will need to be logged into Google to see the response, which comes back with the wrong Content-Type: https://scholar.google.com/scholar?oi=gsr-r&q=Ben-David%20A%20and%20Amram%20A%20(2018)%20The%20internet…
{
  "l": "1",
  "p": "https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/s64-c-mo/photo.jpg",
  "r": [
    {
      "t": "The Internet Archive and the socio-technical construction of historical facts",
      "u": "https://scholar.google.com/scholar_url?url=https://www.tandfonline.com/doi/abs/10.1080/24701475.2018.1455412&hl=en&sa=T&oi=gsr-r&ct=res&cd=0&d=3272375975175528132&ei=YBH7ZeXNA4Cb6rQPmrOdoA8&scisig=AFWwaeb_dRhXurIfWX0NXA2y4G9I",
      "x": "",
      "m": "A Ben-David, A Amram - Internet Histories, 2018",
      "s": "This article analyses the socio-technical epistemic processes behind the construction of historical facts by the Internet Archive Wayback Machine (IAWM). Grounded in theoretical debates in Science and Technology Studies about digital and algorithmic platforms as “black boxes”, this article uses provenance information and other data traces provided by the IAWM to uncover specific epistemic processes embedded at its back-end, through a case study on the archiv
filename                     count
data.zip                     22397
data_EPSG_4326.zip           22397
preview.jpg                  22397
index_map.json                 147
Beechey_WGS.tif.xml              1
Beechey_WGS-iso19139.xml         1
Beechey_WGS-fgdc.xml             1
bathy20.txt                      1
@edsu
edsu / wacz-images.py
Last active February 19, 2024 03:08
#!/usr/bin/env python3
#
# usage: wacz-images.py <wacz_file>
#
# This program will extract images from the WARC files contained in a WACZ
# file and write them to the current working directory using the image's URL
# as a file location.
#
# You will need to `pip install warcio` for it to work.
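
A rough sketch of the approach described above, assuming the WACZ stores its WARCs under archive/ and that warcio's ArchiveIterator walks them; details like query-string handling are glossed over and this is not necessarily how the gist does it:

#!/usr/bin/env python3
# Sketch: write out image responses found in the WARCs inside a WACZ file.
import sys
import zipfile
from pathlib import Path
from urllib.parse import urlparse

from warcio.archiveiterator import ArchiveIterator

wacz = zipfile.ZipFile(sys.argv[1])
for name in wacz.namelist():
    if not (name.startswith("archive/") and name.endswith(".warc.gz")):
        continue
    with wacz.open(name) as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response" or record.http_headers is None:
                continue
            content_type = record.http_headers.get_header("Content-Type") or ""
            if not content_type.startswith("image/"):
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            parsed = urlparse(url)
            # host + path of the image URL becomes the output file location
            out = Path(parsed.netloc) / parsed.path.lstrip("/")
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(record.content_stream().read())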
@edsu
edsu / lcauthority.py
Last active February 16, 2024 22:23
Get some usable JSON for a given LC name or subject authority string: e.g. `./lcauthority.py "Southampton (England)"`
#!/usr/bin/env python3
"""
A small command line tool to get the JSON-LD for a Library of Congress authority
record by first looking up the authority as a string using the label lookup
service and then getting the JSON-LD for the authority and writing it out using
a JSON-LD frame where SKOS is the default vocabulary.
"""
import sys
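
The two-step flow the docstring describes might look something like the sketch below; the exact id.loc.gov label-lookup URL, the X-Uri/Location headers, and the ".json" suffix are assumptions, and the SKOS framing step is omitted:

# Sketch of the lookup flow, not the gist's actual code.
import sys
from urllib.parse import quote

import requests

label = sys.argv[1]

# 1. resolve the label string to an authority URI via the label lookup service
resp = requests.head(
    "https://id.loc.gov/authorities/label/" + quote(label),
    allow_redirects=False,
)
uri = resp.headers.get("X-Uri") or resp.headers.get("Location")

# 2. fetch JSON-LD for the authority (framing with SKOS as default vocabulary omitted)
print(requests.get(uri + ".json").json())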
@edsu
edsu / mix.sh
Last active January 28, 2024 17:10
Concatenate two MP4 files with different resolutions.
# concatenate two videos with different resolution
ffmpeg -i part1.mp4 -i part2.mp4 -filter_complex "[0]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][0:a:0][v1][1:a:0]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" out.mp4
@edsu
edsu / guess_doi.py
Last active January 10, 2024 17:54
Use the CrossRef API to guess the DOI for a given title
#!/usr/bin/env python3
import sys
import requests
title = sys.argv[1]
api_url = "https://api.crossref.org/works"
response = requests.get(api_url, params={"query.title": title})
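
The preview stops before anything is printed; a plausible continuation, based on the shape of the Crossref works response (a message.items list whose entries carry DOI and title fields), would be:

# Possible continuation of the snippet above (not necessarily the gist's code)
items = response.json()["message"]["items"]
if items:
    best = items[0]
    print(best["DOI"], best["title"][0])
else:
    print("no match found", file=sys.stderr)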
@edsu
edsu / 2023-12-20.txt
Last active December 20, 2023 16:33
A count of albums in the lists at https://aoty.hubmed.org/ for 2023
[13] Sufjan Stevens - Javelin [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Mojo, Uncut, Piccadilly Records, Rough Trade]
[12] Kelela - Raven [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, Crack, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, The Quietus]
[12] Wednesday - Rat Saw God [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Uncut, Rough Trade]
[11] Noname - Sundial [Clash, The Fader, The Forty-Five, The Wire, PopMatters, Pitchfork, Crack, The Line of Best Fit, Rolling Stone, Paste, The Quietus]
[9] Mitski - The Land Is Inhospitable and So Are We [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Mojo]
[9] Lankum - False Lankum [Clash, Concrete Islands, Crack, The Line of Best Fit, Fast 'n' Bulbous, Louder Than War, Mojo, Uncut, The Quietus]
[8] Amaarae - Fountain Baby [Clash, The Fader, T
@edsu
edsu / json_shapes.py
Last active November 17, 2023 23:22
Feed in some JSONL and get a report of the patterns present in the data.
#!/usr/bin/env python3
import csv
import json
from collections import OrderedDict
from collections import Counter
def trace(data, shape=None):
    if isinstance(data, dict):
        new_dict = OrderedDict()
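
The preview stops inside trace(); the general idea, sketched here independently of the gist's exact implementation, is to collapse every value to its type name and count the resulting shapes:

#!/usr/bin/env python3
# Sketch of the same idea: read JSONL on stdin, report the distinct "shapes".
import json
import sys
from collections import Counter

def shape(value):
    # keep dict keys, sample the first list element, reduce leaves to type names
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [shape(value[0])] if value else []
    return type(value).__name__

counts = Counter(
    json.dumps(shape(json.loads(line))) for line in sys.stdin if line.strip()
)
for pattern, n in counts.most_common():
    print(f"{n}\t{pattern}")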