Ed Summers edsu

@edsu
edsu / en.wav
Last active March 29, 2024 12:22
This seems to cause whisper to segfault on my MacBook Pro 2.4 GHz 8-Core Intel Core i9, Sonoma 14.4.1, Python 3.12.0
@edsu
edsu / response.json
Last active March 20, 2024 16:54
Looking at the HTTP request that happens when you click on a citation link in a PDF when using Google Scholar's PDF extension for Chrome. You will need to be logged into Google to see the response, which comes back with the wrong Content-Type: https://scholar.google.com/scholar?oi=gsr-r&q=Ben-David%20A%20and%20Amram%20A%20(2018)%20The%20internet…
{
  "l": "1",
  "p": "https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/s64-c-mo/photo.jpg",
  "r": [
    {
      "t": "The Internet Archive and the socio-technical construction of historical facts",
      "u": "https://scholar.google.com/scholar_url?url=https://www.tandfonline.com/doi/abs/10.1080/24701475.2018.1455412&hl=en&sa=T&oi=gsr-r&ct=res&cd=0&d=3272375975175528132&ei=YBH7ZeXNA4Cb6rQPmrOdoA8&scisig=AFWwaeb_dRhXurIfWX0NXA2y4G9I",
      "x": "",
      "m": "A Ben-David, A Amram - Internet Histories, 2018",
      "s": "This article analyses the socio-technical epistemic processes behind the construction of historical facts by the Internet Archive Wayback Machine (IAWM). Grounded in theoretical debates in Science and Technology Studies about digital and algorithmic platforms as “black boxes”, this article uses provenance information and other data traces provided by the IAWM to uncover specific epistemic processes embedded at its back-end, through a case study on the archiv
filename                  count
data.zip                  22397
data_EPSG_4326.zip        22397
preview.jpg               22397
index_map.json              147
Beechey_WGS.tif.xml           1
Beechey_WGS-iso19139.xml      1
Beechey_WGS-fgdc.xml          1
bathy20.txt                   1
@edsu
edsu / lcauthority.py
Last active February 16, 2024 22:23
Get some usable JSON for a given LC name or subject authority string: e.g. `./lcauthority.py "Southampton (England)"`
#!/usr/bin/env python3
"""
A small command line tool to get the JSON-LD for a Library of Congress authority
record by first looking up the authority as a string using the label lookup
service and then getting the JSON-LD for the authority and writing it out using
a JSON-LD frame where the SKOS is the default vocabulary.
"""
import sys
@edsu
edsu / guess_doi.py
Last active January 10, 2024 17:54
Use the CrossRef API to guess the DOI for a given title
#!/usr/bin/env python3
import sys
import requests
title = sys.argv[1]
api_url = "https://api.crossref.org/works"
response = requests.get(api_url, params={"query.title": title})
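The preview stops right after the request is sent. A minimal sketch of how the best guess might then be read out of the reply, assuming the CrossRef `/works` layout in which matches are listed under `message.items` and each carries a `DOI` key. The helper name `first_doi` and the sample payload are hypothetical and are not taken from the original script.

```python
def first_doi(payload):
    """Return the DOI of the first match in a CrossRef /works reply, or None."""
    items = payload.get("message", {}).get("items", [])
    if not items:
        return None
    return items[0].get("DOI")


# Hand-made payload that only mimics the shape of a CrossRef reply;
# the DOI is a placeholder, not real API output.
sample = {"message": {"items": [{"DOI": "10.5555/example", "title": ["Example"]}]}}
print(first_doi(sample))
```

With the script's own variables this would be called as `first_doi(response.json())`. Returning `None` for an empty result keeps a title with no match from raising an `IndexError`.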
@edsu
edsu / 2023-12-20.txt
Last active December 20, 2023 16:33
A count of albums in the lists at https://aoty.hubmed.org/ for 2023
[13] Sufjan Stevens - Javelin [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Mojo, Uncut, Piccadilly Records, Rough Trade]
[12] Kelela - Raven [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, Crack, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, The Quietus]
[12] Wednesday - Rat Saw God [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Uncut, Rough Trade]
[11] Noname - Sundial [Clash, The Fader, The Forty-Five, The Wire, PopMatters, Pitchfork, Crack, The Line of Best Fit, Rolling Stone, Paste, The Quietus]
[9] Mitski - The Land Is Inhospitable and So Are We [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Mojo]
[9] Lankum - False Lankum [Clash, Concrete Islands, Crack, The Line of Best Fit, Fast 'n' Bulbous, Louder Than War, Mojo, Uncut, The Quietus]
[8] Amaarae - Fountain Baby [Clash, The Fader, T
@edsu
edsu / json_shapes.py
Last active November 17, 2023 23:22
Feed in some JSONL and get a report of the patterns present in the data.
#!/usr/bin/env python3
import csv
import json
from collections import OrderedDict
from collections import Counter
def trace(data, shape=None):
    if isinstance(data, dict):
        new_dict = OrderedDict()
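Only the first lines of `trace` are visible in this preview, so the following is a separate, minimal sketch of the general idea in the description rather than the gist's own code: reduce each JSON record to its "shape" (keys plus value types) and count how often each shape occurs. The names `shape_of`, `count_shapes`, and `sample_lines` are hypothetical.

```python
import json
from collections import Counter


def shape_of(value):
    """Replace every leaf value with its JSON type name, keeping the structure."""
    if isinstance(value, dict):
        return {key: shape_of(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Collapse a list to the distinct shapes of its members.
        seen = []
        for item in value:
            item_shape = shape_of(item)
            if item_shape not in seen:
                seen.append(item_shape)
        return seen
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is also an int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def count_shapes(lines):
    """Count the distinct shapes found in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            shape = shape_of(json.loads(line))
            counts[json.dumps(shape, sort_keys=True)] += 1
    return counts


# Made-up records for illustration only.
sample_lines = [
    '{"id": 1, "tags": ["a", "b"]}',
    '{"id": 2, "tags": []}',
    '{"id": 3, "tags": ["c"]}',
]
for shape, n in count_shapes(sample_lines).most_common():
    print(n, shape)
```

Serialising each shape with `json.dumps(..., sort_keys=True)` gives a hashable, order-independent key, so two records with the same fields in a different order count as the same pattern.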
@edsu
edsu / nytimes-gptbot.sh
Last active September 7, 2023 19:14
Uses the Wayback Machine to show (approximately) when the New York Times started telling OpenAI to stop scraping them.
#!/bin/bash
#
# Use the Internet Archive Wayback Machine to demonstrate roughly when the
# NYTimes started blocking GPTBot.
#
# See: https://www.theverge.com/2023/8/21/23840705/new-york-times-openai-web-crawler-ai-gpt
#
wget -q -O robots-20230817.txt https://web.archive.org/web/20230817012138id_/https://www.nytimes.com/robots.txt
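Once a snapshot of `robots.txt` has been saved, whether it blocks GPTBot can be checked mechanically. The following is a minimal Python sketch using the standard library's `urllib.robotparser`, not part of the original shell script. The function name `blocks_agent` and the two sample robots files are hypothetical stand-ins, not the actual contents of the archived NYTimes snapshots.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt, agent="GPTBot", url="https://www.nytimes.com/"):
    """Return True if the given robots.txt text disallows `agent` from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical robots.txt contents, for illustration only.
before = """\
User-agent: *
Disallow: /private/
"""

after = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

print(blocks_agent(before))
print(blocks_agent(after))
```

Running the check over a series of dated snapshots, for example the text of `robots-20230817.txt` and its neighbours, would narrow down the date on which the answer flips.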
@edsu
edsu / example.md
Last active August 24, 2023 14:31

Org A split off of Org B, Org B split into Org C & Org D, Org A and Org D merged into Org E?

can be turned into Mermaid notation

graph TD;
  B --> A;
  B --> C;
  B --> D;
  A --> E;
  D --> E;
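The same translation can be done programmatically. A minimal sketch, assuming the relationships are already available as (source, target) pairs; the function name `to_mermaid` is hypothetical, and the `D --> E` edge comes from the sentence "Org A and Org D merged into Org E".

```python
def to_mermaid(edges, direction="TD"):
    """Render (source, target) pairs as a Mermaid flowchart definition."""
    lines = [f"graph {direction};"]
    lines.extend(f"  {src} --> {dst};" for src, dst in edges)
    return "\n".join(lines)


# Splits and merges from the example sentence, as directed edges.
edges = [("B", "A"), ("B", "C"), ("B", "D"), ("A", "E"), ("D", "E")]
print(to_mermaid(edges))
```

A merge is written as two edges pointing at the same node, and a split as two edges leaving the same node, so no special syntax is needed for either.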
collection: fatal-encounters
generateWACZ: true
workers: 4
screencastPort: 9037
seeds:
  - url: https://fatalencounters.org/
    scopeType: prefix
  - url: https://www.wsoctv.com/news/1-person-dead-after-attempting-escape-police-troopers-say/QXA244QPUZGJ5GAGRADGDWBAEU/
    scopeType: page
  - url: https://www.wtok.com/2022/01/01/officer-involved-shooting/