Ed Summers edsu
import csv
import sys
from itertools import batched
import pyarrow
from pyarrow.parquet import ParquetWriter
csv.field_size_limit(sys.maxsize)
def csv_to_parquet(csv_file, parquet_file, batch_size=10_000):
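The preview cuts off at the function signature, so the body is not shown. What follows is only a guess at how such a conversion could be written, not the gist's actual code: it assumes every column is stored as a string, uses a hypothetical `iter_batches` helper as a stand-in for `itertools.batched` (which needs Python 3.12+), and imports pyarrow lazily so the batching helper can be tried without it.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of up to batch_size rows (stand-in for itertools.batched)."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def csv_to_parquet(csv_file, parquet_file, batch_size=10_000):
    """Stream a CSV into a Parquet file one batch at a time.

    Assumption: all columns are written as strings; the real gist may
    infer or declare types differently.
    """
    # Imported here so iter_batches can be used without pyarrow installed.
    import pyarrow
    from pyarrow.parquet import ParquetWriter

    with open(csv_file, newline="") as fh:
        reader = csv.DictReader(fh)
        # fieldnames is None for an empty file, hence the fallback.
        names = reader.fieldnames or []
        schema = pyarrow.schema([(name, pyarrow.string()) for name in names])
        with ParquetWriter(parquet_file, schema) as writer:
            for batch in iter_batches(reader, batch_size):
                table = pyarrow.Table.from_pylist(batch, schema=schema)
                writer.write_table(table)
```

Writing one table per batch keeps memory use bounded by `batch_size` rather than by the size of the CSV.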
@edsu
edsu / .gitignore
Last active July 18, 2024 14:48
A sloppy prototype for moving browsertrix WACZs to AWS S3.
.env
import requests
author_id = 'https://openalex.org/A5067004024'
url = 'https://api.openalex.org/works'
params = {
    'filter': f'author.id:{author_id}',
    'cursor': '*'
}
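The snippet stops after building `params`, so the paging loop is not visible. As a rough sketch of how the `cursor` parameter is typically used with OpenAlex (each response carries a `meta.next_cursor` value that is null on the last page), the loop could look like the following. The names `iter_works` and `fetch_with_requests` are hypothetical and not taken from the gist.

```python
def iter_works(fetch_page, params):
    """Yield every work, following cursor paging until next_cursor is empty.

    fetch_page is any callable that takes the query params and returns
    the parsed JSON response as a dict.
    """
    params = dict(params)  # copy so the caller's dict is left untouched
    while params.get("cursor"):
        page = fetch_page(params)
        yield from page.get("results", [])
        # OpenAlex returns a null next_cursor once there are no more pages.
        params["cursor"] = page.get("meta", {}).get("next_cursor")


def fetch_with_requests(params, url="https://api.openalex.org/works"):
    """One possible fetch_page implementation using requests."""
    import requests  # imported lazily so iter_works has no hard dependency

    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()
```

With the `params` dict above, `for work in iter_works(fetch_with_requests, params): ...` would walk all of the author's works. Passing the fetch function in makes the loop easy to exercise without network access.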
#!/usr/bin/env python3
"""
Run this program with an institution name to see the matching institutions and
their publication counts in OpenAlex.
$ ./openalex_counts "stanford"
Stanford University (I97018004): 430550
Stanford Medicine (I4210137306): 32576
edsu / en.wav
Last active March 29, 2024 12:22
This seems to cause whisper to segfault on my MacBook Pro 2.4 GHz 8-Core Intel Core i9, Sonoma 14.4.1, Python 3.12.0
edsu / response.json
Last active March 20, 2024 16:54
Looking at the HTTP request that happens when you click on a citation link in a PDF when using Google Scholar's PDF extension for Chrome. You will need to be logged into Google to see the response, which comes back with the wrong Content-Type: https://scholar.google.com/scholar?oi=gsr-r&q=Ben-David%20A%20and%20Amram%20A%20(2018)%20The%20internet…
{
  "l": "1",
  "p": "https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/s64-c-mo/photo.jpg",
  "r": [
    {
      "t": "The Internet Archive and the socio-technical construction of historical facts",
      "u": "https://scholar.google.com/scholar_url?url=https://www.tandfonline.com/doi/abs/10.1080/24701475.2018.1455412&hl=en&sa=T&oi=gsr-r&ct=res&cd=0&d=3272375975175528132&ei=YBH7ZeXNA4Cb6rQPmrOdoA8&scisig=AFWwaeb_dRhXurIfWX0NXA2y4G9I",
      "x": "",
      "m": "A Ben-David, A Amram - Internet Histories, 2018",
      "s": "This article analyses the socio-technical epistemic processes behind the construction of historical facts by the Internet Archive Wayback Machine (IAWM). Grounded in theoretical debates in Science and Technology Studies about digital and algorithmic platforms as “black boxes”, this article uses provenance information and other data traces provided by the IAWM to uncover specific epistemic processes embedded at its back-end, through a case study on the archiv
filename count
data.zip 22397
data_EPSG_4326.zip 22397
preview.jpg 22397
index_map.json 147
Beechey_WGS.tif.xml 1
Beechey_WGS-iso19139.xml 1
Beechey_WGS-fgdc.xml 1
bathy20.txt 1
edsu / lcauthority.py
Last active February 16, 2024 22:23
Get some usable JSON for a given LC name or subject authority string: e.g. `./lcauthority.py "Southampton (England)"`
#!/usr/bin/env python3
"""
A small command line tool to get the JSON-LD for a Library of Congress authority
record by first looking up the authority as a string using the label lookup
service and then getting the JSON-LD for the authority and writing it out using
a JSON-LD frame where SKOS is the default vocabulary.
"""
import sys
edsu / guess_doi.py
Last active January 10, 2024 17:54
Use the CrossRef API to guess the DOI for a given title
#!/usr/bin/env python3
import sys
import requests
title = sys.argv[1]
api_url = "https://api.crossref.org/works"
response = requests.get(api_url, params={"query.title": title})
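The preview ends at the request, so how the script picks a DOI from the response is not shown. A minimal sketch, assuming the usual Crossref response shape (`message.items`, each item with a `DOI` field) and taking the top-ranked match as the guess, might be the following. The helper name `first_doi` is hypothetical.

```python
def first_doi(payload):
    """Return the DOI of the top-ranked Crossref match, or None if no match.

    payload is the parsed JSON body of a /works query, for example
    response.json() from the request above.
    """
    items = payload.get("message", {}).get("items", [])
    if not items:
        return None
    # Crossref orders results by relevance score, so the first item is
    # the best guess; it is not guaranteed to be the right work.
    return items[0].get("DOI")
```

In the script this would be called as `print(first_doi(response.json()))`. Comparing the returned item's `title` against the query before trusting the DOI would make the guess safer.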
edsu / 2023-12-20.txt
Last active December 20, 2023 16:33
A count of albums in the lists at https://aoty.hubmed.org/ for 2023
[13] Sufjan Stevens - Javelin [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Mojo, Uncut, Piccadilly Records, Rough Trade]
[12] Kelela - Raven [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, Crack, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, The Quietus]
[12] Wednesday - Rat Saw God [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Uncut, Rough Trade]
[11] Noname - Sundial [Clash, The Fader, The Forty-Five, The Wire, PopMatters, Pitchfork, Crack, The Line of Best Fit, Rolling Stone, Paste, The Quietus]
[9] Mitski - The Land Is Inhospitable and So Are We [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Mojo]
[9] Lankum - False Lankum [Clash, Concrete Islands, Crack, The Line of Best Fit, Fast 'n' Bulbous, Louder Than War, Mojo, Uncut, The Quietus]
[8] Amaarae - Fountain Baby [Clash, The Fader, T