Ed Summers edsu

@edsu
edsu / find.rb
Last active June 8, 2022 14:42
An example of passing an Enumerator to Parallel.map. Parallel flattens the Enumerator into a list before it starts processing.
require 'pathname'
require 'parallel'
# a directory to traverse
dir = ARGV[0]
# files is an Enumerator
files = Pathname.new(dir).find
results = Parallel.map(files, processes: 3) do |f|
@edsu
edsu / x.rb
Last active May 25, 2022 21:06
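def stuff
  # Assumed fix: return an Enumerator when no block is given, so that
  # stuff.take(2) works instead of raising LocalJumpError on the first yield.
  return enum_for(:stuff) unless block_given?

  yield 1
  yield 2
  yield 3
  yield 4
  yield 5
end

stuff.take(2).each do |i|
  puts i
end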
@edsu
edsu / gif
Last active May 24, 2022 23:29
#!/bin/sh
# Turn a video file into an animated GIF
USAGE="usage: gif video_file [gif_file]"
video_file=$1
if [ "$video_file" = "" ]; then
    echo "$USAGE"
#!/usr/bin/env python3
# This is an example of seeing what unique HTML webpages there are in the
# Wayback Machine for the http://myshtetl.org/ website after 2022-03-01.
from wayback import WaybackClient
wb = WaybackClient()
pages = set()
@edsu
edsu / README.md
Last active April 14, 2022 19:43
Debugging PyWB and Wayback

I'm trying to figure out why this JavaScript file, when rendered through PyWB, seems to throw an Uncaught SyntaxError: missing formal parameter in Firefox and an Uncaught SyntaxError: Unexpected token 'function' (at pywb.js:15:5628639) in Chrome, whereas it works fine when rendered through Archive-It Wayback.

curl http://localhost:8080/sul/20220225003837js_/https://prod.smassets.net/assets/anweb/anweb-shared-page-summary-bundle-min.58b903b5.js > pywb.js

curl https://wayback.archive-it.org/18713/20220225003837js_/https://prod.smassets.net/assets/anweb/anweb-shared-page-summary-bundle-min.58b903b5.js > wayback.js

You can open wayback.html and pywb.html in your browser and look at the developer console to see the error in the case of pywb.html.
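
Since Chrome reports the error at a line and column (15:5628639), it may help to print the text around that position rather than scrolling through a minified bundle. The following is a minimal sketch, not a tested part of this debugging session: the helper name `context_at` is hypothetical, it assumes lines are separated by `\n` only, and it treats the column as a 1-based character offset (Chrome may count UTF-16 code units, so the window could be slightly off for non-BMP characters).

```python
def context_at(source: str, line: int, column: int, width: int = 40) -> str:
    """Return up to `width` characters on either side of a 1-based line/column."""
    # Assumption: the browser counts lines by "\n" only.
    lines = source.split("\n")
    target = lines[line - 1]
    # Convert the 1-based column to a 0-based index and clamp the left edge.
    index = column - 1
    start = max(index - width, 0)
    end = index + width
    return target[start:end]


# Small self-contained demonstration on an inline sample string.
sample = "var a = 1;\nfunction (x) { return x; }\n"
print(context_at(sample, line=2, column=10, width=5))
```

To use it on the real download, one could read pywb.js into a string (for example with `open("pywb.js", encoding="utf-8").read()`) and call `context_at(text, 15, 5628639)`, then do the same for wayback.js to compare the two windows.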

@edsu
edsu / novaya-gazeta.csv
Last active March 28, 2022 13:42
Novaya Gazeta articles that have been removed due to censorship. See https://github.com/edsu/notebooks/blob/master/Novaya%20Gazeta.ipynb
url,wayback_url,title
https://novayagazeta.ru/articles/2022/03/01/golos-krovi-brata-tvoego-vopiet-ko-mne-ot-zemli-sviashchenniki-russkoi-pravoslavnoi-tserkvi-prizvali-prekratit-voinu-s-ukrainoi-news,http://web.archive.org/web/20220301104646/https://novayagazeta.ru/articles/2022/03/01/golos-krovi-brata-tvoego-vopiet-ko-mne-ot-zemli-sviashchenniki-russkoi-pravoslavnoi-tserkvi-prizvali-prekratit-voinu-s-ukrainoi-news,Новая Газета - novayagazeta.ru
https://novayagazeta.ru/articles/2022/02/24/chto-proizoshlo-za-noch-24-fevralia-korotko-news,http://web.archive.org/web/20220224063831/https://novayagazeta.ru/articles/2022/02/24/chto-proizoshlo-za-noch-24-fevralia-korotko-news,Wayback Machine
https://novayagazeta.ru/articles/2022/03/02/poslednii-dovod,http://web.archive.org/web/20220302103843/https://novayagazeta.ru/articles/2022/03/02/poslednii-dovod,Новая Газета - novayagazeta.ru
https://novayagazeta.ru/articles/2022/02/26/krichat-ot-boli-vse-budut-na-odnom-iazyke-bolee-tysiachi-rossiiskikh-medikov-potrebovali-prekr
run_time unreachable
2022-02-26T15:21:11.457890 467
2022-02-26T15:51:11.557991 463
2022-02-26T16:21:11.657656 455
2022-02-26T16:51:11.700891 456
2022-02-26T17:21:11.754011 455
2022-02-26T17:51:11.797150 452
2022-02-26T18:21:11.847131 449
2022-02-26T20:47:57.940628 454
2022-02-26T21:24:24.930520 452
#!/usr/bin/env python
import os
from datetime import datetime
total_objects = 4300000
elapsed = 0
last = None
count = 0
def response(flow):
    flow.response.text = flow.response.text.replace(
        'Members of extremist Oath Keepers group planned attack',
        'Members of extremist Oath Keepers group planned to prevent the attack'
    )
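
The effect of the hook above can be checked without a running proxy. The sketch below is only an illustration under stated assumptions: it repeats the `response` function and passes it a stand-in object built with `types.SimpleNamespace`. The name `fake_flow` and the sample HTML are hypothetical and are not part of mitmproxy; the stand-in exposes only the `response.text` attribute that the hook reads and writes.

```python
from types import SimpleNamespace


def response(flow):
    # Same rewrite as the hook above: replace one headline with another.
    flow.response.text = flow.response.text.replace(
        'Members of extremist Oath Keepers group planned attack',
        'Members of extremist Oath Keepers group planned to prevent the attack'
    )


# Hypothetical stand-in for a flow object; a real flow carries much more state.
fake_flow = SimpleNamespace(
    response=SimpleNamespace(
        text='<h1>Members of extremist Oath Keepers group planned attack</h1>'
    )
)

response(fake_flow)
print(fake_flow.response.text)
```

Because `str.replace` only touches exact matches, a page that words the headline differently would pass through unchanged.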
{
  "compilerOptions": {
    "target": "es5",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  }
}