@hay
hay / WMStats.java
Last active August 30, 2016 11:03
Wikipedia statistics parser comparison
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class WMStats {
private static final String FILE_NAME = "pagecounts-20141029-230000";
@richardlehane
richardlehane / sf.py
Last active August 29, 2015 14:15
Archivematica FPR script for sf
#!/usr/bin/env python
import base64
import json
import urllib2
import sys
sfurl = 'http://localhost:5138/identify/'
try:
@ebenenglish
ebenenglish / word-coord-json-parse-test.md
Created March 7, 2018 21:21
compares performance of parsing word-coordinate data in different JSON source formats

Overview

For each of these JSON word-coordinate file formats of the same OCR word-coordinate data, representing a single newspaper page:

  1. Open-ONI (58.9 kB)
  2. IIIF Annotation List (826.4 kB)

I ran a test in the Rails console, benchmarking the time needed to:

  1. load the source file into memory
  2. parse as JSON
  3. find nodes matching a search term ("October")
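
The three steps above can be sketched as a small timing harness. The original test was run in the Rails console; the version below is a Python approximation rather than the author's code. The internal layout of the two formats is not shown here, so the search simply walks the whole parsed tree and checks both object keys and string values. The names `find_matches` and `benchmark` are hypothetical.

```python
import json
import time
from pathlib import Path


def find_matches(node, term):
    """Recursively collect every key or string value in parsed JSON that contains `term`."""
    matches = []
    if isinstance(node, dict):
        for key, value in node.items():
            if term in key:  # some formats may store the word itself as an object key
                matches.append(key)
            matches.extend(find_matches(value, term))
    elif isinstance(node, list):
        for item in node:
            matches.extend(find_matches(item, term))
    elif isinstance(node, str) and term in node:
        matches.append(node)
    return matches


def benchmark(path, term="October"):
    """Time each step separately; returns (timings in seconds, number of matches)."""
    timings = {}

    start = time.perf_counter()
    raw = Path(path).read_text(encoding="utf-8")  # 1. load the source file into memory
    timings["load"] = time.perf_counter() - start

    start = time.perf_counter()
    data = json.loads(raw)                        # 2. parse as JSON
    timings["parse"] = time.perf_counter() - start

    start = time.perf_counter()
    matches = find_matches(data, term)            # 3. find nodes matching the search term
    timings["search"] = time.perf_counter() - start

    return timings, len(matches)
```

Calling `benchmark` once per source file and comparing the `load`, `parse`, and `search` entries gives a per-step comparison between formats. A single run is noisy, so repeating each call several times and averaging is advisable.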
@epoz
epoz / scan_publishers.py
Created April 23, 2020 21:26
Scan Crossref data dump of 2020-04-14
import json
import gzip
import os
from progress.bar import Bar
# Scan the Crossref data dump mentioned at: https://twitter.com/CrossrefOrg/status/1250146935861886976
# and parse out the publisher names, so you know where in the giant dump your own data can be found.
# Note: this script uses the progress library, so run "pip install progress" first.
filenames = [filename for filename in os.listdir('.') if filename.endswith('.json.gz')]
@sneakers-the-rat
sneakers-the-rat / e_hashs.json
Last active June 10, 2024 08:32
Elsevier PDF "hashes"
[
"FCi27mtaKod38ztmGndn-y8NNz.r.lt6SndqGztz_ztr-ngqQm9aMo9eOnMeJntuNntu",
"D2ei2mgqJz9b-m.mGmPqRyLNNnwmOlt7.ywiGmt-Kndr9otqRywv8o9ePmtiNmd2Sn92Tma",
"6U7vcmPuOn9uLnMaGyM7-nLNNntv9lt6RmtaGmweOyMmJnMmSmgmOo9eOnM6LnMaRmM-Tma",
"lXLf8owyQztiMzwqGnMz7zcNNotb7lwf.m9qGzt6Km.qMngqLndqLo9eOotaNm96Mmt6Tma",
"FCi27y9qOnd-Ny96GmPmOmcNNzwf-lwj-m9mGztz7ytaMnM78n9v-o9ePmM6Rm9-Qn9eTma",
"XlEDumMz7nM7-m9iGogmRmLNNyt_8lwiKz9eGm9-Pm.v7ztiLztz_o9eOnMeQnd-Sodm",
"lXLf8yt-JywmNmPeGm9n9n8NNzgn.lt_8zwqGogz7zgn7zt6SyPr-o9eOnM6Pot2Mn9qTma",
"FCi27zgf8mdqMmMeGnMmMy8NNz9eQlweNy.eGmMiMm96Qmgr9nMb-o9ePmtuRmt6JotmTma",
"FCi27nwmKnMeSodeGm.z.y8NNntz.lt-PywmGy9__ngqQmtiPmtb7o9ePmteJotyJoduTma",