Andy Jackson anjackson

@anjackson
anjackson / profile-sample-flat-surts.json
Created December 9, 2014 11:19
profile-sample-flat-surts.json
{
"@context": "https://oduwsdl.github.io/contexts/archiveprofile.jsonld",
"@id": "http://example.com/",
"accesspoint": "http://example.com/wayback/",
"description": "A test archive that...",
"established": "2012-03-17",
"homepage": "http://example.com/",
"language": {},
"memento_compliance": "https://oduwsdl.github.io/terms/mementosupport#native",
"name": "Test Archive",
Unknown err: file_get_contents(http://es-lb:9200/archive-nonoindex-alias/_search?q=%28+collection%3Asoftwarelibrary_msdos_games+OR+mediatype%3Asoftwarelibrary_msdos_games+%29&from=0&size=75&default_operator=AND&fields=identifier%2Ctitle%2Cdescription%2Cmediatype%2Ccollection%2Clanguage%2Cvolume%2Cavg_rating%2Cnum_reviews%2Cpublicdate%2Cdate%2Cweek%2Cdownloads%2Csubject%2Ccreator&sort=downloads%3Adesc&source=%7B%22aggs%22%3A%7B%22subjectSorter%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22subjectSorter%22%2C%22size%22%3A200%7D%7D%2C%22mediatype%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22mediatype%22%2C%22size%22%3A200%7D%7D%2C%22languageSorter%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22languageSorter%22%2C%22size%22%3A10%7D%7D%7D%7D): failed to open stream: HTTP request failed! HTTP/1.0 503 Service Unavailable [/usr/local/petabox/www/common/Search.inc:314] Unknown err: file_get_contents(http://es-lb:9200/archive-nonoindex-alias/_search?q=%28+collection%3Asoftwarelibrary_msdos_games+OR+mediatype%3Asoftwarelibrary_msd
anjackson / gist:0df9eae6c8af82771747
Created March 14, 2015 22:38
IA MIDI file download problems
$ curl -I "http://web.archive.org/web/19960512211734/http://merlin.legend.org.uk/~simeond/mods/startrek.mod"
HTTP/1.1 200 OK
Server: Tengine/2.0.3
Date: Sat, 14 Mar 2015 22:34:06 GMT
Content-Type: text/plain
Content-Length: 272694
Connection: keep-alive
set-cookie: wayback_server=98; Domain=archive.org; Path=/; Expires=Mon, 13-Apr-15 22:34:06 GMT;
Memento-Datetime: Sun, 12 May 1996 21:17:34 GMT
Link: <http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="original", <http://web.archive.org/web/timemap/link/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="timemap"; type="application/link-format", <http://web.archive.org/web/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="timegate", <http://web.archive.org/web/19960512211734/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="first last memento"; datetime="Sun, 12 May 1996 21:17:34 GMT"
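The Link header in the response above carries the Memento relations (original, timemap, timegate, memento). As a rough sketch of how those could be pulled apart for inspection, the Python below splits a Link header value into a dictionary keyed by relation. The function name `parse_link_header` is hypothetical, and the regular expressions only cover the simple header shapes seen here, not every form the Link header grammar permits.

```python
import re

def parse_link_header(value):
    """Split an HTTP Link header value into {rel: (target, params)}.

    Hypothetical helper for illustration; handles <target>; key="value"
    entries separated by commas, as in the response above.
    """
    links = {}
    # One link-value: <target> followed by zero or more ;key=value params.
    pattern = r'<([^>]*)>((?:\s*;\s*[\w-]+=(?:"[^"]*"|[^;,]*))*)'
    for match in re.finditer(pattern, value):
        target, raw_params = match.groups()
        params = {}
        for key, val in re.findall(r'([\w-]+)=("[^"]*"|[^;,\s]*)', raw_params):
            # Strip surrounding quotes from quoted parameter values.
            params[key.lower()] = val[1:-1] if val.startswith('"') else val
        # A single rel attribute may list several space-separated relations.
        for rel in params.get("rel", "").split():
            links[rel] = (target, params)
    return links

header = (
    '<http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="original", '
    '<http://web.archive.org/web/19960512211734/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; '
    'rel="first last memento"; datetime="Sun, 12 May 1996 21:17:34 GMT"'
)
links = parse_link_header(header)
print(links["original"][0])
print(links["memento"][1]["datetime"])
```

Because the `datetime` value itself contains a comma, the sketch matches quoted parameter values as a unit rather than splitting the header on commas.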
anjackson / gist:cb4a10711da9a1f9617a
Created May 5, 2015 12:33
JHOVE 'validating' a bytestream
$ jhove /Users/andy/Documents/workspace/format-corpus/3rd-party/systems-showcase-files/MVI_0943.mp4
May 5, 2015 1:30:33 PM edu.harvard.hul.ois.jhove.JhoveBase init
SEVERE: Testing SEVERE level
Jhove (Rel. 1.11, 2013-09-29)
Date: 2015-05-05 13:30:34 BST
RepresentationInformation: /Users/andy/Documents/workspace/format-corpus/3rd-party/systems-showcase-files/MVI_0943.mp4
ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)
LastModified: 2015-01-21 03:12:23 GMT
Size: 583677
Format: bytestream
anjackson / gist:06971ff43e50645e3f2f
Last active August 29, 2015 14:23
OpenWayback Stacktrace Analysis

Most of the threads look like this - waiting for something to do:

"http-8080-198" daemon prio=10 tid=0x00007f06cc18e800 nid=0x10ca in Object.wait() [0x00007f073b6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000703eeafa8> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
        - locked <0x0000000703eeafa8> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
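To check a claim like "most of the threads look like this" against a full dump, one option is to tally the `java.lang.Thread.State` lines. The sketch below is only an illustration: `count_thread_states` is a hypothetical name, and it assumes the dump is available as plain text in the format shown above.

```python
import re
from collections import Counter

def count_thread_states(dump_text):
    """Count how often each java.lang.Thread.State appears in a thread dump."""
    states = re.findall(r"java\.lang\.Thread\.State: (\w+)", dump_text)
    return Counter(states)

sample = '''"http-8080-198" daemon prio=10 tid=0x00007f06cc18e800 nid=0x10ca in Object.wait() [0x00007f073b6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
'''
print(count_thread_states(sample))
```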
anjackson / humans.warc
Created August 1, 2015 07:57
Example WARC from wget
WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2015-07-31T16:32:22Z
WARC-Record-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Filename: humans.warc.gz
WARC-Block-Digest: sha1:AARITJBDT4LFDLBOUU63IJAD2MK7WFL3
Content-Length: 241
software: Wget/1.16.3 (darwin14.1.0)
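The record above opens with a version line followed by `Name: value` header fields. A minimal sketch of reading that header block into a dictionary follows; `parse_warc_headers` is a hypothetical helper, it stops at the first blank line, and it does not handle compression, continuation lines, or the record body.

```python
def parse_warc_headers(block):
    """Return (version_line, headers) for the header block of one WARC record."""
    lines = block.strip().splitlines()
    version = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        # Split on the first colon only, so values such as
        # <urn:uuid:...> or timestamps keep their own colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, headers

record = """WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2015-07-31T16:32:22Z
WARC-Record-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Filename: humans.warc.gz
Content-Length: 241
"""
version, headers = parse_warc_headers(record)
print(version, headers["WARC-Type"], headers["Content-Length"])
```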
anjackson / results.csv
Created September 2, 2015 11:23
Files with WSD extension in the 1996-2013 collection
wayback_date,url,resourcename,content_length,content_ffb,content_type,crawl_year
20040828045513,http://www.tei-c.org.uk:80/WSDs/iso646ss.wsd,iso646ss.wsd,12028,3c21444f,text/plain,2004
20020327071047,http://www.hcu.ox.ac.uk:80/TEI/WSDs/teien.wsd,teien.wsd,1411,3c21444f,text/plain,2002
20040830232050,http://www.tei-c.org.uk:80/WSDs/iso8859a.wsd,iso8859a.wsd,29554,3c777269,text/plain,2004
20040830232046,http://www.tei-c.org.uk:80/WSDs/iso88599.wsd,iso88599.wsd,28787,3c777269,text/plain,2004
20030722032619,http://www.tei-c.org.uk:80/WSDs/teigk2.wsd,teigk2.wsd,49037,3c777269,text/plain,2003
20030722032319,http://www.tei-c.org.uk:80/WSDs/iso8859a.wsd,iso8859a.wsd,29554,3c777269,text/plain,2003
20040830232014,http://www.tei-c.org.uk:80/WSDs/iso88592.wsd,iso88592.wsd,29456,52657475,message/rfc822,2004
20040830232042,http://www.tei-c.org.uk:80/WSDs/iso88597.wsd,iso88597.wsd,27467,3c777269,text/plain,2004
20000110235101,http://src.doc.ic.ac.uk:80/Mirrors/ftp.ifi.uio.no/pub/SGML/TEI/ISO646IR.WSD,ISO646IR.WSD,13512,5265
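The `content_ffb` column looks like a hex encoding of the first bytes of each file; that reading is an assumption, since the gist does not define the column. Under that assumption, a small sketch such as the one below turns the hex values back into printable characters, which makes it easier to see what each file starts with. The helper name `decode_ffb` is hypothetical.

```python
def decode_ffb(hex_value):
    """Decode a hex string to ASCII, showing non-printable bytes as '.'."""
    raw = bytes.fromhex(hex_value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

for value in ("3c21444f", "3c777269", "52657475"):
    print(value, "->", decode_ffb(value))
# 3c21444f -> <!DO
# 3c777269 -> <wri
# 52657475 -> Retu
```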
anjackson / breakdown.md
Last active November 9, 2015 00:27
Ideal WARC ID result?

The ideal identification result when analysing example.warc.gz, which contains an HTML response that was GZip-encoded:

  • application/warc
    • application/gzip
      (outer gzip chunk)
      • application/warc; version="1.0", type=response
        (The whole WARC Record)
        • application/http; msgtype=response
          (WARC Record content, i.e. HTTP headers and entity body)
          • application/gzip
            (i.e. the entity body is compressed)
            • text/html; version=5
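The nesting above could be approximated by peeling one container layer at a time. The sketch below is a simplified illustration, not an existing tool: `identify_layers` is a hypothetical function that only recognises the outer gzip wrapper and the `WARC/` version line, and it does not descend into the HTTP message or the inner compressed entity body.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip member

def identify_layers(data, max_depth=5):
    """Return the list of container types found, outermost first."""
    layers = []
    for _ in range(max_depth):
        if data.startswith(GZIP_MAGIC):
            layers.append("application/gzip")
            data = gzip.decompress(data)  # unwrap and inspect the next layer
        elif data.startswith(b"WARC/"):
            layers.append("application/warc")
            break
        else:
            layers.append("application/octet-stream")
            break
    return layers

sample = gzip.compress(b"WARC/1.0\r\nWARC-Type: response\r\n\r\n")
print(identify_layers(sample))
```

A fuller version would continue from the WARC record into `application/http` and then test the entity body in the same way, producing the tree shown above.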
anjackson / PDF Encryption test for .net
Created March 25, 2013 13:25
This shows the logic needed to test for the two different PDF encryption modes. It is written in C# and has been tested with version 1.2.1 of PDFBox, cross-compiled to .NET using IKVM (http://www.ikvm.net/).
Console.WriteLine("Document encrypted = {0}", document.isEncrypted().ToString());
// Can it be opened?
bool isOpenable = true;
if (document.isEncrypted())
{
    DecryptionMaterial decryptionMaterial = new StandardDecryptionMaterial("");
    try
    {
        document.openProtection(decryptionMaterial);
        AccessPermission ap = document.getCurrentAccessPermission();
09:14:40.487 GET http://jsmess.textfiles.com/messloader.html [HTTP/1.1 304 Not Modified 190ms]
09:14:40.684 Error in parsing value for 'image-rendering'. Declaration dropped. messloader.html:10
09:14:40.684 Error in parsing value for 'image-rendering'. Declaration dropped. messloader.html:11
09:14:40.684 Error in parsing value for 'image-rendering'. Declaration dropped. messloader.html:12
09:14:40.684 Unknown pseudo-class or pseudo-element '-webkit-full-screen'. Ruleset ignored due to bad selector. messloader.html:20
09:14:41.048 GET http://jsmess.textfiles.com/html/a2600.html [HTTP/1.1 404 Not Found 93ms]
09:14:41.049 GET http://jsmess.textfiles.com/gamehtml/Jrpacman.bin.html [HTTP/1.1 404 Not Found 185ms]
09:14:40.845 NS_ERROR_NOT_AVAILABLE: jsmess-loader.js:41
09:14:41.189 SyntaxError: illegal character messloader.html