Andy Jackson anjackson
@anjackson
anjackson / humans.warc
Created August 1, 2015 07:57
Example WARC from wget
WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2015-07-31T16:32:22Z
WARC-Record-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Filename: humans.warc.gz
WARC-Block-Digest: sha1:AARITJBDT4LFDLBOUU63IJAD2MK7WFL3
Content-Length: 241
software: Wget/1.16.3 (darwin14.1.0)
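The record header above is a version line followed by `Name: value` fields, so it can be read into a dictionary in a few lines. This is only a minimal sketch: `parse_warc_header` is a hypothetical helper (not part of wget or any WARC library), and it handles just the header lines shown here, with no continuation lines and no payload.

```python
def parse_warc_header(text):
    """Split a WARC record header into (version line, dict of fields)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    version = lines[0].strip()
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only: values such as dates and
        # URNs contain further colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


# Header copied from the warcinfo record above.
header = """WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2015-07-31T16:32:22Z
WARC-Record-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Filename: humans.warc.gz
WARC-Block-Digest: sha1:AARITJBDT4LFDLBOUU63IJAD2MK7WFL3
Content-Length: 241
"""

version, fields = parse_warc_header(header)
print(version, fields["WARC-Type"], fields["Content-Length"])
```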
anjackson / gist:06971ff43e50645e3f2f
Last active August 29, 2015 14:23
OpenWayback Stacktrace Analysis

Most of the threads look like this - waiting for something to do:

"http-8080-198" daemon prio=10 tid=0x00007f06cc18e800 nid=0x10ca in Object.wait() [0x00007f073b6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000703eeafa8> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
        - locked <0x0000000703eeafa8> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
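One way to check a claim like "most of the threads look like this" is to tally the `java.lang.Thread.State` lines across the whole dump. The sketch below assumes the dump is available as a string; `count_thread_states` is a hypothetical helper name, and the sample is just the single thread quoted above.

```python
import re
from collections import Counter

# Matches the state name on lines such as
# "java.lang.Thread.State: WAITING (on object monitor)".
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump):
    """Return a Counter mapping thread state name to number of threads."""
    return Counter(STATE_RE.findall(dump))


sample = """"http-8080-198" daemon prio=10 tid=0x00007f06cc18e800 nid=0x10ca in Object.wait() [0x00007f073b6f5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
"""

states = count_thread_states(sample)
for state, count in states.most_common():
    print(state, count)
```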
anjackson / gist:cb4a10711da9a1f9617a
Created May 5, 2015 12:33
JHOVE 'validating' a bytestream
$ jhove /Users/andy/Documents/workspace/format-corpus/3rd-party/systems-showcase-files/MVI_0943.mp4
May 5, 2015 1:30:33 PM edu.harvard.hul.ois.jhove.JhoveBase init
SEVERE: Testing SEVERE level
Jhove (Rel. 1.11, 2013-09-29)
Date: 2015-05-05 13:30:34 BST
RepresentationInformation: /Users/andy/Documents/workspace/format-corpus/3rd-party/systems-showcase-files/MVI_0943.mp4
ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)
LastModified: 2015-01-21 03:12:23 GMT
Size: 583677
Format: bytestream
anjackson / gist:0df9eae6c8af82771747
Created March 14, 2015 22:38
IA MIDI file download problems
$ curl -I "http://web.archive.org/web/19960512211734/http://merlin.legend.org.uk/~simeond/mods/startrek.mod"
HTTP/1.1 200 OK
Server: Tengine/2.0.3
Date: Sat, 14 Mar 2015 22:34:06 GMT
Content-Type: text/plain
Content-Length: 272694
Connection: keep-alive
set-cookie: wayback_server=98; Domain=archive.org; Path=/; Expires=Mon, 13-Apr-15 22:34:06 GMT;
Memento-Datetime: Sun, 12 May 1996 21:17:34 GMT
Link: <http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="original", <http://web.archive.org/web/timemap/link/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="timemap"; type="application/link-format", <http://web.archive.org/web/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="timegate", <http://web.archive.org/web/19960512211734/http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="first last memento"; datetime="Sun, 12 May 1996 21:17:34 GMT"
Unknown err: file_get_contents(http://es-lb:9200/archive-nonoindex-alias/_search?q=%28+collection%3Asoftwarelibrary_msdos_games+OR+mediatype%3Asoftwarelibrary_msdos_games+%29&from=0&size=75&default_operator=AND&fields=identifier%2Ctitle%2Cdescription%2Cmediatype%2Ccollection%2Clanguage%2Cvolume%2Cavg_rating%2Cnum_reviews%2Cpublicdate%2Cdate%2Cweek%2Cdownloads%2Csubject%2Ccreator&sort=downloads%3Adesc&source=%7B%22aggs%22%3A%7B%22subjectSorter%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22subjectSorter%22%2C%22size%22%3A200%7D%7D%2C%22mediatype%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22mediatype%22%2C%22size%22%3A200%7D%7D%2C%22languageSorter%22%3A%7B%22terms%22%3A%7B%22field%22%3A%22languageSorter%22%2C%22size%22%3A10%7D%7D%7D%7D): failed to open stream: HTTP request failed! HTTP/1.0 503 Service Unavailable [/usr/local/petabox/www/common/Search.inc:314] Unknown err: file_get_contents(http://es-lb:9200/archive-nonoindex-alias/_search?q=%28+collection%3Asoftwarelibrary_msdos_games+OR+mediatype%3Asoftwarelibrary_msd
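The Memento `Link` header in the response above cannot be split on commas, because the quoted `datetime` value contains one. A rough sketch of a parser that instead matches each `<url>` and its quoted parameters is given below; `parse_link_header` is a hypothetical helper, and the pattern only covers the quoted `name="value"` form seen in this response, not the full Link header grammar.

```python
import re

# A target in angle brackets followed by zero or more ; name="value" pairs.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Return a list of (url, params) tuples from a Link header value."""
    links = []
    for url, params in LINK_RE.findall(value):
        links.append((url, dict(PARAM_RE.findall(params))))
    return links


# Two of the entries from the header above.
value = (
    '<http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; rel="original", '
    '<http://web.archive.org/web/19960512211734/'
    'http://merlin.legend.org.uk/~simeond/mods/startrek.mod>; '
    'rel="first last memento"; datetime="Sun, 12 May 1996 21:17:34 GMT"'
)

links = parse_link_header(value)
for url, params in links:
    print(params.get("rel"), url)
```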
anjackson / profile-sample-flat-surts.json
Created December 9, 2014 11:19
profile-sample-flat-surts.json
{
"@context": "https://oduwsdl.github.io/contexts/archiveprofile.jsonld",
"@id": "http://example.com/",
"accesspoint": "http://example.com/wayback/",
"description": "A test archive that...",
"established": "2012-03-17",
"homepage": "http://example.com/",
"language": {},
"memento_compliance": "https://oduwsdl.github.io/terms/mementosupport#native",
"name": "Test Archive",
anjackson / scan.md
Created October 17, 2014 09:13
Notes from scanning the PRONOM signature file.

While processing this data source, 6 issues were found.

  • Could not parse MIME type 'com.adobe.photoshop-image' for entry x-fmt/92
  • Could not parse MIME type 'com.microsoft.word.doc' for entry x-fmt/64
  • Could not parse MIME type 'vnd.lotus-approach' for entry x-fmt/333
  • File extension '.jls' for entry fmt/150 does not appear to be a valid file extension.
  • File extension 'qxp report' for entry fmt/650 does not appear to be a valid file extension.
  • File extension 'qxp%20report' for entry fmt/650 does not appear to be a valid file extension.

Although perhaps '%20' (a URL-encoded space) is allowed in file extensions?
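The checks behind these messages are not shown, but two of the observations are easy to reproduce. The three rejected MIME types have no `type/subtype` slash, and `qxp%20report` is simply the percent-encoded form of `qxp report`. The sketch below assumes a deliberately loose `type/subtype` pattern; `looks_like_mime_type` is a hypothetical helper, not the scanner's actual rule.

```python
import re
from urllib.parse import unquote

# Loose type/subtype shape; an assumption for illustration only,
# not the full media type grammar.
MIME_RE = re.compile(r"^[A-Za-z0-9][\w.+-]*/[A-Za-z0-9][\w.+-]*$")


def looks_like_mime_type(value):
    """Return True if value has the basic type/subtype form."""
    return bool(MIME_RE.match(value))


for candidate in ("com.adobe.photoshop-image",
                  "com.microsoft.word.doc",
                  "vnd.lotus-approach"):
    print(candidate, looks_like_mime_type(candidate))

# Percent-decoding shows that the two fmt/650 entries are the same string.
decoded = unquote("qxp%20report")
print(repr(decoded))
```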

anjackson / terminal.log
Created October 3, 2014 09:06
Comparing Perl file reading methods for hash calculation
opf:perl andy$ time perl sha256-asfile.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
real 0m8.825s
user 0m8.102s
sys 0m0.479s
opf:perl andy$ time perl sha256-slurp.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
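The two Perl scripts are not shown here, so the following is only an analogous sketch in Python of the two reading strategies their names suggest: streaming the file through the digest versus reading it whole ("slurp"). The function names and the 1 MiB chunk size are assumptions for illustration. Both strategies should produce the same SHA-256, as the identical hashes above show; only memory use and timing differ.

```python
import hashlib


def sha256_chunked(path, chunk_size=1024 * 1024):
    """Hash a file by feeding it to the digest one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_slurp(path):
    """Hash a file by reading it into memory in one go."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()
```

The chunked version keeps memory use bounded by `chunk_size`, which matters for large inputs such as an ISO image.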
anjackson / UK-to-Trove-Backlinks-2009-2010.md
Last active August 29, 2015 14:06
UK-to-Trove Backlinks
anjackson / nohup.out
Created September 16, 2014 15:15
twarc.py 0.0.7
Traceback (most recent call last):
File "/usr/local/bin/twarc.py", line 283, in <module>
archive(args.query, tweets)
File "/usr/local/bin/twarc.py", line 197, in archive
for status in statuses:
File "/usr/local/bin/twarc.py", line 139, in stream
yield json.loads(line)
File "/usr/local/lib/python2.7/json/__init__.py", line 310, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python2.7/json/decoder.py", line 346, in decode