<?xml version="1.0" encoding="UTF-8"?>
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.6/jhove.xsd" name="Jhove" release="1.11" date="2013-09-29">
<date>2014-05-08T08:08:45-04:00</date>
<repInfo uri="/Users/justin/Downloads/seq-1.jp2">
<reportingModule release="1.3" date="2007-01-08">JPEG2000-hul</reportingModule>
<lastModified>2014-05-08T07:59:31-04:00</lastModified>
<size>3720460</size>
<format>JPEG 2000</format>
<status>Well-Formed and valid</status>
<sigMatch>
@justinlittman
justinlittman / gist:f292119b19b528e212f0
Created November 5, 2014 20:24
Kluge for capturing HTTP transaction information
def wrap_execute(exec_func, debuggable):
    # List of responses. Due to redirects, there may be multiple responses.
    resps = []
    # TODO: Check if there is an etag
    # When debuglevel is set, httplib outputs details to stdout.
    # This captures stdout.
    capture_out = StringIO.StringIO()
    sys.stdout = capture_out
    # sys.stdout = Tee([capture_out, sys.__stdout__])
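The preview above is Python 2 (`StringIO.StringIO`, `httplib`) and stops before `sys.stdout` is restored. As a rough sketch of the same stdout-capturing idea in Python 3, where `httplib` is named `http.client`, the capture could be wrapped in `contextlib.redirect_stdout` so the original stream is restored even if the call raises. The names `capture_stdout` and `noisy_call` are hypothetical and not part of the gist; `noisy_call` only imitates the kind of line the debug output prints.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Run func and return (result, text written to stdout while it ran)."""
    buffer = io.StringIO()
    # redirect_stdout puts the previous sys.stdout back on exit, even on error.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_call():
    # Hypothetical stand-in for a request made with debuglevel set:
    # it prints a line to stdout and returns a status code.
    print("send: b'GET / HTTP/1.1'")
    return 200


status, debug_text = capture_stdout(noisy_call)
```

Here `debug_text` holds whatever was printed during the call, which could then be parsed for request and response details, while `status` is the wrapped function's own return value.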
justinlittman / tweet_ids.txt
Created November 14, 2014 13:16
Sample tweet ids
531587808637775872
531683980865966080
531687621945462784
531689196747227136
531690425229516800
531691539991625728
531692957930639361
531694030628081664
531695487184044032
531696705981648897
justinlittman / technique1.warc
Last active October 14, 2015 19:08
Recording Flickr API calls to WARC using httplib debugging
WARC/1.0
WARC-Type: request
Content-Length: 528
WARC-Date: 2015-10-14T18:50:56Z
WARC-Payload-Digest: sha1:2af4ebafc68cda47bc56df9047c5d46457690d54
WARC-Target-URI: https://api.flickr.com/services/rest/?photo_id=16610484049&secret=ee80d9ecdc&nojsoncallback=1&method=flickr.photos.getInfo&format=json
Content-Type: application/http; msgtype=request
WARC-Record-ID: <urn:uuid:80ae5e4c-72a4-11e5-a45f-2cf0ee020fec>
POST /services/rest/?photo_id=16610484049&secret=ee80d9ecdc&nojsoncallback=1&method=flickr.photos.getInfo&format=json HTTP/1.1
justinlittman / output.txt
Created October 14, 2015 19:31
Recording Flickr API calls to WARC using warcprox
2015-10-14 15:24:45,556 90037 INFO MainThread warcprox.dedup.DedupDb.__init__(dedup.py:25) creating new deduplication database ./warcprox-dedup.db
2015-10-14 15:24:45,563 90037 INFO MainThread warcprox.warcprox.WarcProxy.server_activate(warcprox.py:265) WarcProxy listening on 127.0.0.1:8000
2015-10-14 15:24:45,564 90037 INFO MainThread warcprox.warcwriter.WarcWriter.__init__(warcwriter.py:50) warc destination directory ./warcs doesn't exist, creating it
2015-10-14 15:24:45,564 90037 INFO MainThread warcprox.controller.WarcproxController.run_until_shutdown(controller.py:58) SIGTERM will initiate graceful shutdown
2015-10-14 15:24:45,565 90037 INFO WarcWriterThread warcprox.warcwriter.WarcWriterThread.run(warcwriter.py:273) WarcWriterThread starting, directory=/Users/justinlittman/Data/sfm3/blog_examples/warcs gzip=False rollover_size=1000000000 rollover_idle_time=None prefix=WARCPROX port=8000
2015-10-14 15:24:50,601 90037 INFO Thread-1 warcprox.warcprox.WarcProxyHandler.log_message(mitmproxy.py:140) WarcProxy
justinlittman / weibo.json
Created March 18, 2016 13:30
A post from Sina Weibo
{
  "reposts_count": 543,
  "biz_feature": 0,
  "truncated": false,
  "text": "\u771f\u597d //@\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd:\u7fa1\u6155 //@\u5c38\u59d0:\u771f\u597d\u554a\uff0c\u591a\u5c11\u4eba\u66fe\u548c\u597d\u59d0\u59b9\u8bb8\u8fc7\u8fd9\u6837\u7684\u613f\u671b\uff0c\u6709\u51e0\u4e2a\u771f\u6b63\u5b9e\u73b0\u4e86\u5462 //@\u9ec4\u660f\u5c11\u5973\u82b1\u8a00\u521d:\u7fa1\u6155 //@\u5c81\u534e\u897f\u98ce:\u7fa1\u6155",
  "pid": 3952964537875184,
  "visible": {
    "type": 0,
    "list_id": 0
  },
{
  "contributors": null,
  "truncated": false,
  "text": "We have got to tell corporate America that if they want us to buy their products, they damn well better manufacture them in America.",
  "is_quote_status": false,
  "in_reply_to_status_id": null,
  "id": 733689250588168192,
  "favorite_count": 3003,
  "source": "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>",
  "retweeted": false,
justinlittman / WARC request record
Created May 21, 2016 00:03
WARC record header and HTTP message from a WARC request record.
WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:d150a270-041b-49d5-845a-58f0f334ea80>
WARC-Date: 2016-05-20T19:28:18Z
WARC-Target-URI: https://api.twitter.com/1.1/statuses/user_timeline.json?count=200&max_id=682336123457155073&user_id=216776631
WARC-Concurrent-To: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
WARC-Block-Digest: sha1:05a8973dda59c50b8c0c1343d2286443a1352387
Content-Type: application/http;msgtype=request
Content-Length: 575
justinlittman / WARC response record
Last active June 13, 2016 15:50
WARC record header and HTTP message header from a WARC response record.
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
WARC-Date: 2016-05-20T19:28:18Z
WARC-Target-URI: https://api.twitter.com/1.1/statuses/user_timeline.json?count=200&max_id=682336123457155073&user_id=216776631
WARC-IP-Address: 199.16.156.199
Content-Type: application/http;msgtype=response
Content-Length: 12584
WARC-Block-Digest: sha1:d4f5ddcfbe1c814fdee445ff145abebf22411bf8
WARC-Payload-Digest: sha1:c952c2176ccf15f7ecb604be6b58390491bbfe40
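In the two records above, the request's `WARC-Concurrent-To` value is the same as the response's `WARC-Record-ID`. A minimal sketch of reading such header blocks into dictionaries and checking that link might look like the following. The function name `parse_warc_headers` is hypothetical, the sample strings copy only a few of the fields shown above, and the parser assumes one `Name: value` field per line (no folded lines or repeated field names).

```python
def parse_warc_headers(block):
    """Split a WARC header block into (version line, dict of named fields)."""
    lines = [line for line in block.strip().splitlines() if line.strip()]
    version = lines[0].strip()
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URIs contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


# Subset of the request record header shown above.
REQUEST_HEADER = """WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:d150a270-041b-49d5-845a-58f0f334ea80>
WARC-Concurrent-To: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
Content-Length: 575
"""

# Subset of the response record header shown above.
RESPONSE_HEADER = """WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
Content-Length: 12584
"""

req_version, req = parse_warc_headers(REQUEST_HEADER)
resp_version, resp = parse_warc_headers(RESPONSE_HEADER)

# True when the request record points at this response record.
is_paired = req["WARC-Concurrent-To"] == resp["WARC-Record-ID"]
```

For real WARC files, an established WARC-reading library is likely a better choice than hand parsing; this sketch only illustrates how the header fields relate to each other.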
justinlittman / gist:f96230ab58588b6d164f
Last active February 16, 2017 18:56
Sample code for using SPARQL update to load VIVO.
from __future__ import division
from SPARQLWrapper import SPARQLWrapper
import socket
import codecs
import os
import time
from namespace import ns_manager
from rdflib import Graph
import math