Andy Jackson anjackson

itemir /
Last active Jan 16, 2021
Python script to calculate MD5 hash of a multipart uploaded file (relevant for Object Storages like OCI Object Storage or AWS S3)
import argparse
import hashlib
import sys
def md5(f, count):
    hash_md5 = hashlib.md5()
    eof = False
    for i in range(count * 16):
        chunk =
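The preview above is cut off, so here is a separate minimal sketch of the general idea rather than the gist's own code. It assumes the commonly described S3-style convention: take the MD5 of each part, concatenate the raw digests, take the MD5 of that, and append `-<number of parts>`. The function name `multipart_etag` and the `part_size` parameter are hypothetical, and the part size must match the one used for the actual upload.

```python
import hashlib
import io


def multipart_etag(stream, part_size):
    """Return an S3-style multipart ETag for the bytes in `stream`.

    Assumption: the store hashes each part with MD5, then hashes the
    concatenated raw digests and appends "-<part count>".
    """
    part_digests = []
    while True:
        # Reads one whole part into memory; fine for a sketch, but a
        # production version would hash each part in smaller chunks.
        part = stream.read(part_size)
        if not part:
            break
        part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


if __name__ == "__main__":
    # Ten bytes split into parts of four bytes gives three parts.
    print(multipart_etag(io.BytesIO(b"a" * 10), part_size=4))
```

For a real file, open it in binary mode and pass the file object in place of the `io.BytesIO` buffer.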
ato /
Last active Sep 29, 2016
tinycdxserver example

I just tried my example from the tinycdxserver README and realised that curl is messing up the line endings because of a conversion it does by default. I haven't checked exactly what curl is doing yet, but tinycdxserver interprets the input as if all the lines in the file had been concatenated together (you can see that by running tinycdxserver in verbose mode with the -v option).

Using curl's --data-binary option instead of --data fixes that, and I've updated the README accordingly.

That could be what's tripping you up. Here's a more complete example that I just tested. You should get an "Added N records" response back if it worked properly, where N is the line count of the cdx.
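
To make the difference concrete: when curl's --data reads from a file it strips carriage returns and newlines, whereas --data-binary sends the bytes untouched. The following is a minimal Python sketch that mimics both behaviours and builds (but does not send) the POST request. The sample records, the helper names, and the `http://localhost:8080/myindex` URL are placeholders assumed for illustration, not taken from the README.

```python
import urllib.request


def curl_data_body(raw: bytes) -> bytes:
    """Mimic `curl --data @file`: CR and LF bytes are stripped."""
    return raw.replace(b"\r", b"").replace(b"\n", b"")


def curl_data_binary_body(raw: bytes) -> bytes:
    """Mimic `curl --data-binary @file`: bytes are sent unchanged."""
    return raw


def build_post(url: str, body: bytes) -> urllib.request.Request:
    """Build a POST request object without sending it."""
    return urllib.request.Request(url, data=body, method="POST")


# Placeholder records standing in for the lines of a CDX file.
sample = b"record-1\nrecord-2\nrecord-3\n"

mangled = curl_data_body(sample)
intact = curl_data_binary_body(sample)

print(mangled.count(b"\n"))  # no line breaks left, records run together
print(intact.count(b"\n"))   # line breaks preserved, one per record

# Hypothetical endpoint; substitute your own server and collection name.
request = build_post("http://localhost:8080/myindex", intact)
```

Passing `request` to `urllib.request.urlopen` would send it, which is left out here so the sketch runs without a server.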

View gist:7069028
~ virtualenv env
~ source env/bin/activate
~ pip install git+
~ pip install pyOpenSSL
~ git clone
~ cd warcprox
~ python --rollover-idle-time=7200
2013-10-20 14:36:07,923 66818 MainThread INFO server_activate( listening on
2013-10-20 14:36:07,924 66818 MainThread INFO _read_ca( read CA key+cert from ./warcprox-ca.pem
2013-10-20 14:36:07,928 66818 WarcWriterThread INFO run( WarcWriterThread starting, directory=/private/tmp/warcprox/warcs gzip=False rollover_size=1000000000 rollover_idle_time=7200 prefix=WARCPROX port=8080
Asparagirl / gist:6206247
Last active May 26, 2021
Have a WARC that you would like to upload to the Internet Archive so that it can eventually be included in their Wayback Machine? Here's how to upload it from the command line.

Do you have a WARC file of a website all downloaded and ready to be added to the Internet Archive? Great! You can do that with the Internet Archive's web-based uploader, but it's not ideal and it can't handle really big uploads. Here's how you can upload your WARC files to the IA from the command line, and without worrying about a size restriction.

First, you need to get your Access Key and Secret Key from the Internet Archive for the S3-like API. Here's where you can get that for your IA account: Don't share those with other people!

Here's their documentation file about how to use it, if you need some extra help:

Next, you should copy the following lines to a text file and edit them as needed:
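
For orientation, an upload through the S3-like API boils down to an authenticated HTTP PUT. The Python sketch below only builds such a request and does not send it. The endpoint `s3.us.archive.org`, the `LOW accesskey:secret` authorization scheme, and the `x-archive-*` headers are stated from memory of the Internet Archive's S3-like API and should be checked against their documentation; the function name, item identifier, and file name are hypothetical.

```python
import urllib.parse
import urllib.request


def build_upload_request(identifier, filename, body, access_key, secret_key):
    """Build (but do not send) a PUT request for an IA S3-like upload.

    Assumptions to verify against the IA documentation: the endpoint host,
    the "LOW key:secret" authorization format, and the x-archive headers.
    """
    url = "https://s3.us.archive.org/{}/{}".format(
        urllib.parse.quote(identifier), urllib.parse.quote(filename)
    )
    headers = {
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask for the item to be created if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Mark the item as web archive content.
        "x-archive-meta-mediatype": "web",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


if __name__ == "__main__":
    # Placeholder values; never commit or share real keys.
    req = build_upload_request(
        "my-item", "site.warc.gz", b"...", "ACCESSKEY", "SECRETKEY"
    )
    print(req.get_method(), req.full_url)
```

The sketch takes the body as bytes for simplicity; a very large WARC would need to be streamed from disk instead of being loaded into memory.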

anarchivist / diskdefs
Created Jun 22, 2012
Revised diskdefs for cpmtools
diskdef ibm-3740
  seclen 128
  tracks 77
  sectrk 26
  blocksize 1024
  maxdir 64
  skew 6
  boottrk 2
  os p2dos
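
As a quick sanity check on these numbers, the fields shown can be turned into a disk capacity. The Python sketch below is an illustration only, not part of cpmtools: the `parse_diskdef` helper is hypothetical and handles just the simple `key value` lines shown above.

```python
STANZA = """\
diskdef ibm-3740
  seclen 128
  tracks 77
  sectrk 26
  blocksize 1024
  maxdir 64
  skew 6
  boottrk 2
  os p2dos
"""


def parse_diskdef(text):
    """Parse one diskdef stanza into (name, {field: value})."""
    name, fields = None, {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "end":
            continue
        if parts[0] == "diskdef":
            name = parts[1]
        else:
            key, value = parts[0], parts[1]
            # Numeric fields become ints; others (such as "os") stay strings.
            fields[key] = int(value) if value.isdigit() else value
    return name, fields


name, d = parse_diskdef(STANZA)

# Raw capacity: bytes per sector * sectors per track * tracks.
raw_bytes = d["seclen"] * d["sectrk"] * d["tracks"]

# Capacity excluding the reserved boot tracks.
data_bytes = d["seclen"] * d["sectrk"] * (d["tracks"] - d["boottrk"])

print(name, raw_bytes, data_bytes)  # ibm-3740 256256 249600
```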