Andy Jackson anjackson

@anarchivist
anarchivist / diskdefs
Created June 22, 2012 19:18
Revised diskdefs for cpmtools
diskdef ibm-3740
seclen 128
tracks 77
sectrk 26
blocksize 1024
maxdir 64
skew 6
boottrk 2
os p2dos
end
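As a worked example of what these numbers imply, here is a small Python sketch that derives the disk capacity from the `ibm-3740` definition. It assumes the usual cpmtools reading of the fields (`boottrk` tracks are reserved, and the directory occupies the first blocks of the data area) and the standard 32-byte CP/M directory entry. The function name `capacity` is made up for this illustration.

```python
# Values copied from the ibm-3740 diskdef above.
SECLEN = 128      # bytes per sector
TRACKS = 77       # tracks on the disk
SECTRK = 26       # sectors per track
BLOCKSIZE = 1024  # bytes per allocation block
MAXDIR = 64       # number of directory entries
BOOTTRK = 2       # reserved (boot/system) tracks

DIR_ENTRY_BYTES = 32  # size of one CP/M directory entry


def capacity(seclen, tracks, sectrk, blocksize, maxdir, boottrk):
    """Derive sizes from a diskdef (hypothetical helper, not part of cpmtools)."""
    raw_bytes = tracks * sectrk * seclen
    data_bytes = (tracks - boottrk) * sectrk * seclen
    total_blocks = data_bytes // blocksize  # a trailing partial block is unusable
    dir_bytes = maxdir * DIR_ENTRY_BYTES
    dir_blocks = (dir_bytes + blocksize - 1) // blocksize  # round up
    file_blocks = total_blocks - dir_blocks
    return {
        "raw_bytes": raw_bytes,
        "data_bytes": data_bytes,
        "total_blocks": total_blocks,
        "dir_blocks": dir_blocks,
        "file_bytes": file_blocks * blocksize,
    }


result = capacity(SECLEN, TRACKS, SECTRK, BLOCKSIZE, MAXDIR, BOOTTRK)
for name, value in result.items():
    print(f"{name}: {value}")
```

Under those assumptions the raw disk holds 256,256 bytes, of which 241 blocks (246,784 bytes) are left for files once the two boot tracks and the two directory blocks are subtracted.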
@Asparagirl
Asparagirl / gist:6206247
Last active February 14, 2024 19:56
Have a WARC that you would like to upload to the Internet Archive so that it can eventually be included in their Wayback Machine? Here's how to upload it from the command line.

Do you have a WARC file of a website all downloaded and ready to be added to the Internet Archive? Great! You can do that with the Internet Archive's web-based uploader, but it's not ideal and it can't handle really big uploads. Here's how you can upload your WARC files to the IA from the command line, and without worrying about a size restriction.

First, you need to get your Access Key and Secret Key from the Internet Archive for the S3-like API. You can get them for your IA account at http://archive.org/account/s3.php (don't share them with other people!).

Here's their documentation file about how to use it, if you need some extra help: http://archive.org/help/abouts3.txt

Next, you should copy the following lines to a text file and edit them as needed:

export IA_S3_ACCESS_KEY="YOUR-ACCESS-KEY-FROM-THE-IA-GOES-HERE"
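For readers who would rather script this in Python than in the shell, here is a minimal sketch of how an upload request to the IA's S3-like API can be put together. It only builds the request and does not send anything. The endpoint `s3.us.archive.org`, the `LOW access:secret` authorization scheme, and the `x-amz-auto-make-bucket` and `x-archive-meta-mediatype` headers come from the IA documentation linked above. The function name, the item and file names, and the `IA_S3_SECRET_KEY` variable are assumptions made for this illustration.

```python
import os
import urllib.parse
import urllib.request


def build_upload_request(item, filename, data, access_key, secret_key):
    """Build (but do not send) a PUT request for the IA S3-like API.

    Hypothetical helper: `item` is the IA item identifier (the "bucket"),
    `filename` the name the file should have inside that item.
    """
    url = "https://s3.us.archive.org/{}/{}".format(
        urllib.parse.quote(item), urllib.parse.quote(filename)
    )
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"LOW {access_key}:{secret_key}")
    # Ask the IA to create the item if it does not exist yet.
    req.add_header("x-amz-auto-make-bucket", "1")
    # Mark the item as web archive material.
    req.add_header("x-archive-meta-mediatype", "web")
    return req


if __name__ == "__main__":
    # IA_S3_SECRET_KEY is an assumed variable name, set next to IA_S3_ACCESS_KEY.
    access = os.environ.get("IA_S3_ACCESS_KEY", "")
    secret = os.environ.get("IA_S3_SECRET_KEY", "")
    req = build_upload_request(
        "example-item", "example.warc.gz", b"placeholder bytes", access, secret
    )
    print(req.get_method(), req.full_url)
    # To actually upload: urllib.request.urlopen(req)
```

For a really big WARC you would not want to hold the whole file in memory as this sketch does. Streaming the body from an open file is the more realistic route, and is one reason the command-line approach is attractive.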
@atomotic
atomotic / gist:7069028
Created October 20, 2013 12:39
warcprox
~ virtualenv env
~ source env/bin/activate
~ pip install git+https://github.com/nlevitt/warctools@tweaks
~ pip install pyOpenSSL
~ git clone https://github.com/nlevitt/warcprox
~ cd warcprox
~ python warcprox.py --rollover-idle-time=7200
2013-10-20 14:36:07,923 66818 MainThread INFO server_activate(warcprox.py:346) listening on 127.0.0.1:8080
2013-10-20 14:36:07,924 66818 MainThread INFO _read_ca(warcprox.py:75) read CA key+cert from ./warcprox-ca.pem
2013-10-20 14:36:07,928 66818 WarcWriterThread INFO run(warcprox.py:510) WarcWriterThread starting, directory=/private/tmp/warcprox/warcs gzip=False rollover_size=1000000000 rollover_idle_time=7200 prefix=WARCPROX port=8080
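Once warcprox is listening, anything fetched through it should end up in the WARC directory named in the log. Here is a minimal Python sketch of pointing a client at the proxy, assuming the `127.0.0.1:8080` address from the log line above. The helper names are made up for this illustration, and nothing is fetched until you call `opener.open(...)` yourself.

```python
import urllib.request

# Address taken from the "listening on 127.0.0.1:8080" log line above.
WARCPROX_ADDR = "127.0.0.1:8080"


def warcprox_proxies(addr=WARCPROX_ADDR):
    """Proxy mapping that sends both http and https traffic via warcprox."""
    proxy_url = f"http://{addr}"
    return {"http": proxy_url, "https": proxy_url}


def build_recording_opener(addr=WARCPROX_ADDR):
    """Return a urllib opener whose requests go through the proxy."""
    handler = urllib.request.ProxyHandler(warcprox_proxies(addr))
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_recording_opener()
    print(warcprox_proxies())
    # With warcprox running, a fetch like this would be recorded:
    # opener.open("http://example.com/").read()
```

For https URLs the client will presumably also need to trust the CA that warcprox reads from `./warcprox-ca.pem`, since the proxy has to re-sign the traffic it records. That step is left out of the sketch.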
@ato
ato / README.md
Last active September 29, 2016 20:24
tinycdxserver example

I just tried my example from the tinycdxserver README and realised that curl is messing up the line-endings due to some conversion it does by default. I haven't checked yet exactly what curl is doing but tinycdxserver is interpreting it as if all the lines in the file have been concatenated together (you can see that by running tinycdxserver in verbose mode with the -v option).

Using curl's --data-binary option instead of --data fixes that and I've updated the README correspondingly.

That could be what's tripping you up. Here's a more complete example that I just tested. You should get an "Added N records" response back if it worked properly, where N is the line count of the cdx.
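To make the difference concrete: curl's documentation says that when `--data` reads from a file (`--data @file`), carriage returns and newlines are stripped out, whereas `--data-binary` sends the file as-is. The Python sketch below only imitates that behaviour on some placeholder lines (not real CDX records). The function names are made up for this illustration.

```python
def as_curl_data(file_bytes: bytes) -> bytes:
    """Approximate the body `curl --data @file` sends: CR and LF removed."""
    return file_bytes.replace(b"\r", b"").replace(b"\n", b"")


def as_curl_data_binary(file_bytes: bytes) -> bytes:
    """`curl --data-binary @file` sends the bytes untouched."""
    return file_bytes


# Placeholder content standing in for a three-line CDX file.
cdx = b"line one\nline two\nline three\n"

mangled = as_curl_data(cdx)
intact = as_curl_data_binary(cdx)

print(mangled)                # every record run together on one line
print(intact.count(b"\n"))    # line breaks preserved
```

With `--data` the server receives one long line, which matches the "all the lines concatenated together" symptom described above.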

@itemir
itemir / md5_multipart_upload.py
Last active January 18, 2024 15:31
Python script to calculate MD5 hash of a multipart uploaded file (relevant for Object Storages like OCI Object Storage or AWS S3)
#!/usr/bin/python
import argparse
import hashlib
import sys
def md5(f, count):
    hash_md5 = hashlib.md5()
    eof = False
    for i in range(count * 16):
        chunk = f.read(65536)
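The listing above stops partway through `md5()`. As a separate, minimal sketch of the technique named in the description (not a reconstruction of the rest of this script), the commonly observed convention for S3-style multipart uploads is: take the MD5 of each part, concatenate the raw digests, take the MD5 of that, and append `-` plus the number of parts. The function name `multipart_md5` and the `part_size` parameter are assumptions made for this illustration, and the part size has to match the one used for the actual upload.

```python
import hashlib
import io


def multipart_md5(f, part_size):
    """Compute an S3-style multipart hash for a binary file object.

    Hypothetical helper: reads `part_size` bytes at a time, so each
    part is held in memory while it is hashed.
    """
    digests = []
    while True:
        part = f.read(part_size)
        if not part:
            break
        digests.append(hashlib.md5(part).digest())
    if not digests:
        raise ValueError("empty input: nothing to hash")
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    # Ten bytes split into 4-byte parts gives three parts.
    sample = io.BytesIO(b"a" * 10)
    print(multipart_md5(sample, 4))
```

The result can then be compared against the hash the object storage reports for the uploaded object. If the part sizes differ, the two values will not match even when the file contents are identical.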