Johan van der Knijff bitsgalore

  • Rotterdam, The Netherlands
@bitsgalore
bitsgalore / saveToWayback.py
Last active Apr 22, 2021
Saves URLs (from either a list or a root URL) to the Internet Archive's Wayback Machine
View saveToWayback.py
#! /usr/bin/env python3
#
"""
Save web pages to Wayback Machine. Argument urlsIn can either be
a text file with URLs (each line contains one URL), or a single
URL. In the first (input file) case it will simply save each URL.
In the latter case (input URL) it will extract all links from the URL, and
save those as well as the root URL (useful for saving a page with all
of its direct references). The optional --extensions argument can be used
to limit this to one or more specific file extensions. E.g. the following
View warc-capture-script-response.py
#! /usr/bin/env python3
from warcio.capture_http import capture_http
import requests
def main():
    # Existing warc.gz file (created with wget, then compressed using warcio's
    # 'recompress' command)
    with capture_http("ziklies.home.xs4all.nl.warc.gz"):
        for indexOnder in range(1, 8):
            for indexMidden in range(1, 8):
View instructies-omsipcreator.md

omSipCreator instructions

Warning

For testing, use omSipCreator on copies of batches, and NOT on the original storage locations! The main reason is that omSipCreator in 'prune' mode (its clean-up function) modifies batches and deletes data in the process!

In the examples below I assume that Python is installed under the following folder:

C:\Python37\
bitsgalore / imagemagick-openjpeg-build-install.md
Created Feb 26, 2020
ImageMagick + OpenJPEG build and install notes
View imagemagick-openjpeg-build-install.md

ImageMagick / OpenJPEG build and install notes

I always end up getting this wrong; the steps below worked for Linux Mint 19.3 (based on Ubuntu 18.04). Build/installation order is important: JPEG 2000 support in ImageMagick only works if OpenJPEG is found at build time, so we have to start with that. Note that an 'openjpeg-dev' Debian package exists for OpenJPEG. As I'm not entirely sure it is the most up-to-date version, and JPEG 2000 support is important to me, I'm compiling this library from source here. Otherwise, everything under the 'OpenJPEG' heading could probably be substituted by the one-liner sudo apt-get install openjpeg-dev.

CMake

bitsgalore / geolocateDomains.py
Last active Jan 29, 2020
Geolocation of web domains
View geolocateDomains.py
#! /usr/bin/env python3
"""
Geolocate web domains. Input file is a text file where each line contains
1 web domain.
Author: Johan van der Knijff
Requirements:
1. Unix/Linux environment with 'host' tool installed,
View xkcd-jp2.xml
<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/v2/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/v2/ http://jpylyzer.openpreservation.org/jpylyzer-v-2-0.xsd">
<toolInfo>
<toolName>jpylyzer</toolName>
<toolVersion>2.0.0</toolVersion>
</toolInfo>
<file xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="adobe:ns:meta/" xmlns:ns2="http://ns.adobe.com/xap/1.0/" xmlns:ns4="http://ns.adobe.com/photoshop/1.0/" xmlns:ns5="http://ns.adobe.com/xap/1.0/mm/" xmlns:ns6="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#" xmlns:ns7="http://ns.adobe.com/tiff/1.0/" xmlns:ns8="http://ns.adobe.com/exif/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<fileInfo>
<fileName>jpeg2000_2x.jp2</fileName>
<filePath>/home/johan/Downloads/jpeg2000_2x.jp2</filePath>
bitsgalore / checkLastModified.sh
Last active Nov 12, 2019
Report last-modified date header for a list of URLs
View checkLastModified.sh
#!/bin/bash
# Check value of last-modified date header for list of URLs
#
# Uses curl: https://curl.haxx.se/
# Display usage message if command line does not contain expected
# number of arguments
if [ "$#" -ne 2 ] ; then
    echo "Usage: checkLastModified.sh fileIn fileOut" >&2
View jpylyzer-coc-example.xml
<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer xmlns="http://openpreservation.org/ns/jpylyzer/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://openpreservation.org/ns/jpylyzer/ http://jpylyzer.openpreservation.org/jpylyzer-v-1-1.xsd">
<toolInfo>
<toolName>jpylyzer</toolName>
<toolVersion>1.18.0</toolVersion>
</toolInfo>
<fileInfo>
<fileName>openJPEG15.jp2</fileName>
<filePath>/home/johan/jpylyzer-test-files/openJPEG15.jp2</filePath>
<fileSizeInBytes>670372</fileSizeInBytes>
View tape-dd.md

Reading a tape with dd and mt

In the simplest case, reading data from a tape involves nothing more than a dd command line such as this one:

dd if=/dev/nst0 of=file0001.dd bs=16384

Here, the "if" argument tells dd to read input from the non-rewinding tape device /dev/nst0, and the value of "of" defines the file where
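
As a rough illustration of what the dd command above does, the sketch below copies an input file to an output file in reads of at most 16384 bytes, mirroring the bs value. The function name copy_blocks and its parameters are hypothetical and not part of the original notes. It is a minimal sketch that assumes an ordinary readable file and has not been tested against real tape hardware, where block-size handling can differ.

```python
#!/usr/bin/env python3
"""Block-wise copy, loosely mirroring `dd if=... of=... bs=16384`."""


def copy_blocks(path_in, path_out, block_size=16384):
    """Copy path_in to path_out in reads of at most block_size bytes.

    Returns the number of reads that yielded data. The name and
    signature are illustrative assumptions, not an existing API.
    """
    blocks = 0
    # buffering=0 requests unbuffered input, so each read() call
    # corresponds to a single read on the underlying file or device
    with open(path_in, "rb", buffering=0) as f_in, \
            open(path_out, "wb") as f_out:
        while True:
            block = f_in.read(block_size)
            if not block:
                # An empty read signals end of file
                break
            f_out.write(block)
            blocks += 1
    return blocks
```

Called as copy_blocks("/dev/nst0", "file0001.dd"), this would correspond to the dd command line shown above, assuming the tape device is readable by the current user.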

View jpylyzer2-outformat.md

Proposed jpylyzer output format changes

Johan van der Knijff, 3 July 2019

This document describes some proposed changes to the jpylyzer output format for the upcoming jpylyzer 2.0 release (which is planned for November 2019). The main reason for these changes is the addition of raw codestream validation functionality. Since this functionality will lead to a small (but nevertheless breaking) change to jpylyzer's output format, this is a good moment to fix a few other inconsistencies.

Related GitHub issues are: