thbar/scratchpad.md

## scratchpad.md

      
            
          
    Experimental attempt at getting organized ...
12/12/2020

Dead Simple Python

https://dev.to/codemouse92/introducing-dead-simple-python-563o
28/11/2020

Microkorg editor troubleshooting

If loading sounds from .prg files gives unexpected results, check that the MIDI channel is set to 1 before launching the editor. Reportedly, MIDI clock needs to be set to external as well.
14/11/2020

A Visual Guide to Regular Expression


In this post, I will illustrate the various concepts underlying regex. The goal is to help you build a good mental model of how a regex pattern works.

https://amitness.com/regex/
11/11/2020

List of applications - ArchWiki


[A] general list of applications sorted by category, as a reference for those looking for packages. Many sections are split between console and graphical applications.

https://wiki.archlinux.org/index.php/List_of_applications
03/11/2020

How to move /var/www/html folder to external hdd?

https://superuser.com/questions/1101851/how-to-move-var-www-html-folder-to-external-hdd/1101856
Also:
https://askubuntu.com/questions/1220778/how-can-web-server-access-external-hdd
29/10/2020

Thorium Reader


Thorium Reader is an easy to use EPUB reading application for Windows 10/10S, MacOS and Linux.

https://github.com/edrlab/thorium-reader/releases
21/10/2020

Apache: redirect all folder root references to home.htm file in folder

This seems to work:
RedirectMatch ^(.*)/$ $1/home.htm
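
To see which request paths the pattern above catches, a rough Python sketch follows. It only mimics the regex; the function name `redirect_target` is made up for illustration, and Apache writes the backreference as `$1` rather than using a match object.

```python
import re

# Same pattern as in the RedirectMatch directive above
PATTERN = re.compile(r"^(.*)/$")


def redirect_target(url_path):
    """Return the redirect target for url_path, or None if the rule does not apply."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    # Equivalent of Apache's "$1/home.htm"
    return match.group(1) + "/home.htm"


for path in ["/", "/docs/", "/docs/index.html"]:
    print(path, "->", redirect_target(path))
```

Only paths ending in a slash match, so the redirected path (ending in home.htm) does not trigger the rule a second time.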

16/10/2020

MIDI not working under Jack / Reaper

In View menu, open routing matrix and click on system:midi midi playback2 (needs to be enabled first from Preferences). Routing is set for each track.
15/10/2020

Virtualbox fails after kernel update

https://askubuntu.com/questions/819939/virtualbox-fails-after-kernel-update
06/10/2020

The Quartz guide to bad data


An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

https://github.com/Quartz/bad-data-guide
14/09/2020

What's so hard about PDF text extraction?

https://filingdb.com/b/pdf-text-extraction
11/09/2020

More than 100 scientific journals have disappeared from the Internet

https://www.nature.com/articles/d41586-020-02610-z
07/09/2020

ftfy: fixes text for you


ftfy fixes Unicode that's broken in various ways.

https://github.com/LuminosoInsight/python-ftfy
03/09/2020

QGIS Flatpak instructions

https://www.qgis.org/en/site/forusers/alldownloads.html#flatpak
01/09/2020

Keep Remote SSH Sessions and Processes Running After Disconnection

https://www.tecmint.com/keep-remote-ssh-sessions-running-after-disconnection/
Steps:
screen
Then issue commands. Then press Ctrl-a followed by d to detach. Log out.
31/08/2020

Linux - display details on startup/boot

systemd-analyze time
Result (in this case there's some odd firmware delay):
Startup finished in 1min 55.160s (firmware) + 10.965s (loader) + 3.955s (kernel) + 10.002s (userspace) = 2min 20.085s
graphical.target reached after 9.996s in userspace

Detailed breakdown:
systemd-analyze blame
Result:
          7.416s NetworkManager-wait-online.service
          1.966s vboxdrv.service
           827ms apt-daily-upgrade.service
           558ms systemd-fsck@dev-disk-by\x2duuid-9224\x2d4AC1.service
           500ms dev-sdb1.device
           477ms systemd-journal-flush.service
           ::   ::
           

Split large text file into smaller files

Here, split into 500,000-line files:
split -l 500000 -d 2019-05-21_all_domains_NL.txt domains-nl
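
Where split is not available (e.g. on Windows), roughly the same can be sketched in Python. The function name `split_by_lines` is hypothetical; the two-digit numeric suffix is meant to mirror what `-d` produces (domains-nl00, domains-nl01, ...).

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, prefix, lines_per_file=500000):
    """Split a text file into numbered chunks of at most lines_per_file lines."""
    written = []
    with open(source, "rb") as infile:
        index = 0
        while True:
            # Read the next batch of lines; an empty batch means end of file
            chunk = list(islice(infile, lines_per_file))
            if not chunk:
                break
            out_path = Path(f"{prefix}{index:02d}")
            out_path.write_bytes(b"".join(chunk))
            written.append(out_path)
            index += 1
    return written
```

Usage would be something like split_by_lines("2019-05-21_all_domains_NL.txt", "domains-nl"). Each chunk is held in memory before it is written, which should be fine for 500,000 short lines.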
30/08/2020

How to delete all your files

https://www.reddit.com/r/linux/comments/if1krd/how_to_delete_all_your_files/
26/08/2020

PDF info/validation/testing commands

qpdf

qpdf --check --verbose whatever.pdf
Poppler

pdfinfo whatever.pdf
Or (forces reading of all text):
pdftotext whatever.pdf
JHOVE

jhove -m PDF-hul -i whatever.pdf
Ghostscript

gs -dNOPAUSE -dBATCH -sDEVICE=nullpage whatever.pdf
Apache PDFBox

Using PDFDebugger (activates GUI-type browser):
java -jar ~/pdfbox/pdfbox-app-2.0.21.jar PDFDebugger whatever.pdf
MuPDF

mutool info whatever.pdf 
VeraPDF

verapdf whatever.pdf
(Or use GUI).
pdfcpu

pdfcpu validate whatever.pdf
Note to self: installed this by copying the Linux binary to ~/.local/bin/ (doesn't require GoLang).
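
To run several of the command-line checks above in one go, a minimal Python sketch could look like this. The function names are made up; the commands and flags are copied from the notes above, and the meaning of each exit code differs per tool, so the result is only a rough indication.

```python
import shutil
import subprocess


def validation_commands(pdf):
    """Return the validation commands from the notes above, keyed by tool."""
    return {
        "qpdf": ["qpdf", "--check", "--verbose", pdf],
        "pdfinfo": ["pdfinfo", pdf],
        "mutool": ["mutool", "info", pdf],
        "verapdf": ["verapdf", pdf],
        "pdfcpu": ["pdfcpu", "validate", pdf],
    }


def run_available(pdf):
    """Run every tool that is installed; return a dict of tool -> exit code."""
    results = {}
    for tool, command in validation_commands(pdf).items():
        if shutil.which(command[0]) is None:
            continue  # tool not installed, skip it
        completed = subprocess.run(command, capture_output=True, text=True)
        results[tool] = completed.returncode
    return results
```

Calling run_available("whatever.pdf") then gives a quick overview of which tools complain about a file.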
Compare two PDFs

ComparePDF

Compare text (verbose output):
comparepdf ct -v=2 whatever.pdf wherever.pdf
Compare appearance (verbose output):
comparepdf ca -v=2 whatever.pdf wherever.pdf
12/08/2020

Reaper reports JACK: error creating client error on startup

First run jackd:
jackd -dalsa -dhw:USB -r48000 -p128 -n3 -Xseq

See also here
02/08/2020

Convert stereo audio file to mono, changing bit depth and sampling frequency

To 8-bit, 15 kHz:
sox versatility.wav -b 8 -r 15k versatility_8.wav remix -

BUT sox output is really noisy; better results with ffmpeg:
ffmpeg -i boc-arpeggio.wav -ar 15000 -acodec pcm_u8 boc-arpeggio-8ff.wav

27/07/2020

How to hide a list in HTML without javascript

https://stackoverflow.com/a/13127738/1209004
14/07/2020

Create shared folder on local network (Linux Mint, Caja file manager)

From instructions here:
Install samba and caja-share

sudo apt install samba
sudo apt install caja-share

Set up usershares folder and make sambashare group owner

sudo mkdir /var/lib/samba/usershares
sudo chgrp sambashare /var/lib/samba/usershares
sudo chmod 1770 /var/lib/samba/usershares

Set samba password

sudo smbpasswd -a your_username

Then reboot machine, and right-click folder in Caja and select sharing options. After this, folder is accessible from other machines on the local network.
06/07/2020

Python - CGI Programming

https://www.tutorialspoint.com/python/python_cgi_programming.htm
30/06/2020

U.S. National Archives and Records Administration Digital Preservation Framework

150 formats added in latest release:
https://github.com/usnationalarchives/digital-preservation
Convert Kazam output to HTML5-compatible MP4

ffmpeg -i mirror.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 -strict -2 mirror-264.mp4

(Source)
24/06/2020

HTML video elements in local Jekyll site not working in Chrome

https://stackoverflow.com/questions/48876911/embedded-local-mp4-not-playing-in-chrome-when-running-jekyll-serve-econnreset
Apparently works when deployed live:
https://exoji2e.github.io/2019/02/18/video-tag-in-chrome.html
18/06/2020

Enable CGI Scripts on Apache

https://www.ionos.com/community/server-cloud-infrastructure/apache/enable-cgi-scripts-on-apache/
But this assumes 1 fixed dir for cgi scripts.
Apache Tutorial: Dynamic Content with CGI

https://httpd.apache.org/docs/2.4/howto/cgi.html
This explains how to set custom script locations.
17/06/2020

File naming conventions based on Semantic tagging

https://karl-voit.at/managing-digital-photographs/
Tools here:
https://github.com/novoid
19/05/2020

Prospect Mail


The Outlook desktop client for the new Outlook Interface from MS Office 365.

https://github.com/julian-alarcon/prospect-mail
18/05/2020

Test images Developer's Image Library

https://sourceforge.net/p/openil/svn/1554/tree/trunk/Test%20Images/
JPEG 2000 (Bandcamp)

https://jpeg2000.bandcamp.com
Bitrot tool


Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay.

https://github.com/ambv/bitrot
16/05/2020

Python for AV

https://lis655.github.io/av-python-carpentry/
13/05/2020

JPEG White Paper: JPEG XL image coding system

http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf
Setting up Python-based web server

Just run:
python3 -m http.server

Then site can be accessed from:
http://127.0.0.1:8000/
Useful for testing with local files, not suitable for production. More info:
https://developer.mozilla.org/en-US/docs/Learn/Common_questions/set_up_a_local_testing_server
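
The one-liner listens on all network interfaces by default. If the site should only be reachable from the local machine, a small script along these lines might do (the function name `make_server` is made up; needs Python 3.7 or later):

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=8000):
    """Create a static file server that only listens on the loopback address."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Start it with make_server().serve_forever() and stop it with Ctrl-C. Like the one-liner, this is for local testing only.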
08/05/2020

Two Bit Bash Script Library

https://twobitpreservation.com/script-library
07/05/2020

Download videos from YouTube (and more sites)

https://ytdl-org.github.io/youtube-dl/index.html
03/05/2020

We read the privacy policies of Skype, Meet, and Webex: 10 ways videoconferencing systems can better protect privacy for customers

https://medium.com/cr-digital-lab/skype-meet-webex-videoconference-privacy-845bc8360fd3
02/05/2020

Digital Repair Cafe (Project CEST)

Goals and scope look very similar to the NDE physical carriers project:
https://automatic-ingest-digital-archives.github.io/Digital-Repair-Cafe/
See also, for example, "Handleiding Verouderde Dragers Herkennen" (guide to recognizing obsolete carriers):
https://www.projectcest.be/wiki/Publicatie:Handleiding_Verouderde_Dragers_Herkennen
How to Read a Floppy Disk on a Modern PC or Mac

https://www.howtogeek.com/669331/how-to-read-a-floppy-disk-on-a-modern-pc-or-mac/
30/04/2020

Reduce PDF file size

Using Ghostscript:
https://askubuntu.com/a/256449/1052776
23/04/2020

Choosing the right video conferencing tool for the job

https://freedom.press/training/blog/videoconferencing-tools/
COVID-19 and Cybersecurity

https://medium.com/@gdbelvin/covid-19-and-cybersecurity-e9ee5cba6de7
SPARQL queries YUL digital preservation

https://www.wikidata.org/wiki/User:YULdigitalpreservation/SPARQL2#Disk_image_file_formats
17/04/2020

Preservica adds headers/footers to exported HTML files

wellcomecollection/platform#4425
16/04/2020

The Robustness of Apache Tika

https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
09/04/2020

How to use Jitsi Meet, an open source Zoom alternative

https://mashable.com/article/how-to-use-jitsi-meet-zoom-alternative/
05/04/2020

Malware Analysis Fundamentals - Files | Tools

https://winitor.com/pdf/Malware-Analysis-Fundamentals-Files-Tools.pdf
02/04/2020

The best alternatives to Zoom for videoconferencing

https://www.theverge.com/2020/4/1/21202945/zoom-alternative-conference-video-free-app-skype-slack-hangouts-jitsi
01/04/2020

Github Wikis

https://help.github.com/en/github/building-a-strong-community/about-wikis
And:
https://help.github.com/en/github/building-a-strong-community/adding-or-editing-wiki-pages
Simple-Jekyll-Search

A JavaScript library to add search functionality to any Jekyll blog:
https://github.com/christian-fei/Simple-Jekyll-Search
27/03/2020

Jitsi installation instructions

https://jitsi.org/downloads/ubuntu-debian-installations-instructions/
Jitsi servers NL

https://vc4all.nl/
26/03/2020

Books.Files: Preservation of Digital Assets in the Contemporary Publishing Industry

https://drum.lib.umd.edu/handle/1903/25605
22/03/2020

Digital preservation policies and strategies (Caylin Smith)

https://docs.google.com/spreadsheets/d/1nAPh6M5c2VlvuFtdMIDEfxwdLvQ-47-i0ZicUUGkzjM/edit#gid=0
21/03/2020

Disable / enable webcam from terminal

Disable until reboot:
sudo modprobe -r uvcvideo

Enable again:
sudo modprobe uvcvideo

Source
18/03/2020

Create large test file with only null bytes

For a 1 MB file:
dd if=/dev/zero of=file.dat count=1024 bs=1024

Same, 1 GB file:
dd if=/dev/zero of=file.dat count=1024 bs=1048576

Source
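
Same idea in Python, for systems without dd. A minimal sketch; the function name `write_null_file` is made up. It writes actual zero bytes in chunks rather than creating a sparse file.

```python
def write_null_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes null bytes to path, one chunk at a time."""
    zeros = bytes(chunk_size)  # bytes(n) gives n null bytes
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(chunk_size, remaining)
            out.write(zeros[:n])
            remaining -= n
```

For the 1 GB case above: write_null_file("file.dat", 1024 * 1048576).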
17/03/2020

Washing machine reports overdosing

https://www.wasmachines.nl/forum/457-miele-w2203-lampje-overdosering/
https://community.consumentenbond.nl/woning-huishouden-8/miele-wasmachine-trommelkruis-designed-to-fail-16834
But:
https://www.klusidee.nl/Forum/miele-w-3821-wasmachine-meldt-contr-dosering-t46008.html
So: run a wash at 95 degrees, otherwise use a special cleaning agent.
16/03/2020

WordToEPUB

https://daisy.org/activities/software/wordtoepub/
Announcement:
https://daisy.org/news-events/articles/new-epub-creation-tool/
12/03/2020

OneDrive – Some files weren’t downloaded

https://web.archive.org/web/20190704152920/http://yannickborghmans.com/2018/05/19/onedrive-some-files-werent-downloaded/
11/03/2020

Download files and folders from OneDrive or SharePoint

https://support.office.com/en-us/article/download-files-and-folders-from-onedrive-or-sharepoint-5c7397b7-19c7-4893-84fe-d02e8fa5df05

Downloads are subject to the following limits: individual file size limit: 10GB; total zip file size limit: 20GB; total number of files limit: 10,000.

10/03/2020

Unzipping 6 GB OneDrive ZIP file under Linux fails

Reworked this into a blog:
https://www.bitsgalore.org/2020/03/11/does-microsoft-onedrive-export-large-ZIP-files-that-are-corrupt
04/03/2020

Map Windows Folder to a Drive Letter for Quick and Easy Access

https://www.raymond.cc/blog/map-folder-or-directory-to-drive-letter-for-quick-and-easy-access/
03/03/2020

What's so hard about PDF text extraction?

https://www.filingdb.com/pdf-text-extraction
02/03/2020

Graphviz


Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks.

https://www.graphviz.org/
01/03/2020

Bot Sentinel


Bot Sentinel is a free platform developed to detect and track trollbots and untrustworthy Twitter accounts.

https://botsentinel.com/
The Importance of Digital Persistence

https://philarcher.org/diary/2020/importanceOfPersistence/
25/02/2020

How to Sync Microsoft OneDrive with Linux

https://www.maketecheasier.com/sync-onedrive-linux/
21/02/2020

COinS for Your Jekyll Blog

https://matthewlincoln.net/2014/03/15/coins-for-your-jekyll-blog.html
17/02/2020

Persistent identifiers for heritage objects

https://journal.code4lib.org/articles/14978
15/02/2020

Google Webfonts Helper

https://google-webfonts-helper.herokuapp.com/fonts
14/02/2020

Notes on the Troubleshooting and Repair of Compact Disc Players and CDROM Drives

https://www.repairfaq.org/sam/cdfaq.htm
Check items under "Intermittent or erratic operation" and "Operation is poor or erratic when cold".
NAD CD player repair video

https://www.youtube.com/watch?v=jAehSoTmLGY
12/02/2020

Jekyll without plugins

https://jekyllcodex.org/without-plugins/
DLF Levels of Born-Digital Access

https://osf.io/af4eq/
04/02/2020

Firefox web archives add-on

https://github.com/dessant/web-archives
28/01/2020

Accessing Digital Archives Guide, UNC Library

https://guides.lib.unc.edu/accessdigitalarchives
Geolocate URL

Command-line:
https://www.maketecheasier.com/ip-address-geolocation-lookups-linux/
Python:
https://pypi.org/project/geoip2/
Uses MaxMind databases.
BUT getting the IP address from a URL is difficult in Python, so perhaps better to use bash:
https://linuxhandbook.com/find-website-ip-address-linux/
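
For what it's worth, the standard library may be enough for the URL-to-IP step. A minimal sketch, with made-up function names; it has not been tried in combination with geoip2, which would take the resulting address as input.

```python
import socket
from urllib.parse import urlsplit


def hostname_from_url(url):
    """Extract the host name from a URL, with or without a scheme."""
    host = urlsplit(url).hostname
    if host is None:
        # No scheme given, e.g. "www.example.com/page": parse it as a network location
        host = urlsplit("//" + url).hostname
    return host


def ip_addresses(url):
    """Resolve the host part of a URL to a sorted list of unique IP addresses."""
    host = hostname_from_url(url)
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})
```

A host can resolve to several addresses (IPv4 and IPv6), so the function returns a list rather than a single value.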

Windows registry code for Pandoc context menu item

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\mkd2doc]
[HKEY_CLASSES_ROOT\*\shell\mkd2doc\command]
@="\"F:\\Pandoc\\pandoc.exe\" -s -S --ascii -N --toc-depth=2 \"%1\" -o \"%1.docx\""

Then save as pandoc.reg.
22/01/2020

Changed behaviour of Python collections in Python 3.8

This may be relevant to Iromlab or OmSipCreator:
https://docs.python.org/3/whatsnew/3.8.html#collections
Example:
https://github.com/kieranjol/IFIscripts/commit/c6eedd9ec0821b7108f7a93f81bf043a6cb53d20
(Via Twitter)
18/01/2020

PinePhone

https://en.wikipedia.org/wiki/PinePhone
16/01/2020

Everything I know about SSDs

http://kcall.co.uk/ssd/index.html
Task failed successfully pin


https://www.hellovoid.online/product/task-failed-successfully-enamel-pin-pre-order
09/01/2020

Low disk space on boot partition

https://forums.linuxmint.com/viewtopic.php?t=265077
Solved by running following codeblock (as described here):
OLDCONF=$(dpkg -l|grep "^rc"|awk '{print $2}')
CURKERNEL=$(uname -r|sed 's/-*[a-z]//g'|sed 's/-386//g')
LINUXPKG="linux-(image|headers|ubuntu-modules|restricted-modules)"
METALINUXPKG="linux-(image|headers|restricted-modules)-(generic|i386|server|common|rt|xen)"
OLDKERNELS=$(dpkg -l|awk '{print $2}'|grep -E $LINUXPKG |grep -vE $METALINUXPKG|grep -v $CURKERNEL)
YELLOW="\033[1;33m"
RED="\033[0;31m"
ENDCOLOR="\033[0m"
sudo apt-get purge $OLDKERNELS

24/12/2019

The 2010s were supposed to bring the ebook revolution. It never quite came.

https://www.vox.com/culture/2019/12/23/20991659/ebook-amazon-kindle-ereader-department-of-justice-publishing-lawsuit-apple-ipad
15/12/2019

Microsoft Access: The Database Software That Won’t Die

https://medium.com/young-coder/microsoft-access-the-zombie-database-software-that-wont-die-5b09e389c166
12/12/2019

On Implementation of Open Standards in Software: To What Extent Can ISO Standards be Implemented in Open Source Software?

Some interesting observations on JPEG 2000:
http://www.diva-portal.org/smash/get/diva2:925474/FULLTEXT01.pdf
12/11/2019

Search Github gists by user

curl user:bitsgalore

04/11/2019

Two New Tools that Tame the Treachery of Files

https://blog.trailofbits.com/2019/11/01/two-new-tools-that-tame-the-treachery-of-files/
02/11/2019

EML attachments in O365 - a recipe for phishing

https://isc.sans.edu/forums/diary/EML+attachments+in+O365+a+recipe+for+phishing/25474/
01/11/2019

xkcd Earth Temperature Timeline

https://xkcd.com/1732/
31/10/2019

Manage Docker as a non-root user

https://docs.docker.com/install/linux/linux-postinstall/
30/10/2019

Linked multisession discs (CD-ROM)

http://www.gburner.com/online-help/what-is-multisession-disc.htm

"When you add more files in a subsequent session, a complete new file system is written for the new session, but it can include references to files recorded in the previous session; this is known as linked multisession."

History:
https://web.archive.org/web/20050211005128/http://www.roxio.com/en/support/cdr/multisessionhistory.html
28/10/2019

KPN Secure File Transfer

https://filetransfer.kpn.com/
23/10/2019

Location for AppImage files

Official recommendation is to use a folder in the home directory (see https://askubuntu.com/questions/1092742/where-should-i-put-appimages-files), but since the home directory on my home PC is on a slow HD whereas the OS and all other software are on a fast SSD, I created a directory under root:
/Applications/
Then move AppImage files there.
16/10/2019

List of web archives

https://erichennekam.blogspot.com/2014/07/lijst-webarchieven-in-de-wereld-want.html
14/10/2019

Levels of Born-Digital Access

https://docs.google.com/document/d/1N1fG4AgyBEJISc3tk5rWAc_3ZYdDbdVK4_Dbi_TusYQ/edit
13/10/2019

Computer Files Are Going Extinct

https://onezero.medium.com/the-death-of-the-computer-file-doc-43cb028c0506
08/10/2019

Why most academic journals are following outdated publishing practices

https://blog.scholasticahq.com/post/why-academic-journals-are-following-outdated-publishing-practices/
04/10/2019

Running Iromlab wrapped commands manually

For testing only:
C:\Users\jkn010\AppData\Roaming\Python\Python36\site-packages\iromlab\tools\libcdio\win64\cd-info.exe -C H: --no-header --no-device-info --no-disc-mode --no-cddb --dvd > cd-info.log

"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Pre-Batch\Pre-Batch.exe" --drive="H"  --logfile="prebatch.log" --passerrorsback="prebatcherrors.log"

"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Load\Load.exe" --drive="H" --rejectifnodisc  --logfile="load.log" --passerrorsback="loaderrors.log"

"C:\Program Files (x86)\Smart Projects\IsoBuster\IsoBuster.exe" /d:H: /ei:test-h.iso /et:u /ep:oea /ep:npc /c /m /nosplash /s:1 /l:ib-h.log
01/10/2019

Software setup for Device Side FC5025 floppy controller, Linux


Compile and install the software according to official documentation


In file /etc/udev/rules.d/025_fc5025.rules, replace the two occurrences of SYSFS with ATTRS


Run:
sudo usermod -a -G floppy $USER


Reboot the machine


Tested with Linux Mint 18.3 (Sylvia), equivalent to Ubuntu Xenial.
Sources: https://groups.google.com/forum/#!topic/bitcurator-users/K1BPIbdKoOY/discussion + email correspondence with Device Side Data (the creator of the FC5025).
28/09/2019

OfficeToPDF


OfficeToPDF is a command line utility that converts Microsoft Office 2003, 2007, 2010, 2013 and 2016 documents from their native format into PDF using Office's in-built PDF export features.

https://github.com/cognidox/OfficeToPDF
27/09/2019

QEMU QED

"ffmprovisr for QEMU":
https://eaasi.gitlab.io/qemu-qed/
25/09/2019

OpenShot video editor

https://www.openshot.org/
(Used this for iPRES video)
Kdenlive video editor

https://kdenlive.org/en/
(Used this for earlier video, I think).
Copy of Apache-related files on Linux machine

Directories /etc/apache2, /var/www and file /etc/hosts copied to folder backup-webserver on backup disk BAKWA. Copied using:


sudo rsync -avhl /var/www/ ./var/www


sudo rsync -avhl /etc/apache2/ ./etc/apache2


sudo rsync -avhl /etc/hosts ./etc/


To be restored after reinstall.
ATA Secure Erase (erase SSD disk)

https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
23/09/2019

MIT Digital Media Transfer Kits

https://libguides.mit.edu/digmediatransfer
How to Sync Microsoft OneDrive with Linux

https://www.maketecheasier.com/sync-onedrive-linux/
20/09/2019

U.S. National Archives Digital Preservation Framework

https://github.com/usnationalarchives/digital-preservation
15/09/2019

Learning Machine Learning

https://cloud.google.com/products/ai/ml-comic-1/
11/09/2019

DiscImageCreator

https://github.com/saramibreak/DiscImageCreator
(via Twitter)
06/09/2019

Appendix A: Tables of File Formats | National Archives

https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html
27/08/2019

Microservices in Audiovisual Archives


This document describes and examines strategies for designing lightweight microservice environments for the processing of digital, file-based, audiovisual data within an archive.

http://journal.iasa-web.org/pubs/article/view/70
22/08/2019

Fix Bless "not enough free space on the device to save file" errors


Close Bless, and open preferences file (/home/johan/.config/bless/preferences.xml) in a text editor.
Set temp dir by editing pref element with ByteBuffer.TempDir name attribute
Add closing </preferences> tag and save the file. File should look like below:
<preferences>
    <pref name="ByteBuffer.TempDir">/tmp/Bless</pref>
    <pref name="Default.NumberBase">Hexadecimal</pref>
    <pref name="Undo.Actions">100</pref>
    <pref name="View.Toolbar.Show">True</pref>
    <pref name="Undo.Limited">False</pref>
    <pref name="View.Statusbar.Show">True</pref>
    <pref name="Session.RememberWindowGeometry">True</pref>
    <pref name="Default.Layout.UseCurrent">False</pref>
    <pref name="Session.RememberCursorPosition">True</pref>
    <pref name="Session.AskBeforeLoading">False</pref>
    <pref name="View.Statusbar.Selection">True</pref>
    <pref name="Tools.Statistics.Show">False</pref>
    <pref name="View.Statusbar.Offset">True</pref>
    <pref name="Tools.ConversionTable.LEDecoding">False</pref>
    <pref name="Default.EditMode">Insert</pref>
    <pref name="Tools.ConversionTable.Show">True</pref>
    <pref name="Highlight.PatternMatch">True</pref>
    <pref name="Undo.KeepAfterSave">Memory</pref>
    <pref name="Session.LoadPrevious">True</pref>
    <pref name="View.Statusbar.Overwrite">True</pref>
    <pref name="Default.Layout.File">
</preferences>

Make the file read-only:
chmod 0444 /home/johan/.config/bless/preferences.xml


Done!
Source here
Update: this didn't quite work, but a workaround is to enter the location of the temp dir (/tmp/Bless) directly in Bless' user interface as a text string (so don't use the file navigation widgets!).
16/08/2019

Philology and the digital writing process

https://filologiaunlp.files.wordpress.com/2018/06/ries_philology-and-the-digital-writing-process_2017.pdf
14/08/2019

JP2 images in Tika regression corpus

http://162.242.228.174/share/jp2.tgz
13/08/2019

Going Commando - Put Down The Mouse

https://blog.codinghorror.com/going-commando-put-down-the-mouse/
Mouseless Computing

https://weblogs.asp.net/jongalloway/Mouseless-Computing
Hack Attack: Mouse-less Firefox

https://lifehacker.com/hack-attack-mouse-less-firefox-139495
09/08/2019

Python reverse geocode


Reverse Geocode takes a latitude / longitude coordinate and returns the country and city.

https://pypi.org/project/reverse-geocode/
03/08/2019

Verloren jouw gegevens


Source: https://twitter.com/Eijsbouts/status/1157591377624150016
31/07/2019

1995: a quarter of large companies on the Internet

https://twitter.com/rutger_/status/1156629656533110787 (archived)
Delpher link: https://resolver.kb.nl/resolve?urn=ABCDDD:010870971:mpeg21:a0117
Use as context for the xxLINK presentation!
29/07/2019

Install Android on VirtualBox

https://www.howtogeek.com/164570/how-to-install-android-in-virtualbox/
Then, in VirtualBox, change the display option "Graphics Controller" to VBoxVGA and enable 3D acceleration, as per here.
Home Assistant

https://www.home-assistant.io/
27/07/2019

Renoise audio configuration

Added following lines to /etc/security/limits.conf, as per here:
johan - rtprio 99
johan - nice -10

11/07/2019

deja-dup / duplicity keeps asking for encryption password

See:
https://askubuntu.com/questions/462085/deja-dup-repeatedly-asks-encryption-password
Tried:

Re-install of duplicity
Changed ownership of a few dirs in home that were owned by root.

Start backup from terminal:
export DEJA_DUP_DEBUG=1
deja-dup --backup

Result: backup appears to be created, but after verification stage deja-dup asks for password again. Tail end of debug output:
DUPLICITY: .     self.gpg_failed()
DUPLICITY: .   File "/usr/lib/python2.7/dist-packages/duplicity/gpg.py", line 272, in gpg_failed
DUPLICITY: .     raise GPGError(msg)
DUPLICITY: .  GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: . 
DUPLICITY: . 

DUPLICITY: ERROR 31 GPGError
DUPLICITY: . GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: . 

10/07/2019

nwipe - securely erase disks (dban fork)

https://linux.die.net/man/1/nwipe
08/07/2019

Archaeology of the Amsterdam digital city; why digital data are dynamic and should be treated accordingly

https://www.tandfonline.com/doi/full/10.1080/24701475.2017.1309852
02/07/2019

Toward Environmentally Sustainable Digital Preservation

https://dash.harvard.edu/handle/1/40741399
25/06/2019

Deja-dup filling up home dir

After attaching a large external HD + including it in the backup scheme, deja-dup eats up all space of main HD. Cause: deja-dup writes some metadata and manifest files to home dir at:
~/.cache/deja-dup/

These files become very large (here: > 18 GB), which results in running out of disk space. This apparently causes problems for lots of deja-dup users, e.g. here, here. This post suggests solving this by moving the cache to another disk with sufficient space and replacing ~/.cache/deja-dup/ with a symlink to the new location:
mkdir /media/johan/BAKWA/.deja-dup-cache
mv ~/.cache/deja-dup/* /media/johan/BAKWA/.deja-dup-cache/
rmdir ~/.cache/deja-dup
ln -sf /media/johan/BAKWA/.deja-dup-cache ~/.cache/deja-dup

UPDATE: doesn't work, files are still written to home dir!! Interim solution: exclude external drive from deja-dup backup scheme, and back it up manually with rsync (no incremental backup though!).
20/06/2019

Format USB drive as ext4

List partitions:
df -h

Result:
Filesystem      Size  Used Avail Use% Mounted on
udev            3,9G     0  3,9G   0% /dev
tmpfs           789M  9,5M  780M   2% /run
/dev/sda1       227G  202G   14G  94% /
tmpfs           3,9G   34M  3,9G   1% /dev/shm
tmpfs           5,0M  4,0K  5,0M   1% /run/lock
tmpfs           3,9G     0  3,9G   0% /sys/fs/cgroup
cgmfs           100K     0  100K   0% /run/cgmanager/fs
tmpfs           789M   32K  789M   1% /run/user/1000
/dev/sdb1       1,9T  144M  1,9T   1% /media/johan/Elements4

So in this case we need to format /dev/sdb1. Unmount the disk:
sudo umount /dev/sdb1

Format as ext4:
sudo mkfs.ext4 /dev/sdb1

Change generic label to WEBARCH:
sudo e2label /dev/sdb1 WEBARCH

Done!
Copy directory tree with rsync

 #!/bin/bash
 # Script must be run as root!

 sourceDir=/media/johan/Elements4/webarcheologie
 destDir=/media/johan/WEBARCH/
 rsync -avhl --dry-run $sourceDir $destDir

Copy homedir:
#!/bin/bash
# Script must be run as root!

sourceDir=~
destDir=/media/johan/BAKWA/homedir-25022020/
rsync -avhl $sourceDir $destDir

17/06/2019

Filesystem Hierarchy Standard

https://www.linuxjournal.com/content/filesystem-hierarchy-standard
11/06/2019

Researcher, Don’t Make Your Readers Scream!

https://www.cl.cam.ac.uk/~lp15/Pages/Scream.html
07/06/2019

Quick MAME/MESS Philips CD-I Tutorial (Mame 0.172)

https://forums.launchbox-app.com/topic/29631-quick-mamemess-philips-cd-i-tutorial-mame-0-172/
30/05/2019

Reader Privacy: The New Shape of the Threat (Clifford Lynch)

https://publications.arl.org/16ivjbv/ (PDF link)
27/05/2019

LaTeX setup notes

First install the following packages:
sudo apt install texlive-latex-extra
sudo apt-get install texlive-bibtex-extra biber
sudo apt-get install texlive-fonts-recommended

Then download the OpenSans package here. Install using following steps:

Copy doc/, fonts/, source/, and tex/ directories to /etc/texmf directory
Run mktexlsr to refresh the file name database and make TeX aware of the new files.
Run sudo updmap -sys --enable Map=opensans.map to make Dvips, dvipdf and pdfTeX aware of the new fonts.

26/05/2019

Digital Physical Carrier Illustrations

https://blog.matthewburgess.net/2019/05/digital-physical-carrier-illustrations.html
22/05/2019

Corrupt a file - The file corrupter you were looking for!

https://corrupt-a-file.net/
18/05/2019

Manuals HP ProDesk

https://support.hp.com/us-en/product/hp-prodesk-400-g3-microtower-pc/7638325/manuals
16/05/2019

Regex to convert smart quotes with regular ones (and vice-versa)

https://gist.github.com/zerolab/1633661
Convert dumb quotes to smart quotes in Python

https://gist.github.com/davidtheclark/5521432
Even easier, use SmartyPants:
https://pypi.org/project/smartypants/
09/05/2019

Library of Congress Web Archive Data Sets

https://labs.loc.gov/experiments/webarchive-datasets/
01/05/2019

Unraveling the JPEG

https://parametric.press/issue-01/unraveling-the-jpeg/
20/04/2019

Floppy disks are like Jesus


15/04/2019

ArchiveBox


ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more).

https://archivebox.io/
02/04/2019

Text in PDF has no Unicode mapping


Short of AI, your best bet is to run OCR (tesseract) on these files.

https://lists.apache.org/thread.html/d25f20eda1c2094f0902e7b7092d829a64085b3b87aad2b8b346a453@%3Cuser.tika.apache.org%3E
23/03/2019

Identification of audio CD on Linux

Use cd-discid:
cd-discid /dev/sr1

Result:
b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 103443 117153 120751 132154 144223 157688 2287
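
The output line holds the disc ID, the number of tracks, the start offset of each track in frames (75 per second) and the total disc length in seconds. As far as I understand the CDDB convention, the ID itself is built from a checksum byte, the playing time in seconds and the track count. A sketch to check this against the line above (function names are made up):

```python
def parse_cd_discid(line):
    """Split a cd-discid output line into (disc ID, frame offsets, length in seconds)."""
    fields = line.split()
    track_count = int(fields[1])
    offsets = [int(f) for f in fields[2:2 + track_count]]
    length_seconds = int(fields[2 + track_count])
    return fields[0], offsets, length_seconds


def cddb_discid(frame_offsets, disc_length_seconds):
    """Compute the 8-digit hex disc ID from track offsets and disc length."""
    def digit_sum(n):
        return sum(int(d) for d in str(n))

    # Checksum: digit sums of each track's start time in whole seconds, modulo 255
    checksum = sum(digit_sum(offset // 75) for offset in frame_offsets) % 255
    # Playing time: disc length minus the start time of the first track
    playing_time = disc_length_seconds - frame_offsets[0] // 75
    return f"{checksum:02x}{playing_time:04x}{len(frame_offsets):02x}"
```

For the disc above this gives checksum b6, playing time 08ed (2285 seconds) and 0f (15 tracks), which matches the ID reported by cd-discid.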

Lookup in freedb using:
http://freedb.freedb.org/~cddb/cddb.cgi?cmd=cddb+query+b608ed0f+15+150+8656+19406+37656+48025+58358+71683+77998+90546+103443+117153+120751+132154+144223+157688+2287&hello=user+hostname+program+version&proto=3
Result:
200 rock b608ed0f Der Plan / Unkapitulierbar
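
The query URL can be put together from the cd-discid output in Python. A minimal sketch that only builds the URL and does no network access; the function name is made up, the base URL and the hello/proto values are the ones used in the lookup above.

```python
from urllib.parse import quote_plus

# Base URL as used in the lookup above
FREEDB_CGI = "http://freedb.freedb.org/~cddb/cddb.cgi"


def cddb_query_url(discid_output, hello="user hostname program version", proto=3):
    """Build a CDDB query URL from one line of cd-discid output."""
    fields = discid_output.split()
    cmd = "cddb query " + " ".join(fields)
    # quote_plus turns the spaces into '+' characters
    return f"{FREEDB_CGI}?cmd={quote_plus(cmd)}&hello={quote_plus(hello)}&proto={proto}"
```

The hello string should normally identify the actual user, host and client program.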

Full record:
http://www.freedb.org/freedb/rock/b608ed0f
# xmcd
#
# Track frame offsets: 
#        150
#        8656
#        19406
#        37656
#        48025
#        58358
#        71683
#        77998
#        90546
#        103443
#        117153
#        120751
#        132154
#        144223
#        157688
#
# Disc length: 2287 seconds
#
# Revision: 0
# Processed by: cddbd v1.5.2PL0 Copyright (c) Steve Scherf et al.
# Submitted via: ExactAudioCopy v0.99pb5
#
DISCID=b608ed0f
DTITLE=Der Plan / Unkapitulierbar
DYEAR=2017
DGENRE=Electronic
TTITLE0=Wie der Wind weht
TTITLE1=Lass die Katze stehn!
TTITLE2=Man leidet herrlich
TTITLE3=Grundrecht
TTITLE4=Es heißt: die Sonne
TTITLE5=Gesicht ohne Buch
TTITLE6=Stille hören
TTITLE7=Flohmarkt der Gefühle
TTITLE8=Der Herbst
TTITLE9=Körperlos im Cyberspace
TTITLE10=Zu Besuch bei N. Senada
TTITLE11=Wie schwarz ist ein Rabe?
TTITLE12=Come Fly With Me
TTITLE13=Was kostet der Austritt?
TTITLE14=Die Hände des Astronauten
EXTD=
EXTT0=
EXTT1=
EXTT2=
EXTT3=
EXTT4=
EXTT5=
EXTT6=
EXTT7=
EXTT8=
EXTT9=
EXTT10=
EXTT11=
EXTT12=
EXTT13=
EXTT14=
PLAYORDER=

Python: cddb-py; Python 3 port here.
See also CDDB.
08/03/2019

Update forked Git repository

From here:
git fetch upstream
git checkout master
git rebase upstream/master
git push -f origin master

06/03/2019

ExifTool: report custom image properties to CSV file

Suppose we want to extract the Jpeg2000:NumberOfComponents field for each JP2 image:
exiftool -csv -Jpeg2000:NumberOfComponents /media/johan/Elements4/test/*.jp2 > exif.csv

Result:
SourceFile,NumberOfComponents
/media/johan/Elements4/test/HS-19640508-001.jp2,3
/media/johan/Elements4/test/HS-19640508-002.jp2,3
::
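
A quick way to summarise the resulting CSV in Python. This is only a sketch: the function name is made up, and the sample holds just the two rows shown above.

```python
import csv
import io
from collections import Counter


def count_components(csv_file):
    """Tally the NumberOfComponents values in an ExifTool CSV report."""
    reader = csv.DictReader(csv_file)
    return Counter(row["NumberOfComponents"] for row in reader)


# Sample input: the two rows from the result above
sample = io.StringIO(
    "SourceFile,NumberOfComponents\n"
    "/media/johan/Elements4/test/HS-19640508-001.jp2,3\n"
    "/media/johan/Elements4/test/HS-19640508-002.jp2,3\n"
)
counts = count_components(sample)
print(counts)
```

For the real file, replace the sample with `open("exif.csv", newline="", encoding="utf-8")`.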

05/03/2019

ImageMagick: resize all images in directory to fixed width

mogrify -resize 1014 *.jpg

(Note: this changes the images in-place, so make a copy of the original images before doing this).
12/02/2019

ImageMagick: fix 'convert: not authorized' on PDF

https://alexvanderbist.com/posts/2018/fixing-imagick-error-unauthorized
10/02/2019

Emulation resources list (Ethan Gates)

https://github.com/EG-tech/emulation-resources
29/01/2019

Big List of Naughty Strings


The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data.

https://github.com/minimaxir/big-list-of-naughty-strings
03/01/2019

Twitter search advanced guide

https://espirian.co.uk/twitter-search-advanced-guide/
22/12/2018

Mounting Fritz.NAS under Linux Mint

The instructions below are for a fresh install. Based on:
https://dominicpratt.de/fritz-nas-unter-debianubuntu-einbinden/


Open fstab in text editor as sudo:
sudo xed /etc/fstab


Add the following line to the bottom (the last line of the file must be empty):
//192.168.178.1/FRITZ.NAS /media/fritzbox cifs credentials=/etc/samba/auth,vers=1.0,uid=1000,gid=1000 0 


Create the mount directory:
sudo mkdir -p /media/fritzbox 


Create file /etc/samba/auth:
sudo touch /etc/samba/auth


Edit as sudo:
sudo xed  /etc/samba/auth


Add username and password entries (must be FritzNAS uname + pwd, not the FritzBox ones!):
username=johan
password=dfh3476fh8((77&&


It might be necessary to install the cifs-utils and samba packages (it seems cifs-utils is already part of the default Linux Mint install):
sudo apt install cifs-utils
sudo apt install samba 


Finally mount:
sudo mount -a


Done!
21/12/2018

Linux Mint: new install results in Grub Prompt when booting

https://forums.linuxmint.com/viewtopic.php?t=217509
Deark


A utility for file format and metadata analysis, data extraction, and image format decoding

https://github.com/jsummers/deark
24/192 Music Downloads ... and why they make no sense

https://people.xiph.org/~xiphmont/demo/neil-young.html
30/11/2018

mh virtual tape & library system.

https://github.com/markh794/mhvtl
Install script for Ubuntu 16.04:
https://gist.github.com/hrchu/3eb1c0aa9994df0328037fff04cd889d
Then run using:
sudo /etc/init.d/mhvtl start

24/11/2018

Tkinter bitmaps in Ubuntu (Python)

https://stackoverflow.com/a/25223352/1209004
E.g.:
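import os
import tkinter as tk

def main():
    """Main function"""

    # get_main_dir() and tapeimgrGUI are defined elsewhere in the application
    appDir = get_main_dir()
    root = tk.Tk()
    # Use a PNG as window icon; True makes it the default for all windows
    root.iconphoto(True, tk.PhotoImage(file=os.path.join(appDir, 'icon.png')))
    myGUI = tapeimgrGUI(root)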

24/10/2018

Bash: output to array, which is then parsed

# Get tape status, output to array (one element per line)
mapfile -t tapeStatus < <(mt -f "$TAPEnr" status)

# Parse file number and block number from status output
for item in "${tapeStatus[@]}"
do
    if [[ $item == *"file number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        fileNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $fileNumber
    fi

    if [[ $item == *"block number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        blockNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $blockNumber
    fi

done
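
The same parsing in Python, as a rough sketch. The function name and the sample status text are made up for illustration; real mt output may contain other lines as well.

```python
def parse_mt_status(status_text):
    """Return (file_number, block_number) as strings, or None if absent."""
    file_number = None
    block_number = None
    for line in status_text.splitlines():
        # Everything after the first equal sign is the value
        _, _, value = line.partition("=")
        # Remove all whitespace, like tr -d '[:space:]'
        value = "".join(value.split())
        if "file number" in line:
            file_number = value
        if "block number" in line:
            block_number = value
    return file_number, block_number


# Hypothetical status output, for illustration only
sample = "drive type = Generic SCSI-2 tape\nfile number = 3\nblock number = 0\n"
result = parse_mt_status(sample)
print(result)
```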

20/10/2018

Oxford Common File Layout


This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.

https://ocfl.io/
18/10/2018

Hex Editing for Archivists

http://www.av-rd.com/knowhow/
12/10/2018

Camelot: PDF Table Extraction for Humans (Python)

https://github.com/socialcopsdev/camelot/
01/10/2018

Update nodejs

https://askubuntu.com/questions/711834/unable-to-update-node-js-keeps-returning-to-old-version-ubuntu-15-04
try dat

https://try-dat.com/
25/09/2018

ReMarkable MarkDown editor

https://remarkableapp.github.io/index.html
Preservation Planning for Emerging Formats at the British Library

https://osf.io/65p7m/
14/09/2018

Docker files consume excessive amounts of disk space

See also moby/moby#21925.
E.g.:
sudo du -hx --max-depth=1 /var/lib

Result contains this entry:
25G	/var/lib/docker

There are probably more elegant/subtle ways to handle this, see e.g. https://lebkowski.name/docker-volumes/
Solution/workaround

Uninstall docker:
sudo apt-get remove docker docker-engine docker.io

Delete files:
sudo rm -rf /var/lib/docker

10/09/2018

BL Emerging Formats project


The Library’s ‘Emerging Formats’ project is focused on UK publications created for the mobile web, as interactive narratives or in database format.

https://britishlibrary.recruitment.northgatearinso.com/birl/pages/vacancy.jsf?latest=01001612

Caylin Smith and Ian Cooke report on the Emerging Formats project, which is investigating the collection management needs of published works that are created with digital formats that have significant software and hardware dependencies. They discuss the collection management challenges of these format types within the framework of UK NPLD.

http://journals.sagepub.com/doi/full/10.1177/0955749018785836
24/08/2018

Empty Trash on Linux machine from terminal

This works if Trash contains items that were put there as superuser:
sudo rm -rf ~/.local/share/Trash/*

16/08/2018

How To Install WordPress with LAMP on Ubuntu 16.04

https://www.digitalocean.com/community/tutorials/how-to-install-wordpress-with-lamp-on-ubuntu-16-04
Use this to import kbresearch blog; then export to static site using:
https://wordpress.org/plugins/static-html-output-plugin/
06/08/2018

Digital transformation at Wellcome Collection

https://stacks.wellcomecollection.org/digital-transformation-at-wellcome-collection-639fb177aad6
27/07/2018

Search by file extension on Github


filename:ext extension:ext where ext is the extension you're interested in. You need both the filename and extension keywords to filter it down to only potential files of interest.

https://twitter.com/NKrabben/status/1022575556209074220
Example:
https://github.com/search?q=filename%3Awq1+extension%3Awq1
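
The example URL can be generated for any extension. A small sketch (the function name is my own):

```python
from urllib.parse import urlencode


def github_extension_search_url(ext):
    """Build a GitHub search URL using both the filename and extension keywords."""
    query = f"filename:{ext} extension:{ext}"
    # urlencode escapes ':' as %3A and the space as '+'
    return "https://github.com/search?" + urlencode({"q": query})


url = github_extension_search_url("wq1")
print(url)
```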
26/07/2018

Smallest possible […] file


This repository aims to collect the smallest possible syntactically valid files in different programming/scripting/markup languages.

https://github.com/mathiasbynens/small
25/07/2018

VisiData


VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

http://visidata.org/
09/07/2018

Disk wiping and data forensics: Separating myth from science

https://www.techrepublic.com/article/disk-wiping-and-data-forensics-separating-myth-from-science/
30/06/2018

Excel Unusual


the home of the most unique Microsoft Excel animated spreadsheets

http://www.excelunusual.com/
29/06/2018

Hackmd.io

https://hackmd.io/
28/06/2018

It's Not Easy Being Green(e): Digital Preservation in the Age of Climate Change

https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p
23/06/2018

PREMIS/METS for scalability

https://wiki.archivematica.org/PREMIS/METS_for_scalability
17/06/2018

Markdown and Visual Studio Code

https://code.visualstudio.com/Docs/languages/markdown
Build an Amazing Markdown Editor Using Visual Studio Code and Pandoc

http://thisdavej.com/build-an-amazing-markdown-editor-using-visual-studio-code-and-pandoc/
15/06/2018

How to Measure Static Electricity

https://www.wikihow.com/Measure-Static-Electricity
11/06/2018

gedit on Windows

Install in MINGW:
pacman -S mingw-w64-x86_64-gedit

Add external plugin:
https://stackoverflow.com/questions/39360149/adding-external-plug-ins-to-gedit-in-windows
Get plugins here:
https://wiki.gnome.org/Apps/Gedit/ThirdPartyPlugins-v3.0
28/05/2018

Swisscows search engine

https://swisscows.com/
22/05/2018

Installation of Ace

If ELIFECYCLE / puppeteer error happens, try this (source):
sudo npm install @daisy/ace -g --unsafe-perm=true --allow-root

BUT ace now fails on this (installing Chrome doesn't help).
20/05/2018

Proselint


Our goal is to aggregate knowledge about best practices in writing and to make that knowledge immediately accessible to all authors in the form of a linter for prose.

https://github.com/amperser/proselint/
18/05/2018

Memento Tracer

http://tracer.mementoweb.org/
17/05/2018

The Importance of EPUB and the Need for EPUB 4

https://w3c.github.io/publ-bg/docs/EPUB4_business_case.html
11/05/2018

What’s in a Name? On ‘Meaningfulness’ and Best Practices in Filenaming within the LAM Community

http://journal.code4lib.org/articles/13438
10/05/2018

Microsoft Office Supported File formats


Office 2016 (archived)
Office 2010 (archived)
Office 2007 (archived)

Possibly more here.
07/05/2018

Ace Accessibility Checker for EPUB

https://daisy.github.io/ace/
Web service based on Ace:
http://bacc.dzb.de/
03/05/2018

List of open workflows and resources for A/V archiving

https://github.com/amiaopensource/open-workflows
26/04/2018

Integration of nonharvested web data into an existing web archive

http://netarkivet.dk/wp-content/uploads/IntegrationOfNonHarvestedData.pdf
Read Tape Contents (Linux)

https://www.linuxquestions.org/questions/linux-newbie-8/read-tape-contents-944371/
17/04/2018

Ten simple rules for structuring papers

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005619
12/04/2018

A successful Git branching model

http://nvie.com/posts/a-successful-git-branching-model/
Wikidata portal project

https://github.com/WikiDP/wikidp-portal
03/04/2018

Apache Web Server on Ubuntu 16.04

https://www.digitalocean.com/community/tutorials/how-to-install-the-apache-web-server-on-ubuntu-16-04
Restrict Apache Access to Localhost Only

In config file ports.conf, change this line:
Listen 80

into this:
Listen 127.0.0.1:80

See:
https://serverfault.com/questions/276963/make-apache-only-accessible-via-127-0-0-1-is-this-possible/276968#276968
Setting up multiple sites:
https://www.liberiangeek.net/2015/07/how-to-enable-and-run-multiple-websites-using-apache2-on-ubuntu-15-04/
28/03/2018

Script Ahoy


Community resource intended to provide helpful one-liners and script code specifically drawn from real-life examples in archives and libraries

https://dd388.github.io/crals/
Create static archived version of Wordpress blog

wget --recursive --no-clobber --span-hosts --page-requisites \
     --convert-links --no-parent -w 5 --random-wait \
     http://blog.kbresearch.nl >>wget.log 2>&1

This doesn't quite work the way it should:

If we leave out --span-hosts, external stylesheets etc. are ignored, even if --page-requisites is used (don't want that)!
If we include --span-hosts, externally referenced pages/sites are scraped as well (don't want that either!)

See also https://gist.github.com/dannguyen/03a10e850656577cfb57
Better approach:


Scrape one single page:
wget --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait http://blog.kbresearch.nl/2015/07/07/why-pdfa-validation-matters-even-if-you-dont-have-pdfa/ >>wget.log 2>&1


This gives us the domains used for individual page resources, which we can subsequently feed into --domains. After some fiddling (we don't want to harvest 60+ gravatar subdomains) this looks reasonable:
#!/bin/bash

url=http://blog.kbresearch.nl
domains=blog.kbresearch.nl,wp.com,researchkb.files.wordpress.com,googleapis.com,gstatic.com

logFile=wget.log
wget --mirror --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait --domains=$domains $url >>$logFile 2>&1
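
To get the list of hosts out of the log from the single-page scrape, something like this Python sketch could help. It simply collects every URL that appears in the log text; the function name and the sample log lines are assumptions for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Anything that looks like an http(s) URL, up to the next whitespace or quote
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+")


def hosts_in_log(log_text):
    """Count how often each host name occurs in the URLs found in a log."""
    hosts = Counter()
    for url in URL_PATTERN.findall(log_text):
        host = urlparse(url).hostname
        if host:
            hosts[host] += 1
    return hosts


# Hypothetical log excerpt, for illustration only
sample_log = (
    "--2018-03-24 10:00:00--  http://blog.kbresearch.nl/2015/07/07/example/\n"
    "--2018-03-24 10:00:06--  https://s0.wp.com/wp-content/themes/style.css\n"
    "--2018-03-24 10:00:12--  https://fonts.googleapis.com/css?family=Lato\n"
    "--2018-03-24 10:00:18--  https://s0.wp.com/wp-includes/js/example.js\n"
)
hosts = hosts_in_log(sample_log)
for host, count in hosts.most_common():
    print(count, host)
```

The host names still need manual trimming before they go into --domains (e.g. collapsing subdomains into wp.com).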

24/03/2018

Difficulties of Timestamping Archived Web Pages

https://arxiv.org/abs/1712.03140
22/03/2018

swMATH


swMATH is a freely accessible, innovative information service for mathematical software. swMATH not only provides access to an extensive database of information on mathematical software, but also includes a systematic linking of software packages with relevant mathematical publications.

http://www.swmath.org/
17/03/2018

Windows previous versions documentation

https://docs.microsoft.com/en-us/previous-versions/windows/
12/03/2018

Wikidata for digital preservation portal

http://wikidp.org/
09/03/2018

Search files in UK web archive by magic pattern

See this thread on digipres.club for some context:
https://digipres.club/@joe/99650486509645352
Search URL:
https://www.webarchive.org.uk/shine/search?page=1&invert=&facet.fields=crawl_year&invert=&invert=&facet.fields=public_suffix&invert=&invert=&invert=&invert=&action=search&query=content_ffb:%220baddeed%22&totalCount=totalCount&sort=crawl_date&order=asc
Is Open Science ready for software containers?


One of our goals is to publish researcher's data, code, and executable Linux container all as files in a version controlled Dat repository. For this to be useful, a person should be able to execute these Linux environments (aka containers) anywhere

https://blog.datproject.org/2018/01/26/challenges-of-decentralized-hpc-containerization/
07/03/2018

Install OwnCloud desktop on Linux Mint 18.3

Instructions here, Ubuntu 16.04.
If updating results in warnings about package authentication, follow steps below:
owncloud/client#5287 (comment)
06/03/2018

Remove all XMP tags from a TIFF, except xmp-tiff ones

exiftool -xmp:all= "-all:all<xmp-tiff:all" MMKB19_000004012_00002_master.tiff

27/02/2018

Set non-standard maximum line length in pep8

Use --max-line-length option, e.g.:
pep8 --max-line-length=120 ~/omSipCreator/omSipCreator > pep8.txt

16/02/2018

Longevity of Optical Disc Media: Accelerated Ageing Predictions and Natural Ageing Data

https://www.degruyter.com/view/j/rest.2017.38.issue-3/res-2016-0032/res-2016-0032.xml?format=INT
COMPACT DISC SERVICE LIFE: AN INVESTIGATION OF THE ESTIMATED SERVICE LIFE OF PRERECORDED COMPACT DISCS (CD-ROM)

https://www.loc.gov/preservation/resources/rt/CDservicelife_rev.pdf
CD-ROM Longevity Research at LoC

https://www.loc.gov/preservation/scientists/projects/cd_longevity.html
CD-R and DVD-R RW Longevity Research at LoC

https://www.loc.gov/preservation/scientists/projects/cd-r_dvd-r_rw_longevity.html
15/02/2018

Python Macros in OpenOffice / LibreOffice

http://christopher5106.github.io/office/2015/12/06/openoffice-libreoffice-automate-your-office-tasks-with-python-macros.html
14/02/2018

Write Markdown with 8 Exceptional Open Source Editors

https://www.ossblog.org/markdown-editors/
06/02/2018

Discard unstaged changes to Git repo

git checkout -- .

(see also stackoverflow)
05/02/2018

PREMIS in METS Toolbox

validate METS file against best practices:
http://pim.fcla.edu/validate
Schematron rules:
http://pim.fcla.edu/resources
31/01/2018

Siegfried format counts

sf -csv t/images | cut -d ',' -f 6 | sort | uniq -c | sort -r

Result:
  8 x-fmt/390
  7 fmt/645
  5 fmt/41
  5 fmt/101
  4 fmt/43
  3 x-fmt/62
  3 x-fmt/263
  3 x-fmt/111
  3 fmt/44
  2 fmt/661
  2 fmt/5
  2 fmt/17
 28 UNKNOWN
  1 x-fmt/92
  ::
  etc

(Source: Nick Krabbenhöft)
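
A Python sketch of the same count. Only the column position (6th field) is taken from the command above; the function name, header row and sample rows are assumptions. Unlike the shell pipeline it parses the CSV properly (so commas inside quoted file names don't shift columns), skips the header row, and sorts numerically.

```python
import csv
import io
from collections import Counter


def format_counts(csv_file, column=5):
    """Count the values in one column (default: 6th field) of Siegfried CSV output."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return Counter(row[column] for row in reader if len(row) > column)


# Hypothetical sample rows, for illustration only
sample = io.StringIO(
    "filename,filesize,modified,errors,namespace,id,format\n"
    '"a,b.jpg",100,2018-01-31T10:00:00Z,,pronom,fmt/43,JPEG\n'
    "c.jpg,200,2018-01-31T10:00:00Z,,pronom,fmt/43,JPEG\n"
    "d.bin,300,2018-01-31T10:00:00Z,,pronom,UNKNOWN,\n"
)
counts = format_counts(sample)
for format_id, count in counts.most_common():
    print(count, format_id)
```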
How to update a GitHub forked repository

https://stackoverflow.com/a/7244456
Create Windows context menu item

https://gist.github.com/bitsgalore/7c5da72277557b608c94
ExifTool sample files

https://sourceforge.net/p/exiftool/code/ci/master/tree/t/images/
Wine installation on Linux Mint 18.3

Not working, problem seems to correspond to issue here:
https://forums.linuxmint.com/viewtopic.php?f=47&t=260925
24/01/2018

Finding and installing packages in MSYS2

Create/update package database:
pacman -Fy

Result:
:: Synchronizing package databases...
 mingw32                    2.4 MiB  2.97M/s 00:01 [#####################] 100%
 mingw32.sig               96.0   B  0.00B/s 00:00 [#####################] 100%
 mingw64                    2.4 MiB  1695K/s 00:01 [#####################] 100%
 mingw64.sig               96.0   B  0.00B/s 00:00 [#####################] 100%
 msys                     855.8 KiB  4.24M/s 00:00 [#####################] 100%
 msys.sig                  96.0   B  0.00B/s 00:00 [#####################] 100%

Find package name from (sub) string:
pacman -Fsx iso-info

Result:
mingw32/mingw-w64-i686-libcdio 2.0.0-1
    mingw32/bin/iso-info.exe
    mingw32/share/man/man1/iso-info.1.gz
mingw64/mingw-w64-x86_64-libcdio 2.0.0-1
    mingw64/bin/iso-info.exe
    mingw64/share/man/man1/iso-info.1.gz

Install package:
pacman -S mingw-w64-x86_64-libcdio

Uninstall package:
pacman -R mingw-w64-x86_64-libcdio

Source: https://github.com/msys2/msys2/wiki/Using-packages
23/01/2018

SRU: select Mac-only CD-ROMs

Query:
extent any "cdrom* cd-rom*" and annotation any "Mac*" not annotation any "Win* PC*"

Result:
http://www.kbresearch.nl/tpxslt/?xml=http://jsru.kb.nl/sru/sru?query=extent%20any%20%22cdrom*%20cd-rom*%22%20and%20annotation%20any%20%22Mac*%22%20not%20annotation%20any%20%22Win*%20PC*%22&x-collection=GGC&maximumRecords=10&xsl=http://www.kbresearch.nl/xportal/brief.xsl
SRU: select Blu-Ray discs

Query:
extent any "blu*"

Result (only 5 hits, 23/1/2018):
http://www.kbresearch.nl/tpxslt/?xml=http://jsru.kb.nl/sru/sru?query=extent%20any%20%22blu*%22&x-collection=GGC&maximumRecords=10&xsl=http://www.kbresearch.nl/xportal/brief.xsl
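
Both result URLs follow the same pattern, so they can be built from the query string. A sketch in Python; the function and constant names are my own, the URL parts are copied from the results above:

```python
from urllib.parse import quote

VIEWER = "http://www.kbresearch.nl/tpxslt/"
SRU_BASE = "http://jsru.kb.nl/sru/sru"
STYLESHEET = "http://www.kbresearch.nl/xportal/brief.xsl"


def sru_result_url(query, collection="GGC", maximum_records=10):
    """Build the viewer URL for an SRU query."""
    # Percent-encode spaces and quotes, but keep the '*' wildcard as-is
    encoded = quote(query, safe="*")
    return (f"{VIEWER}?xml={SRU_BASE}?query={encoded}"
            f"&x-collection={collection}&maximumRecords={maximum_records}"
            f"&xsl={STYLESHEET}")


url = sru_result_url('extent any "blu*"')
print(url)
```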
18/12/2017

List contents of ISO image with 7-zip

Command:
7z l -slt iso9660.iso

Result:
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

Listing archive: iso9660.iso

--
Path = iso9660.iso
Type = Iso
Created = 2017-06-30 18:31:33
Modified = 2017-06-30 18:31:33

----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69424
Modified = 2017-06-30 13:23:38

Path = readme.txt
Folder = -
Size = 37
Packed Size = 37
Modified = 2017-06-30 13:25:20

UDF Bridge:
7z l -slt iso9660_udf.iso

Result:
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

Listing archive: iso9660_udf.iso

--
Path = iso9660_udf.iso
Type = Udf
Comment = UDF Bridge demo
Cluster Size = 2048
Created = 2017-06-30 18:31:33

----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69632
Modified = 2017-06-30 13:23:38
Accessed = 2017-06-30 18:31:33

Path = readme.txt
Folder = -
Size = 37
Packed Size = 2048
Modified = 2017-06-30 13:25:20
Accessed = 2017-06-30 18:31:33
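
The -slt output is easy to parse, because every file entry is a block of "key = value" lines that ends with a blank line. A Python sketch (the function name is made up; the sample is an abridged copy of the first listing above):

```python
def parse_7z_slt(listing):
    """Parse the file entries of '7z l -slt' output into a list of dicts."""
    entries = []
    current = {}
    in_entries = False
    for line in listing.splitlines():
        # File entries start after the line of ten dashes
        if line.startswith("----------"):
            in_entries = True
            continue
        if not in_entries:
            continue
        if not line.strip():
            # A blank line closes the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            current[key] = value
    if current:
        entries.append(current)
    return entries


sample = """\
Path = iso9660.iso
Type = Iso

----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69424
Modified = 2017-06-30 13:23:38

Path = readme.txt
Folder = -
Size = 37
Packed Size = 37
Modified = 2017-06-30 13:25:20
"""
entries = parse_7z_slt(sample)
for entry in entries:
    print(entry["Path"], entry["Size"])
```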

13/12/2017

Apache Tika vs DROID

https://twitter.com/anjacks0n/status/941020183812100096
Esp.:

Without Tika, relying on on DROID, there would have been 25,887,108 unidentified resources - mostly plain text, JS, CSS etc. Without DROID, only 464 would go unidentified, but we'd have no format-version-level information. Combining tools is crucial for web archives.

Find which file(s) are located in damaged area of ISO image

Using iso-info:
iso-info -l -i dvd-erik.iso

Result:
  d [LSN     22]      4096 Jan 01 1970 01:00:00  .
  d [LSN     22]      2048 Jan 01 1970 01:00:00  ..
  - [LSN     26] 158549392 Jul 30 2008 09:33:59  086_10B21_078v_079r.TIF
  - [LSN  77443] 158633884 Jul 30 2008 09:34:08  087_10B21_079v_080r.TIF
  - [LSN 154901] 157658880 Jul 30 2008 09:34:19  088_10B21_080v_081r.TIF
  - [LSN 231883] 157877788 Jul 30 2008 09:34:29  089_10B21_081v_082r.TIF
    ::
    ::
  - [LSN 2092850] 158203324 Jul 30 2008 09:38:31  113_10B21_105v_106r.TIF
  - [LSN 2170098] 156139844 Jul 30 2008 09:38:41  114_10B21_106v_107r.TIF

Here LSN * 2048 = offset of start of file.
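
With that, finding the affected files for a damaged byte range is simple arithmetic. A sketch in Python using the first four entries of the listing above; the function name and the damaged range in the example are hypothetical (in practice the range would come from e.g. a ddrescue map file):

```python
SECTOR_SIZE = 2048

# (LSN, size in bytes, name), copied from the iso-info listing above
FILES = [
    (26, 158549392, "086_10B21_078v_079r.TIF"),
    (77443, 158633884, "087_10B21_079v_080r.TIF"),
    (154901, 157658880, "088_10B21_080v_081r.TIF"),
    (231883, 157877788, "089_10B21_081v_082r.TIF"),
]


def files_in_range(files, start, end):
    """Return names of files that overlap byte range [start, end) of the image."""
    hits = []
    for lsn, size, name in files:
        file_start = lsn * SECTOR_SIZE
        file_end = file_start + size
        if file_start < end and start < file_end:
            hits.append(name)
    return hits


# Hypothetical damaged area: one sector at the start of the second file
damaged_start = 77443 * SECTOR_SIZE
damaged = files_in_range(FILES, damaged_start, damaged_start + SECTOR_SIZE)
print(damaged)
```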
11/12/2017

DDrescue --try-again switch

From the manual:

--try-again
Mark all non-trimmed and non-scraped blocks inside the rescue domain as non-tried before beginning the rescue. Try this if the drive stops responding and ddrescue immediately starts scraping failed blocks when restarted. If '--retrim' is also specified, mark all failed blocks inside the rescue domain as non-tried.

Useful if ddrescue remains stuck endlessly in "scraping failed blocks".
06/12/2017

Run .msi installer as admin

msiexec /a putty-64bit-0.70-installer.msi

23/11/2017

Useful VeraPDF command-lines

Disable PDF/A validation, only extract features:
verapdf --off --extract whatever.pdf > whatever.xml

Recursively process directory tree:
verapdf --recurse --off --extract myDir > whatever.xml

21/11/2017

Zenodo categories KBNL community


https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20preservation%22
https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20scholarship%22
https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:enrichment
https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:IPR
https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22public%20libraries%22

17/11/2017

Archivematica 1.6 Default Format Policy Registry

https://docs.google.com/spreadsheets/d/1g2vbAFBHWhsPRkNljbQBsKasMI-GCFTsQLol0cFT6js/edit#gid=0
Understanding Computer Technology

https://web.archive.org/web/20020201195007/http://www.geocities.com:80/SiliconValley/4031/

16/11/2017

Obtaining a list of all hyperlinks in an MS-Word document

https://superuser.com/questions/670324/obtaining-a-list-of-all-hyperlinks
05/11/2017

Environmental impact of academic conferences

https://www.researchgate.net/publication/318970823_Academic_conferences_urgently_need_environmental_policies
(Note: lots of DOIs in references don't resolve at all, or resolve to wrong location!)
http://onlinelibrary.wiley.com/doi/10.1111/1746-692X.12106/full
http://www.nature.com/news/a-clean-green-science-machine-1.17125?WT.mc_id=TWT_NatureNews
http://tyndall.ac.uk/sites/default/files/twp161.pdf
https://www.chemistryworld.com/opinion/cutting-the-science-travel-footprint/9567.article
01/11/2017

Use of objectCharacteristicsExtension element in PREMIS

Archivematica examples in:
https://www.loc.gov/standards/premis/examples.html
26/10/2017

Customise Pylint error reporting for a project

https://stackoverflow.com/questions/43280486/pylint-error-message-e1101-module-lxml-etree-has-no-strip-tags-member
25/10/2017

File identification: Tika vs DROID

Paper by Andy Jackson (2012):
http://arxiv.org/pdf/1210.1714.pdf
20/10/2017

Extract URLs from PDF

https://twitter.com/andrewjbtw/status/920791293122396160
11/10/2017

Convert compressed TIFF to uncompressed TIFF

03/10/2017

For one file:
convert whatever_compressed.tif +compress whatever_uncompressed.tif

Multiple files:
#!/bin/bash


# Input and output directories
dirIn=~/tiffsDDD
dirOut=~/tiffsDDUncompressed

while IFS= read -d $'\0' -r file ; do
    # File basename 
    bName=$(basename -s .TIF "$file")
    
    # Output name
    outName=$bName.TIF
    
    # Full output paths
    fOut="$dirOut/$outName"
 
    # Convert to uncompressed TIFF
    convert "$file" +compress "$fOut"

done < <(find "$dirIn" -type f -name "*.TIF" -print0)

Linux Mint 18.2 issues


"Failed to start the X server" message in login screen
Solution:
https://linuxnorth.wordpress.com/2017/07/04/installing-and-uninstalling-lightdm-in-linux-mint-18-2/


Top/title bar of windows missing, cannot move windows.
Solution: go to Preferences/Desktop Settings/Windows and select a Window Manager from the dropdown menu (for some reason no WM is selected by default).


Window resize margin in default Metacity window manager is only 1 px wide
Solution: https://askubuntu.com/questions/4109/how-do-i-increase-the-resize-margin-on-windows


28/09/2017

warcio


This library provides a fast, standalone way to read and write WARC Format commonly used in web archives.

https://github.com/webrecorder/warcio
25/09/2017

JWAT TOOLS

Includes ARC/WARC validation:
https://sbforge.org/display/JWAT/Running+JWAT-Tools
23/09/2017

Format Technology Lifecycle Analysis

https://tspace.library.utoronto.ca/bitstream/1807/75891/1/JASIST-format-technology-lifecycle-analysis.pdf
12/09/2017

Mimetypes of MS Office formats

https://technet.microsoft.com/en-us/library/ee309278(office.12).aspx
08/09/2017

Tika mimetype definitions

https://github.com/apache/tika/tree/master/tika-core/src/main/resources/org/apache/tika/mime
06/09/2017

Kaitai Struct


Kaitai Struct is a declarative language used for describe various binary data structures, laid out in files or in memory (...).
The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include  a generated code for a parser that can read described data structure from a file / stream and give access to it in a nice, easy-to-comprehend API.

http://kaitai.io/
29/08/2017

Suppress 'invalid-name' messages in Pylint output

Use -d option with invalid-name:
python3 -m pylint -d invalid-name boxvalidator.py > pylintjpylyzer.txt

24/08/2017

Zenodo: list all publications with "Digital Preservation" keyword in kbnl community

https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords%253A%2522digital%2Bpreservation%2522
16/08/2017

JPEG 2000 drafts and freely available standards

https://github.com/Dzonatas/solution/tree/master/Documentation
15/08/2017

Remember Git login username/password

The following command keeps login credentials in the cache for 1 hour:
git config --global credential.helper "cache --timeout=3600"

14/08/2017

Add path to LD_LIBRARY_PATH

For some reason I always forget this (below for OpenJPEG):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Very large JP2 images


http://hirise-pds.lpl.arizona.edu/download/PDS/RDR/ESP/ORB_011200_011299/ESP_011265_1560/ESP_011265_1560_RED.JP2 (2 GB)
http://apollo.sese.asu.edu/data/pancam/AS16/jp2/AS16-P-4102.jp2 (6.7 GB)

10/08/2017

How GIT commit to an existing tag

https://gist.github.com/danielestevez/2044589
15/06/2017

How to use HTML and CSS for printing

http://css4.pub/
Prince tool:
http://www.princexml.com/
WeasyPrint (open-source alternative):
http://weasyprint.org/
29/05/2017

E-READ


The goal of this Action is to improve scientific understanding of the implications of digitization, hence helping individuals, disciplines, societies and sectors across Europe to cope optimally with the effects.

http://ereadcost.eu/
12/05/2017

Huge List Of Example Files – Creative Commons

http://blog.online-convert.com/huge-list-of-example-files-creative-commons/
10/05/2017

Copy directory tree with Robocopy

robocopy sourceDir destDir /COPYALL /E /R:0 /DCOPY:T

E.g.:
robocopy H:\iromlabTestKBDepotNew M:\DigitalPreservation\optischeDragers\iromlabTestKBDepot /COPYALL /E /R:0 /DCOPY:T >robocopy.stdout 2>robocopy.stderr

19/04/2017

Reading ISO image of data session of multisession (e.g. enhanced audio) CDs

Some useful links:
Good description of the problem:
https://lists.debian.org/debian-user/2005/01/msg02339.html

the sector numbers in the file system refer to sectors of
the original CD rather than sectors of session2.iso. I don't know of a
utility for rewriting them so that the file can be loop-mounted or
written to an ordinary CD, but you can at least get a directory
listing by using isoinfo with an offset:
isoinfo -i session2.iso -N 204345 -l

https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00048.html
Esp.:

Remember, the path table and directory structure of the iso reflect
the fact that the ISO filesystem starts on sector 222145 (49:23:70)
of the CD.  If it is burned to another CD at a different position,
it won't work.  Likewise, any program that reads the iso will need
to be able to compensate for the offset.  Try, for example:
isoinfo -N 222145 -d -i '8mm-songs_to_love_and_die_by.iso'

Also (from same thread):
https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00053.html
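
The second quote gives the start of the data session both as a sector number (222145) and as a minute:second:frame address (49:23:70). Assuming the usual CD convention of 75 frames per second and a 150-frame (2-second) offset, the two are related as follows (Python sketch, names are my own):

```python
FRAMES_PER_SECOND = 75
PREGAP_FRAMES = 150  # 2-second offset between MSF addresses and sector numbers


def msf_to_sector(minutes, seconds, frames):
    """Convert a minute:second:frame address to a logical sector number."""
    total_frames = (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames
    return total_frames - PREGAP_FRAMES


sector = msf_to_sector(49, 23, 70)
print(sector)
```

The result is the value to pass to isoinfo with -N.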
06/04/2017

Ensure correct encoding when writing a text file in Python

Default encoding for read/write depends on locale settings, which can result in unexpected behaviour. See e.g.:
http://stackoverflow.com/questions/43256079/decoding-of-bytes-object-results-in-unexpected-invalid-utf-8-how-can-i-avoid
Solution: always set the encoding explicitly when opening a file for read/write in text mode. Example:
# Byte sequence corresponds to multiplication sign in UTF-8
myBytes = b'\xc3\x97'
# Decode to string 
myString = myBytes.decode('utf-8')

# Write myString to file
with open("myString.txt", "w", encoding="utf-8") as ms_file:
    ms_file.write(myString)

03/04/2017

Create symbolic link on Windows

In this case, create link to f:\Pandoc\pandoc.exe in directory c:\bin:
mklink pandoc.exe F:\Pandoc\pandoc.exe

30/03/2017

How to Create a List of Your Installed Programs on Windows

https://www.howtogeek.com/165293/how-to-get-a-list-of-software-installed-on-your-pc-with-a-single-command/
Powershell method:
Get-ItemProperty HKLM:\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher, InstallDate | Format-Table -AutoSize > installedPrograms.txt

28/03/2017

Guidelines for Using PREMIS with METS for exchange

https://www.loc.gov/standards/premis/guidelines2017-premismets.pdf
24/03/2017

Extract text from Epub

Apache Tika

java -jar tika-app-1.14.jar -t whatever.epub > whatever.txt

BUT doesn't return chapters in reading order!!
Textract (Python)

https://github.com/deanmalmgren/textract
Installs with errors under Windows; seems to work OK on Linux.
23/03/2017

Build process for Windows binaries of file/libmagic under Linux

https://github.com/nscaife/file-windows
28/02/2017

Change bit depth of WAV file

Saves output file with 24 bits per sample:
ffmpeg -i frogs-01.wav -codec pcm_s24le frogs-01-24-bit.wav

For list of all codec values:
ffmpeg -codecs

07/02/2017

Python relative imports for the billionth time

http://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time
27/01/2017

FFmpeg - Extract Blu-Ray Audio

https://wiki.gentoo.org/wiki/FFmpeg_-_Extract_Blu-Ray_Audio
19/01/2017

Accessing raw devices under Windows, command line

From:
https://support.microsoft.com/nl-nl/help/100027/info-direct-drive-access-under-win32

To open a physical hard drive for direct disk access (raw I/O) in a Win32-based application, use a device name of the form
\\.\PhysicalDriveN
where N is 0, 1, 2, and so forth, representing each of the physical drives in the system.
To open a logical drive, direct access is of the form
\\.\X:
where X: is a hard-drive partition letter, floppy disk drive, or CD-ROM drive.

E.g. compute checksum on CD in d: drive:
 md5sum \\.\D:
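
A rough Python equivalent, reading the device in chunks. This is a sketch only: I have not tested it against a physical drive, and the function name is made up. The chunk size is kept at a multiple of the sector size, which raw device reads on Windows may require.

```python
import hashlib


def md5_of_device(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file or raw device, e.g. r'\\\\.\\D:'."""
    digest = hashlib.md5()
    with open(path, "rb") as device:
        while True:
            # 1 MiB chunks: a multiple of both 512 and 2048 byte sectors
            chunk = device.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would be `md5_of_device(r"\\.\D:")` on Windows, or `md5_of_device("/dev/sr0")` on Linux.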

Accessing raw devices in Python (under Windows)

Access to logical drives:
http://stackoverflow.com/q/6522644/1209004
Write access:
http://stackoverflow.com/q/7135398/1209004
Reading raw disks with Python:
http://blog.lifeeth.in/2011/03/reading-raw-disks-with-python.html
Isoparser

https://github.com/barneygale/isoparser
15/01/2017

How to host your static site with HTTPS on GitHub Pages and CloudFlare

https://developer.ubuntu.com/en/blog/2016/02/17/how-host-your-static-site-https-github-pages-and-cloudflare/
BUT this will make accessing the site CAPTCHA hell for Tor users: https://support.cloudflare.com/hc/en-us/articles/203306930-Does-CloudFlare-block-Tor-
Alternatives:

Certbot / Let's Encrypt: requires server access
GitHub Pages has built-in HTTPS support, but only for github.io domains.

11/01/2017

How to Host your Python Package on PyPI with GitHub

https://www.codementor.io/arpitbhayani/host-your-python-package-using-github-on-pypi-du107t7ku
Once everything is set up, for each new release the basic steps are:

Update version number in main code
Update link to download_url (in my case this is automated)
Commit changes & push
Add tag: git tag -a x.y.z -m "whatever"
git push --tags
python setup.py register -r pypi
python setup.py sdist upload -r pypi

09/01/2017

CD/DVD Carrier checksums vs ISO image checksums


The md5sum of a "burnt" CD can be different than the md5sum of the associated iso file and not indicate an error

http://twiki.org/cgi-bin/view/Wikilearn/CdromMd5sumsAfterBurning
See also:
http://superuser.com/questions/220082/how-to-validate-a-dvd-against-an-iso
06/01/2017

Books and Literature Status Review 2016

https://warekennis.nl/wp-content/uploads/2013/03/BOOKS-AND-LITERATURE-STATUS-REVIEW-2017-.pdf
02/01/2017

Use ffmpeg / ffprobe to get tech properties from audio file

ffprobe track01.cdda.wav -show_format -show_streams > properties.txt

Result (file properties.txt):
[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=2
channel_layout=unknown
bits_per_sample=16
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=8233176
duration=186.693333
bit_rate=1411200
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=track01.cdda.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=186.693333
size=32932748
bit_rate=1411201
probe_score=99
[/FORMAT]

XML output:
ffprobe track01.cdda.wav -show_format -show_streams -print_format xml > properties.xml
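The default (non-XML) output shown above is easy to read into Python as well. The sketch below parses the [STREAM] / [FORMAT] sections into dictionaries; the function name parse_ffprobe_sections is hypothetical, and it assumes the plain key=value layout of properties.txt.

```python
def parse_ffprobe_sections(text):
    """Parse ffprobe's default [SECTION] key=value output.

    Returns a list of (section_name, properties) tuples, in file order.
    """
    sections = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[/") and line.endswith("]"):
            current = None                      # closing tag, e.g. [/STREAM]
        elif line.startswith("[") and line.endswith("]"):
            current = {}                        # opening tag, e.g. [STREAM]
            sections.append((line[1:-1], current))
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value                # values are kept as strings
    return sections


# Hypothetical usage:
#   with open("properties.txt") as f:
#       for name, props in parse_ffprobe_sections(f.read()):
#           print(name, props.get("sample_rate"), props.get("bits_per_sample"))
```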

01/01/2017

Update the Fritz!Box Mediaserver file index from a script

https://blog.heckel.xyz/2012/12/07/script-refresh-the-fritzmediaserver-dlna-index-of-the-fritzbox-6360-cable/
Script:
https://blog.heckel.xyz/wp-content/uploads/2012/12/fritzbox-dlna-refresh
19/12/2016

AMIA open workflows and resources for A/V archiving

https://github.com/amiaopensource/open-workflows
16/12/2016

NYPL Specifications for Audio and Moving Image Digitization

https://confluence.nypl.org/display/DIG/Specifications+for+Audio+and+Moving+Image+Digitization
07/12/2016

Mediags


Mediags is a console program that scans directories for media files and verifies the integrity of those files. Detailed content reports may optionally be produced.

https://mediags.codeplex.com/
(Binaries windows only)
01/12/2016

Browsers, not apps, are the future of mobile:

https://medium.com/swlh/browsers-not-apps-are-the-future-of-mobile-c552752ff75#.ilc1zlj1a
27/11/2016

Appear.in


Video conversations with up to 8 people for free.
No login required — no installs

https://appear.in/
31/10/2016

A guide to Wikidata, SPARQL, and WDQS

https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/A_Guide_To_WDQS
28/10/2016

PDFx

Extract references and metadata from PDF documents, and download all referenced PDFs:
https://www.metachris.com/pdfx/
24/10/2016

Explanation of need for Multi Threading GUI programming

http://stackoverflow.com/questions/13343096/explanation-of-need-for-multi-threading-gui-programming
22/10/2016

Digital Open Access Identifier

http://doai.io/
19/10/2016

Wikidata:WikiProject Informatics/File formats

https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats
14/10/2016

Python debugging tips

http://stackoverflow.com/questions/1623039/python-debugging-tips
30/09/2016

A Slow-Motion Revolution (history of the CD-ROM)

http://www.filfre.net/2016/09/a-slow-motion-revolution/
29/09/2016

An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

http://journal.code4lib.org/articles/11358
25/09/2016

Check the Accessibility of a PDF Document (online)

http://checkers.eiii.eu/en/pdfcheck/
23/09/2016

Python event scheduler and queue modules

https://docs.python.org/3.6/library/queue.html
https://docs.python.org/3.6/library/sched.html
And perhaps:
https://docs.python.org/3.6/library/threading.html#module-threading
Possibly usable in CD imaging workflow (esp. interaction with operator input).
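A rough sketch of how queue and threading might fit together in such a workflow: a worker thread takes disc identifiers from a queue while the main thread keeps accepting operator input. All names (imaging_worker, run_batch) and the "imaged ..." placeholder are hypothetical; no actual imaging is done.

```python
import queue
import threading


def imaging_worker(jobs, results):
    """Process disc identifiers from the jobs queue until a None sentinel arrives."""
    while True:
        disc_id = jobs.get()
        if disc_id is None:
            break
        # Placeholder for the real imaging step (hypothetical).
        results.append("imaged " + disc_id)


def run_batch(disc_ids):
    """Feed disc identifiers to a single worker thread and collect the results."""
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(target=imaging_worker, args=(jobs, results))
    worker.start()
    for disc_id in disc_ids:    # in a real workflow this would be operator input
        jobs.put(disc_id)
    jobs.put(None)              # sentinel: no more discs
    worker.join()
    return results
```

With a single worker the queue preserves the order in which the operator entered the discs.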
13/09/2016

media-autobuild_suite


This Windows Batchscript setups a MinGW/GCC compiler environment for building ffmpeg and other media tools under Windows. After building the environment it retrieves and compiles all tools. All tools get static compiled, no external .dlls needed (with some optional exceptions)

https://github.com/jb-alvarado/media-autobuild_suite
By default this doesn't build the ffmpeg optional libraries (incl. libcdio). In order to build them, when the batch file prompts you with Choose ffmpeg and mpv optional libraries?, select option 4 (All available external libs). Alternatively (if you accidentally ran the build with the default option), open file media-autobuild_suite.ini and set the value of ffmpegChoice to 4:
ffmpegChoice=4

Libcdio windows binaries

http://lrn.no-ip.info/packages/i686-w64-mingw/libcdio/0.93-1/
12/09/2016

Cdrdao Windows binaries

http://www.student.tugraz.at/thomas.plank/
08/09/2016

Discid tool

http://discid.sourceforge.net/
Tried flactag fork, which gives following output for CD-ROM:
Query failed: no actual audio tracks on disc: CDROM or DVD?

So might be useful for distinguishing between audio CDs and CD-ROMs (tarball contains Windows binary).
disktype tool

http://disktype.sourceforge.net/
Output audio CD:
Block device, size 690.4 MiB (723972096 bytes)
CD-ROM, 14 tracks, CDDB disk ID D912690E
Track 1: Audio track, 37.35 MiB (39163152 bytes),   3 min 42 sec
Track 2: Audio track, 87.89 MiB (92163120 bytes),   8 min 42 sec 
::
Track 13: Audio track, 37.22 MiB (39029088 bytes),   3 min 41 sec
Track 14: Audio track, 78.14 MiB (81931920 bytes),   7 min 44 sec

CD-ROM:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 0205F301
Track 1: Data track, 223.2 MiB (233994240 bytes)
  ISO9660 file system
    Volume name "0305132335"
    Preparer    "CEQUADRAT 32BIT ISO-9660 FORMATTER COPYRIGHT (C) 1995-1998 BY CEQUDRAT GMBH"
    Data size 222.9 MiB (233682944 bytes, 114103 blocks of 2 KiB)
    Joliet extension, volume name "0305132335"

Enhanced audio CD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 22 tracks, CDDB disk ID 4B113416
Track 1: Audio track, 9.627 MiB (10094784 bytes),   0 min 57 sec
Track 2: Audio track, 30.01 MiB (31462704 bytes),   2 min 58 sec
::
Track 20: Audio track, 41.33 MiB (43340304 bytes),   4 min 05 sec
Track 21: Audio track, 47.73 MiB (50048208 bytes),   4 min 43 sec
Track 22: Data track, 90.84 MiB (95252480 bytes)

DVD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 023BFD01
Track 1: Data track, 2.197 GiB (2358986752 bytes)
  Apple partition map, 2 entries
  Partition 1: 31.50 KiB (32256 bytes, 63 sectors from 1)
    Type "Apple_partition_map"
  Partition 2: 2.737 GiB (2938324992 bytes, 5738916 sectors from 1108)
    Type "Apple_HFS"
    HFS Plus file system
      Volume size 2.737 GiB (2938324992 bytes, 1434729 blocks of 2 KiB)
      Volume name "BelPop Marc Moulin"
  UDF file system
    Sector size 2048 bytes
    Volume name "BelPop Marc Moulin"
    UDF version 1.50
  ISO9660 file system
    Volume name "BELPOPMARCMOULIN"
    Data size 2.737 GiB (2938894336 bytes, 1435007 blocks of 2 KiB)
    Joliet extension, volume name "BelPop Marc Moul"

(Note that the DVD is identified as a CD-ROM; this doesn't really matter, as extraction from a DVD is identical to extraction from a data CD-ROM.)
Compiles without problems under Windows (using Cygwin), but doesn't seem to be able to access cd-devices. E.g.:
disktype /dev/sr0

Result:
--- /dev/sr0
Block device, size 332.6 MiB (348790784 bytes)
disktype: Data read failed at position 0: Invalid request code

Or:
disktype D:\

Result:
--- D:\
disktype: D:\: Is a directory

Or:
disktype D

Result:
--- D
disktype: Can't stat D: No such file or directory

Perhaps try cdrdao scanbus?
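On a platform where disktype can read the device, its track listing could be used to tell the carrier types apart. A minimal sketch, assuming the "Track N: Audio track" / "Track N: Data track" lines look like the outputs shown earlier in this entry; the function name classify_carrier and the returned labels are hypothetical.

```python
import re

# Matches lines such as "Track 1: Audio track, ..." or "Track 22: Data track, ..."
TRACK_PATTERN = re.compile(r"^Track\s+\d+:\s+(Audio|Data) track", re.MULTILINE)


def classify_carrier(disktype_output):
    """Classify a carrier from disktype's track listing."""
    kinds = TRACK_PATTERN.findall(disktype_output)
    has_audio = "Audio" in kinds
    has_data = "Data" in kinds
    if has_audio and has_data:
        return "audio + data"   # e.g. enhanced audio CD or mixed mode CD
    if has_audio:
        return "audio"
    if has_data:
        return "data"           # CD-ROM or DVD
    return "unknown"
```

Note that this cannot distinguish an enhanced (multisession) audio CD from a mixed mode CD, since the track listing alone says nothing about sessions.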
07/09/2016

WMI queries from the command line (Windows)

http://www.robvanderwoude.com/wmic.php
Example - get information about optical drives:
wmic cdrom  where mediatype!='unknown' get > test.txt

06/09/2016

Libcdio & pycdio


The GNU Compact Disc Input and Control library (libcdio) contains a library for CD-ROM and CD image access. Applications wishing to be oblivious of the OS- and device-dependent properties of a CD-ROM or of the specific details of various CD-image formats may benefit from using this library.

http://www.gnu.org/software/libcdio/
Python interface:
https://pypi.python.org/pypi/pycdio/
01/09/2016

Python data entry form example

http://codereview.stackexchange.com/questions/52397/a-general-purpose-gui-data-input-with-validation-but-unclear-about-best-object
31/08/2016

Imaging and image format for mixed mode CDs

Brown, "Developing Virtual CD-ROM Collections" (2012):
http://www.ijdc.net/index.php/ijdc/article/view/216/285
Page 13:


Create BIN/TOC file with cdrdao using:
cdrdao read-cd --read-raw --device 1,0,0 --datafile allmy.bin allmy.toc


Author developed SheepShaver extension that allows these images to be read by emulator


Caveats:

The given cdrdao command only extracts one session (I guess the Voyager CD-ROMs only contain one session with both the data and audio tracks, although the paper isn't entirely clear about this).
In case of a CD with multiple sessions one would have to repeat the command for each of those (result: one separate image for each session)
Hybrid CD-ROMs are not supported by any of the most widely-used emulators (also stressed by author)

Jackson (BL):
http://anjackson.net/keeping-codes/practice/developing-a-robust-migration-workflow-for-preserving-and-curating-handheld-media.html
On multisession carriers:

While CD-ROM, DVD and HFS+ format disks are reasonably well covered by this approach, there are some important limitations. For example, the optical media formats all support the notion of ‘sessions’ – consecutive additions of tracks to a disk. This means that a given carrier may contain a ‘history’ of different versions of the data. By choosing to extract a single disk image, we only expose the final version of the data track, and any earlier versions, sessions or tracks are ignored. For our purposes, these sessions are not significant, but this may not be true elsewhere.

BUT sessions (at least on commercially manufactured carriers) typically don't contain different versions of the same data, but data that are completely different! Example: many 'enhanced' audio CDs that contain one session with all audio tracks, and another session with a data track. So sessions are significant!
BL workflow for Red Book (audio) and Yellow Book (mixed mode) carriers:

Image to MDS/MDF format
Then post-process MDS/MDF file with IsoBuster

But it's not entirely clear whether MDS/MDF can handle multisession carriers.
I found this in the Knowledge Base of the developer of the format:
http://support.alcohol-soft.com/en/knowledgebase.php?postid=15034&title=Restrictions+for+creating+image+files

Image making wizard will always allow the user to create mds/mdf ccd/img/sub.


But ISO format, only for those disc's that contain 1 data track(mode1 or mode2form1).


For cue/bin only for one session disc. if the original disc is a multi-session one, then the cue/bin would not be available and If the user chooses read sub-channel, the cue/bin and iso would be unavailable as well . because iso and cue/bin could not save sub channel data.

So apparently MDS/MDF does support multisession after all!
Good overview of disc image formats here:
http://www.theisozone.com/blogs/homebrew/burning-image-file-type-explained/
23/08/2016

Sheepshaver (Macintosh emulator)

Includes links to ROM and startup images:
http://www.redundantrobot.com/#/sheepshaver
Preserving and Emulating Digital Art Objects

Report by Cornell University:
https://ecommons.cornell.edu/handle/1813/41368
CD-ROM FAQ

Some useful info on Mac / PC images and hybrids:
http://www.macdisk.com/faqcden.php
22/08/2016

CDRWIN manual

Contains lots of info on optical carrier and disc image formats (e.g. BIN/CUE):
http://web.archive.org/web/20070221154246/http://www.goldenhawk.com/download/cdrwin.pdf
18/08/2016

Python requests fetch a file from a local url

http://stackoverflow.com/questions/10123929/python-requests-fetch-a-file-from-a-local-url
17/08/2016

Computer Display Calibration 101

https://blog.codinghorror.com/computer-display-calibration-101/
Bias Lighting

https://blog.codinghorror.com/bias-lighting/
16/08/2016

Recursively find/count files with specific extension

Find all files with .pdf extension:
find . -type f -name '*.pdf'

Count all files with .pdf extension:
find . -type f -name '*.pdf'| wc -l
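For use inside a script, the same search can be sketched in Python with pathlib. The function name find_files is hypothetical. Like find -name, matching is case-sensitive on Linux.

```python
from pathlib import Path


def find_files(root, extension):
    """Return a sorted list of all regular files below root with the given extension."""
    return sorted(p for p in Path(root).rglob("*" + extension) if p.is_file())


# Hypothetical usage, equivalent to the two find commands above:
#   pdfs = find_files(".", ".pdf")
#   print(len(pdfs))
```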

PyRomInfo

Esp. 'useful links' section:
https://github.com/garbear/pyrominfo
21/07/2016

One pixel is worth three thousand words

Representation of 1 pixel in many different formats:
http://cloudinary.com/blog/one_pixel_is_worth_three_thousand_words
20/07/2016

The Programming Historian

Online tutorials on APIs, Data Management, Data Manipulation, Distant Reading, Linked Open Data, Mapping and GIS, Network Analysis, Omeka Exhibit Building, Web Scraping and Programming with Python:
http://programminghistorian.org/lessons/
14/07/2016

Writerperfect library

Supports lots of (old) Office-related formats + includes many conversion tools:
https://launchpad.net/ubuntu/+source/writerperfect/0.9.5-1
06/07/2016

Horrifying PDF experiments

https://github.com/osnr/horrifying-pdf-experiments
05/07/2016

Python classes simple examples

https://en.wikibooks.org/wiki/A_Beginner%27s_Python_Tutorial/Classes
26/06/2016

How To Install Linux Mint to SSD and HHD /home

https://forums.linuxmint.com/viewtopic.php?t=177915
23/06/2016

Python metadata libraries


METS
MODS
PREMIS

(Source: Nick Krabbenhöft on Twitter)
22/06/2016

Library of Congress Audio Compact Disc METS Profile

http://www.loc.gov/standards/mets/profiles/00000007.html
Creating Virtual CD-ROM Collections

http://dx.doi.org/10.2218/ijdc.v4i2.107
From Imaging to Access - Effective Preservation of Legacy Re-movable Media

http://www.digpres.com/publications/woodsbrownarch09.pdf
Example METS file (note that apparently they combine multiple ISOs in one AIP):
http://webapp1.dlib.indiana.edu/virtual_disk_library/index.cgi/4252478/mets
BL METS profile - Sound Recordings 2

http://www.bl.uk/profiles/sound/METS_profile.pdf
19/06/2016

Linux File System Hierarchy

https://www.blackmoreops.com/2015/06/18/linux-file-system-hierarchy-v2-0/
Digital Dark Age Klaxon

https://youtu.be/a_6CZ2JaEuc
17/06/2016

SIP creator tools


Delving SIP-Creator
Fedora SIP Creator
UGent Sip Creator
SIP-Builder
RODA-In
Dvcapture
DURAARK SIP generator

14/06/2016

Characterisation of CD-ROMs


Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas Liebetraut, Oleg Stobbe, Isgandar Valizada and Tobias Steinke (presentation)


Characterization of CD-ROMs for Emulation-based Access (paper)


31/05/2016

Validate XML against user-defined XSD schema

xmllint --noout -schema schema.xsd whatever.xml

27/05/2016

Recursively compute md5 checksums for all files in directory tree

find -type f -exec md5sum "{}" + > checksums.md5

Source: http://askubuntu.com/a/318534. Works also under Cygwin.
Issue: output also includes the MD5 sum of the output file itself (which becomes invalid once anything is written to the file).
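One way around this is to skip the output file explicitly. A minimal Python sketch, assuming the checksum file is written into the root of the tree; the function names and the default file name checksums.md5 are illustrative only. Lines follow the md5sum layout (digest, two spaces, path).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root, output_name="checksums.md5"):
    """Write MD5 checksums of all files below root, excluding the output file."""
    output_path = os.path.abspath(os.path.join(root, output_name))
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                          # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.abspath(path) == output_path:
                continue                         # never checksum our own output
            relative = os.path.relpath(path, root)
            lines.append("{}  {}".format(md5_of_file(path), relative))
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```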
23/05/2016

Generate new access JP2 from master


Convert master JP2 to TIFF using Kakadu (this preserves any embedded ICC profiles):

kdu_expand -i master.jp2 -o master.tiff


Convert TIFF to lossy JP2 with Aware via jpwrappa:

jpwrappa -m -p C:\jpwrappa\profiles\optionsKBAccessLossy_2014.xml master.tiff access.jp2


(The -m switch can be omitted, in which case there is no need for Exiftool.)
19/05/2016

Disc robots


Acronova Nimbie USB Plus range
Nimbie NB21-DVD
Nimbie USB range (NB11 - not available (19/5))

18/05/2016

Digital newspapers


Guidelines for Digital Newspaper Preservation
Chronicles in Preservation: Preserving Digital News and Newspapers
Digital Preservation of Newspapers: Findings of the Chronicles in Preservation Project
E-paper Production Workflow – Adapting Production Workflow Processes for Digital Newsprint
PRESERVING NEWS IN THE DIGITAL ENVIRONMENT: MAPPING THE NEWSPAPER INDUSTRY IN TRANSITION

10/05/2016

Use docx document as template in Pandoc

Use the --reference-docx switch:
pandoc -S --reference-docx=template.docx test.md -o test.docx 

26/04/2016

Rollback git repo to previous state + push changes to remote

Rollback to previous state:
git reset --hard <tag/branch/commit id>

Push changes to remote (forced):
git push ... -f

Example:
git reset --hard 2dbe067c1674dcf9a23104c4b64b772e1550ba29
git push origin master -f

Mimetype Comparison DROID, Tika, File, April 2016

http://162.242.228.174/mimes/mime_comparisons.html
Common Crawl


An open repository of web crawl data that can be accessed and analyzed by anyone

https://commoncrawl.org/
25/04/2016

Tika-python


A Python port of the Apache Tika library that makes Tika available using the Tika REST Server.

https://github.com/chrismattmann/tika-python
Manipulating PDFs with Python

https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
22/04/2016

Introduction to the Bash Command Line (The Programming Historian)

http://programminghistorian.org/lessons/intro-to-bash
20/04/2016

Python wrapper for EpubCheck

https://github.com/titusz/epubcheck
Tegelspreukmaker

http://www.tegelspreukmaker.nl/
14/04/2016

Sozi presentation software

Looks a bit similar to Prezi, but open source (presentation as SVG):
http://sozi.baierouge.fr/
06/04/2016

Android screen rotate in VirtualBox

Press F9, F10, F11 or F12 twice. "Auto-rotate screen" option in Android Settings must be enabled.
04/04/2016

HTML codeblock hell in Wordpress

Following codeblock is not rendered correctly in Wordpress:
<pre><code>&lt;div&gt;test&lt;/div&gt;</code></pre>

Workaround is to replace forward slash in closing tag by entity reference:
<pre><code>&lt;div&gt;test&lt;&#47;div&gt;</code></pre>
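When many snippets need this treatment, the workaround can be scripted. A small sketch using Python's standard html module; the function name wordpress_safe_code is hypothetical. For simplicity it replaces every forward slash in the snippet, not only those in closing tags; the entity is displayed as a normal slash either way.

```python
import html


def wordpress_safe_code(snippet):
    """Escape an HTML snippet and wrap it in <pre><code> for pasting into WordPress."""
    escaped = html.escape(snippet, quote=False)   # < > & become entities
    escaped = escaped.replace("/", "&#47;")       # workaround: hide the slashes
    return "<pre><code>" + escaped + "</code></pre>"


# Hypothetical usage:
#   print(wordpress_safe_code("<div>test</div>"))
```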

29/03/2016

Caradoc - a PDF parser and validator

https://github.com/ANSSI-FR/caradoc
Note: current Debian package of Opam not recent enough, so used the instructions under "Binary distribution" at https://opam.ocaml.org/doc/Install.html. Installs binary in /usr/local/bin.
Make file initially didn't work because ocamlfind could not be found. Fixed by typing:
eval $(opam config env)

After this it compiles without any errors.
24/03/2016

Seeing the Double Rainbow: The Trials and Tribulations Working with Optical Media

Includes MiniDisc:
http://ndsr.nycdigital.org/seeing-the-double-rainbow-the-trials-and-tribulations-working-with-optical-media/
15/03/2016

Ebooklib

Python library that reads/writes EPUB, including EPUB 3:
https://github.com/aerkalov/ebooklib
Example, create EPUB from HTML:
https://gist.github.com/bitsgalore/4c830a301f33f584c041
CB infographics e-books in Nederland

http://www.cb.nl/nieuws/alle-relevante-data-over-e-books-in-nederland/
http://www.cb.nl/nieuws/e-bookbarometeblijft-groeien/
14/03/2016

Encyclopedia of Graphics File Formats

http://fileformats.archiveteam.org/wiki/Encyclopedia_of_Graphics_File_Formats
HTML5 is the New Flash

http://homepages.cwi.nl/~steven/Talks/2015/11-06-xml-amsterdam/
05/03/2016

Excel to XML: How to Transfer Your Spreadsheet Data Onto an XML File

This works (but what's referred to as a "schema" isn't really a schema at all):
https://blog.udemy.com/excel-to-xml/
How To Export an Excel 2010 Worksheet to XML

Similar to above, but uses XSD Schema directly, might be better:
https://bitwizards.com/blog/november-2010/how-to-export-an-excel-2010-worksheet-to-xml
23/02/2016

Reference rot in scholarly articles

Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot
Playback WARC

Web archive player:
https://github.com/ikreymer/webarchiveplayer
20/02/2016

Search and replace string for all files in directory tree

E.g. replace every occurrence of /tmp/"$fileIn" with /tmp/"$(cat /dev/urandom | tr -cd 'a-f0-9' | head -c 16)":
find /home/johan/cajascripts -type f -print0 | xargs -0 sed -i 's/\/tmp\/"$fileIn"/\/tmp\/"$(cat \/dev\/urandom | tr -cd '\''a-f0-9'\'' | head -c 16)"/g'

18/02/2016

Save blog with archiveBot


Don't save offsite links
Use 'blogs' ignore pattern

Command (I think?):
!archive http://www.flipvandyke.nl/ --no-offsite-links --ignore-sets=blogs

28/01/2016

Recovering data from broken disk under Ubuntu

https://help.ubuntu.com/community/DataRecovery
14/01/2016

Links to freely available EPUB files with DRM


http://www.mobileread.mobi/forums/attachment.php?s=73fdccae770ee8b13fda5f5916e55eae&attachmentid=137483&d=1429591966

07/01/2016

Determine actual compression ratio of each quality layer in JP2

If N = number of layers, then first extract layers i and below to a separate JP2 with Aware j2kdriver tool:
j2kdriver -i foo.jp2 -ql (N-i+1) -t JP2 -o foo_i.jp2

Then use jpylyzer to compute the compression ratio of resulting image.
Example - input image with 11 quality layers

Create derived image for each quality layer:
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 11 -t JP2 -o layer1.jp2
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 10 -t JP2 -o layer2.jp2
::
::
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 1 -t JP2 -o layer11.jp2
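The elided commands follow directly from the -ql (N-i+1) rule above, so they can be generated rather than typed. A sketch that only builds the argument lists (it does not run j2kdriver); the function name layer_commands is hypothetical, and the options are simply those used in the example.

```python
def layer_commands(master, layer_count):
    """Build one j2kdriver argument list per quality layer, using -ql (N-i+1)."""
    commands = []
    for i in range(1, layer_count + 1):
        ql = layer_count - i + 1
        commands.append([
            "j2kdriver", "-i", master,
            "-ql", str(ql),
            "-t", "JP2",
            "-o", "layer{}.jp2".format(i),
        ])
    return commands


# Hypothetical usage (requires the Aware tools on the PATH):
#   import subprocess
#   for command in layer_commands("MMAD01_000001001_00011_master.jp2", 11):
#       subprocess.run(command, check=True)
```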

09/12/2015

Change last modified date of file

touch -d "1 January 1768" myfile.txt

30/11/2015

Stop laptop from re-booting after shutdown

This happened to my HP ProBook 640 G1. Workaround: in BIOS, disable "wake on LAN". Source: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1470723/comments/13
24/11/2015

Comparison of CD rippers

http://wiki.hydrogenaud.io/index.php?title=Comparison_of_CD_rippers
10/11/2015

Convert Word document to PDF from command line

http://superuser.com/questions/789968/windows-7-batch-command-line-to-save-as-pdf-file-for-word-2013-docx-file
06/11/2015

Beeld & Geluid Preservation Metadata Dictionary

http://publications.beeldengeluid.nl/pub/84
05/11/2015

Yale Library Digital Preservation System Requirements

http://web.library.yale.edu/sites/default/files/files/YULDPSHighLevelRequirementsUseCasesDiagrams.pdf
19/10/2015

Best Way To Merge A (GitHub) Pull Request

http://blog.differential.com/best-way-to-merge-a-github-pull-request/
Third option (Catch Feature Up with Master by Rebasing, then fast-forward Merge).
16/10/2015

Handboek informaticavaardigheden UvA

http://liv.science.uva.nl/index.html
Parts of it may be (re)usable for internal courses and the like.
10/10/2015

Add right-click context menu items in Ubuntu /Linux Mint

Ubuntu with Nautilus file manager - Nautilus Actions:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Cinnamon with Nemo file manager:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Mate with Caja file manager:
http://www.ethanjoachimeldridge.info/tech-blog/caja-exifstrip-context-action
10/09/2015

Create floppy image from arbitrary files

From http://stackoverflow.com/a/11202773:
Suppose I want to create a floppy image containing file oakcdrom.sys:
dd bs=512 count=2880 if=/dev/zero of=oakcd.img
mkfs.msdos oakcd.img
mcopy -i oakcd.img oakcdrom.sys ::/

Inspect contents:
mdir -i oakcd.img
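The dd step writes 2880 sectors of 512 bytes, i.e. 1,474,560 bytes (1440 KiB), which is the capacity of a 3.5" 1.44 MB floppy. Where dd is not available, that step alone can be sketched in Python; the function name create_blank_image is hypothetical, and the image still needs to be formatted and filled with mkfs.msdos and mcopy as above.

```python
SECTOR_SIZE = 512      # bytes per sector
SECTOR_COUNT = 2880    # sectors on a 3.5" 1.44 MB floppy


def create_blank_image(path, sector_size=SECTOR_SIZE, sector_count=SECTOR_COUNT):
    """Write a zero-filled image file and return its size in bytes."""
    size = sector_size * sector_count
    with open(path, "wb") as image:
        image.write(b"\x00" * size)
    return size
```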

27/08/2015

Create image of 3.5" DOS / Windows floppy

General command:
ddrescue -d -n -b 512 /dev/fd0 myfloppy.img myfloppy.log 

To get name of device:
lsblk

Result:
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk 
├─sda1   8:1    0 457,9G  0 part /
├─sda2   8:2    0     1K  0 part 
└─sda5   8:5    0   7,9G  0 part [SWAP]
sdb      8:16   0  29,8G  0 disk 
sdc      8:32   1   1,4M  1 disk

So in this case it is /dev/sdc. Create the image with:
sudo ddrescue -d -n -b 512 /dev/sdc myfloppy.img myfloppy.log

Optionally use the dosfsck tool to check the integrity of the file system (assuming it is a DOS file system). Use the following command:
echo "n" |dosfsck -t -r myfloppy.img

The -t option checks for bad clusters, but this only works in combination with -a (automatically repair) or -r (interactively repair). So to do the check without automatic repair or input from user we use -r and then use a pipe to prevent any changes being made. Result:
fsck.fat 3.0.26 (2014-03-07)
Cluster 2845 is unreadable.
Cluster 2846 is unreadable.
Cluster 2847 is unreadable.
Cluster 2848 is unreadable.
Perform changes ? (y/n) myfloppy.img: 33 files, 2304/2847 clusters

Git as synchronisation tool links

Check integrity of git repo:
http://stackoverflow.com/questions/5585388/which-git-commands-perform-integrity-checks
(Bottom line: use git fsck.)
How to shrink the git folder:
http://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder
25/05/2015

Exiting and re-entering GUI in Linux Mint

Exit GUI:
 Ctrl-Alt-F1

Re-enter:
Ctrl-Alt-F8

18/05/2015

Make Markdown preview in ReText work

From https://bugs.launchpad.net/ubuntu/+source/retext/+bug/1451125:
sudo apt-get install python3-docutils python3-markdown

17/05/2015

Entering BIOS of HP EliteBook 840

From the manual:

Turn on or restart the computer, and then press esc while the “Press the ESC key for Startup Menu” message is displayed at the bottom of the screen
Press f10 to enter Computer Setup.

Check hard disk for bad sectors/blocks

sudo badblocks -sv /dev/sda1

See also: http://askubuntu.com/questions/59064/how-to-run-a-checkdisk
14/04/2015

Location of Virtual Box Guest additions on Linux host machine

/usr/share/virtualbox

23/03/2015

How to get rid of clock skew errors while building packages on VM

Run this on host machine:
sudo ntpdate ntp.xs4all.nl

Then re-start VM; host and guest are now in sync and no more clock skew errors.
17/03/2015

Markdown to HTML (with smart quotes) in Pandoc

pandoc -S whatever.md -o whatever.html
12/03/2015

Validating code lists with Schematron

http://broadcast.oreilly.com/2008/11/validating-code-lists-with-sch.html
02/03/2015

Character sets

Useful background info on Unicode and UTF-8:
http://codesnippets.wpakb.kb.nl/index.php?title=Character_sets
17/02/2015

EPUB creation tool

Sigil:
https://github.com/user-none/Sigil
Simple, user-friendly.
04/02/2015

ISO Image creation

ddrescue:
http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html
Command line (Cygwin):
ddrescue -b 2048 -v /dev/scd0 test.iso test.log
Info on image

disktype tool:
http://disktype.sourceforge.net/
E.g. reveals file system type (ISO/UDF), other tech info.
22/01/2015

Installing Windows 98 in VirtualBox

General instructions here:
http://www.msfn.org/board/topic/170785-virtualbox-windows-98se-step-by-step/
But results in error:
HID failed to attach mouse driver (VERR_PDM_NO_ATTACHED_DRIVER)
Tried this:
https://forums.virtualbox.org/viewtopic.php?f=2&t=58657#p272752
VBoxInternal/USB/HidMouse/1/Config/CoordShift 0
Still doesn't work; neither does:
VBoxInternal/USB/HidMouse/1/Config/CoordShift 1
But see:
https://www.virtualbox.org/manual/ch12.html#idp60139152
Installing Windows 2000 in VirtualBox

Windows 2000 installation failures:
https://www.virtualbox.org/manual/ch12.html#idp60119680
Works!
Then go install guest additions:
https://docs.oracle.com/cd/E36500_01/E36502/html/qs-guest-additions.html
09/12/2014

AsciiMath

"AsciiMath is an easy-to-write markup language for mathematics":
http://asciimath.org/
03/12/2014

Git cheat sheet

Add all files in directory tree to the index (and remove deleted ones)

git add -A
Commit

git commit -m "Changed everything"
Push to master

git push origin master
Push to some other repo (provided I have the rights for this)

git push git@github.com:openplanets/jpylyzer-test-files.git master
Versioning / tagging

Versioning: x.y.z
x: API breakage y: new feature z: bugfix
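The x.y.z rule can be written down as a small helper. This is only a sketch: the function name bump_version and the change labels are hypothetical, and resetting the lower components to zero is an assumption borrowed from common semantic versioning practice, not something stated above.

```python
def bump_version(version, change):
    """Return the next x.y.z version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "api-breakage":
        return "{}.0.0".format(major + 1)             # x: API breakage
    if change == "feature":
        return "{}.{}.0".format(major, minor + 1)     # y: new feature
    if change == "bugfix":
        return "{}.{}.{}".format(major, minor, patch + 1)   # z: bugfix
    raise ValueError("unknown change type: " + change)
```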
Add tag

git tag -a 1.1.0 -m "tagging version 1.1.0 with refactored code"
Push tags

git push --tags
02/12/2014

Create test dataset according to new KB digitisation specs from old JP2 batch

1. Convert all master JP2s to TIFF with ImageMagick, using the command:
mogrify -format tiff *.jp2
2. Conversion loses resolution info (see below), so add new values
using:
exiftool *.tiff -xresolution=300 -yresolution=300 -resolutionunit=inches
3. Convert TIFFs to master JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\master\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBMasterLossless_2014.xml -m
4. Same for access JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\access\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBAccessLossy_2014.xml -m
But ... looking at image header box:
<imageHeaderBox> <height>2818</height> <width>1913</width> <nC>1</nC> <bPCSign>unsigned</bPCSign> <bPCDepth>8</bPCDepth> <c>jpeg2000</c> <unkC>yes</unkC> <iPR>no</iPR> </imageHeaderBox>
So "unknown colourspace" is set to "yes", which should be no (and it is "No" in the source JP2). So what is causing this? Bug in Aware software? Does this only happen with Grayscale images?
Aware codec produces JP2s that are not valid if TIFF doesn't contain resolution info

To reproduce the problem:

Convert any JP2 to TIFF with ImageMagick (will strip away any
resolution info)
Convert TIFF to JP2 with Aware.

Run jpylyzer on resulting JP2:
<isValidJP2>False</isValidJP2> <tests> <jp2HeaderBox> <resolutionBox> <captureResolutionBox> <hRcNIsValid>False</hRcNIsValid> </captureResolutionBox> </resolutionBox> </jp2HeaderBox> </tests>
Looking at properties of resolution box:
<resolutionBox> <captureResolutionBox> <vRcN>29491</vRcN> <vRcD>7491</vRcD> <hRcN>0</hRcN> <hRcD>1</hRcD> <vRcE>1</vRcE> <hRcE>4</hRcE> <vRescInPixelsPerMeter>39.37</vRescInPixelsPerMeter> <hRescInPixelsPerMeter>0.0</hRescInPixelsPerMeter> <vRescInPixelsPerInch>1.0</vRescInPixelsPerInch> <hRescInPixelsPerInch>0.0</hRescInPixelsPerInch> </captureResolutionBox> </resolutionBox>
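The derived values in this output follow from the JP2 resolution box definition: resolution = (numerator / denominator) × 10^exponent, in pixels per meter. A short sketch that reproduces the numbers above (function names are hypothetical):

```python
def resolution_pixels_per_meter(numerator, denominator, exponent):
    """Resolution in pixels per meter: (N / D) * 10 ** E."""
    return (numerator / denominator) * 10 ** exponent


def pixels_per_inch(pixels_per_meter):
    """Convert pixels per meter to pixels per inch (1 inch = 0.0254 m)."""
    return pixels_per_meter * 0.0254


# Values taken from the captureResolutionBox above.
vertical = resolution_pixels_per_meter(29491, 7491, 1)
horizontal = resolution_pixels_per_meter(0, 1, 4)
print(round(vertical, 2), round(pixels_per_inch(vertical), 2))
print(horizontal)
```

So the vertical resolution works out at about 1 pixel per inch, while the horizontal numerator of 0 gives a resolution of zero, which is the value flagged by hRcNIsValid.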
25/11/2014

Encodings and writing to file (Unicode)

Here for UTF-8:
http://stackoverflow.com/a/9822937
20/11/2014

Jpylyzer Ubuntu / Debian links


https://answers.launchpad.net/ubuntu/+source/jpylyzer/+question/257977

Clone specific branch of Github repo

git clone https://github.com/openpreserve/jpylyzer.git --branch gh-pages --single-branch ./jpylyzerHomepage
07/11/2014

Refs to external macros in Excel workbook

File:
E:\\laPeyneCDROM\\xlsfiles\\series98.xls

Refs to MACROS.XLS'!ENash, which is missing.
Solution: before opening, disable automatic workbook calculation in the Excel options.

Loading spreadsheet now results in most recent values that are stored in workbook.
27/10/2014

Google search by file extension

thermo filetype:tdb
Only gives results with extension tdb.
16/10/2014

CD imaging


An Introduction to Optical Media Preservation:
http://journal.code4lib.org/articles/9581


What are the best CD/DVD-ROM drives for disc imaging?
http://qanda.digipres.org/10/what-are-the-best-cd-dvd-rom-drives-for-disc-imaging?show=10#q10


CD/DVD Drive Accuracy List 2014:
http://forum.dbpoweramp.com/showthread.php?34019-CD-DVD-Drive-Accuracy-List-2014


Preserving Write-Once DVDs: Producing Disk Images, Extracting Content, and Addressing Flaws and Errors (LoC):
http://preservationmatters.blogspot.nl/2015/01/preserving-write-once-dvds.html


Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media (Andy Jackson):
http://anjackson.net/keeping-codes/practice/developing-a-robust-migration-workflow-for-preserving-and-curating-handheld-media.html


09/10/2014

Publisher data formats

https://spotdocs.scholarsportal.info/display/EJournals/Publisher+Data+Formats
06/10/2014

EPUBCHECK validation errors/warnings

Both errors and warnings are reported in the same _message_ element in the XML output.
E.g. compare:
  <status>Not well-formed</status>
  <messages>
   <message>ERROR: /OEBPS/cover.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
   <message>ERROR: /OEBPS/copyright.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
   </messages>

with this:
  <status>Well-formed</status>
  <messages>
   <message>WARN: /OEBPS/toc.ncx: meta@dtb:uid content 'null' should conform to unique-identifier in content.opf: '821'</message>
  </messages>

So the output needs some parsing to tell errors from warnings. Tested with epubcheck 3.0.1.
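
A minimal sketch of such parsing, assuming the severity is always the text before the first colon in each message, as in the two samples above. The `<result>` wrapper element and the function name are made up for illustration, and real output may use XML namespaces that this sketch does not handle:

```python
import xml.etree.ElementTree as ET

# Messages copied from the samples above; the <result> root is hypothetical
SAMPLE = """<result>
  <status>Not well-formed</status>
  <messages>
   <message>ERROR: /OEBPS/cover.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
   <message>WARN: /OEBPS/toc.ncx: meta@dtb:uid content 'null' should conform to unique-identifier in content.opf: '821'</message>
  </messages>
</result>"""


def split_messages(xml_text):
    """Group message texts by their ERROR / WARN prefix."""
    root = ET.fromstring(xml_text)
    grouped = {"ERROR": [], "WARN": [], "OTHER": []}
    for element in root.iter("message"):
        text = (element.text or "").strip()
        # Split on the first colon only; the rest of the message may contain more
        severity, separator, rest = text.partition(":")
        if separator and severity in ("ERROR", "WARN"):
            grouped[severity].append(rest.strip())
        else:
            grouped["OTHER"].append(text)
    return grouped


grouped = split_messages(SAMPLE)
print(len(grouped["ERROR"]), len(grouped["WARN"]), len(grouped["OTHER"]))
```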
02/10/2014

External drives Windows PC


E drive: Hitachi (large drive)
H drive: Buffalo (small drive)

H is used as backup disk for E.
26/09/2014

Jpylyzer poster DPC / 4C

17/18 November: poster cancelled, but there will be a 90 s talk + 1 slide.
10/09/2014

Jpylyzer users & links

BnF:
http://www.bnf.fr/documents/ref_num_fichier_image.pdf
04/09/2014

Ebook vs paper

Readers absorb less on Kindles than on paper, study finds:
http://www.theguardian.com/books/2014/aug/19/readers-absorb-less-kindles-paper-study-plot-ereader-digitisation
Reading and learning from screens versus print: a study in changing habits: Part 1 – reading long information rich texts:
http://www.emeraldinsight.com/doi/full/10.1108/NLW-01-2013-0012
http://www.scientificamerican.com/article/reading-paper-screens/
21/08/2014

Syncing a fork in Github

https://help.github.com/articles/syncing-a-fork
Requires:
https://help.github.com/articles/configuring-a-remote-for-a-fork
27/02/2014

Useful Python shit


PEP8
pyflakes
pdb: http://stackoverflow.com/a/1623085/1209004

10/02/2014

Create PDF from multiple TIFFS

GraphicsMagick command line:
gm convert -compress jpeg -quality 50 *.TIF test.pdf
Result: PDF with all images as JPEG, quality 50. According to Acrobat / Apache Preflight the PDF has some format conformance issues. One possible remedy is to re-process the PDF using Ghostscript. E.g. the command below produces a PDF that conforms to PDF/A-1b:
gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=test_a.pdf test.pdf
Source:
http://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x
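
For batch use, the two steps above could be wrapped in a small script. This is only a sketch: the function names are made up, the options are copied verbatim from the two commands above, and the `gswin64` executable name is the Windows one (on Linux it would typically be `gs`). Note that `subprocess` does not expand wildcards, so the TIFF list is built with `glob`:

```python
import glob
import subprocess


def gm_convert_command(tiff_paths, pdf_path, quality=50):
    """Build the GraphicsMagick command that packs TIFFs into one JPEG-compressed PDF."""
    return (["gm", "convert", "-compress", "jpeg", "-quality", str(quality)]
            + list(tiff_paths) + [pdf_path])


def ghostscript_pdfa_command(pdf_in, pdf_out, executable="gswin64"):
    """Build the Ghostscript command that re-processes the PDF (options as in the note above)."""
    return [executable, "-dPDFA", "-dBATCH", "-dNOPAUSE", "-dUseCIEColor",
            "-sProcessColorModel=DeviceCMYK", "-sDEVICE=pdfwrite",
            "-sPDFACompatibilityPolicy=1", "-sOutputFile=" + pdf_out, pdf_in]


def tiffs_to_pdfa(pattern="*.TIF", pdf_path="test.pdf", pdfa_path="test_a.pdf"):
    """Run both steps; requires gm and Ghostscript to be installed and on the PATH."""
    tiff_paths = sorted(glob.glob(pattern))
    if not tiff_paths:
        raise ValueError("no files match " + pattern)
    subprocess.check_call(gm_convert_command(tiff_paths, pdf_path))
    subprocess.check_call(ghostscript_pdfa_command(pdf_path, pdfa_path))


# Show the first command without running anything
print(" ".join(gm_convert_command(["a.TIF", "b.TIF"], "test.pdf")))
```

Calling `tiffs_to_pdfa()` in a directory with TIFF files would run both tools in sequence; the resulting file should still be checked with a validator.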
05/02/2014

2013: 74% of Dutch e-books distributed without DRM

http://www.cb-logistics.nl/wp-content/uploads/2013/01/74-percent-of-Dutch-e-books-distributed-without-DRM.pdf
04/02/2014

Unix Commands and Batch Processing for the Reluctant Librarian or Archivist

Link: http://journal.code4lib.org/articles/9158
03/02/2014

How to estimate JPEG Quality

Tutorial:
http://fotoforensics.com/tutorial-estq.php
But ... this is also possible with ImageMagick / GraphicsMagick (following the Approximate Quantization Tables method that is mentioned in the tutorial):
http://superuser.com/questions/62730/how-to-find-the-jpg-quality