Experimental attempt at getting organized ...
https://dev.to/codemouse92/introducing-dead-simple-python-563o
If loading sounds from .prg files gives unexpected results, check that the MIDI channel is set to 1 before launching the editor! Reportedly, MIDI clock needs to be set to external as well.
In this post, I will illustrate the various concepts underlying regex. The goal is to help you build a good mental model of how a regex pattern works.
[A] general list of applications sorted by category, as a reference for those looking for packages. Many sections are split between console and graphical applications.
https://wiki.archlinux.org/index.php/List_of_applications
https://superuser.com/questions/1101851/how-to-move-var-www-html-folder-to-external-hdd/1101856
Also:
https://askubuntu.com/questions/1220778/how-can-web-server-access-external-hdd
Thorium Reader is an easy to use EPUB reading application for Windows 10/10S, MacOS and Linux.
https://github.com/edrlab/thorium-reader/releases
This seems to work:
RedirectMatch ^(.*)/$ $1/home.htm
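To check which request paths the rule above rewrites, here is a small Python sketch that mimics the pattern with the `re` module. It is only an illustration: Apache evaluates `RedirectMatch` itself, and the function name `redirect_target` is made up for this example.

```python
import re

# Same pattern as in the RedirectMatch rule: any path that ends in a slash
PATTERN = re.compile(r"^(.*)/$")


def redirect_target(path):
    """Return the redirect target for a path, or None if the rule does not apply."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # $1 in the Apache rule corresponds to group 1 here
    return match.group(1) + "/home.htm"


print(redirect_target("/"))            # /home.htm
print(redirect_target("/docs/"))       # /docs/home.htm
print(redirect_target("/docs/a.htm"))  # None
```

Only paths that end in a slash are redirected; requests for individual files are left alone.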
In View menu, open routing matrix and click on system:midi midi playback2 (needs to be enabled first from Preferences). Routing is set for each track.
https://askubuntu.com/questions/819939/virtualbox-fails-after-kernel-update
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
https://github.com/Quartz/bad-data-guide
https://filingdb.com/b/pdf-text-extraction
https://www.nature.com/articles/d41586-020-02610-z
ftfy fixes Unicode that's broken in various ways.
https://github.com/LuminosoInsight/python-ftfy
https://www.qgis.org/en/site/forusers/alldownloads.html#flatpak
https://www.tecmint.com/keep-remote-ssh-sessions-running-after-disconnection/
Steps:
screen
Then issue commands. Then press Ctrl-a followed by d to detach. Log out.
systemd-analyze time
Result (in this case there's some odd firmware delay):
Startup finished in 1min 55.160s (firmware) + 10.965s (loader) + 3.955s (kernel) + 10.002s (userspace) = 2min 20.085s
graphical.target reached after 9.996s in userspace
Detailed breakdown:
systemd-analyze blame
Result:
7.416s NetworkManager-wait-online.service
1.966s vboxdrv.service
827ms apt-daily-upgrade.service
558ms systemd-fsck@dev-disk-by\x2duuid-9224\x2d4AC1.service
500ms dev-sdb1.device
477ms systemd-journal-flush.service
:: ::
Here, split into 500,000-line files:
split -l 500000 -d 2019-05-21_all_domains_NL.txt domains-nl
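For cases where `split` is not available, the same idea can be sketched in Python. This is a minimal sketch, not a drop-in replacement: the function name `split_lines` is made up here, and only the two-digit numeric suffix of `split -d` is imitated.

```python
import itertools
import os
import tempfile


def split_lines(path, prefix, lines_per_file):
    """Split a file into numbered chunks of at most lines_per_file lines."""
    out_paths = []
    # Binary mode keeps line endings and encoding exactly as they are
    with open(path, "rb") as source:
        for index in itertools.count():
            chunk = list(itertools.islice(source, lines_per_file))
            if not chunk:
                break
            # Two-digit numeric suffix, similar to split -d (prefix00, prefix01, ...)
            out_path = "%s%02d" % (prefix, index)
            with open(out_path, "wb") as target:
                target.writelines(chunk)
            out_paths.append(out_path)
    return out_paths


if __name__ == "__main__":
    # Small demonstration on a temporary 5-line file, split into 2-line chunks
    work_dir = tempfile.mkdtemp()
    sample = os.path.join(work_dir, "sample.txt")
    with open(sample, "wb") as handle:
        handle.write(b"a\nb\nc\nd\ne\n")
    for name in split_lines(sample, os.path.join(work_dir, "part"), 2):
        print(os.path.basename(name))
```

For the file above, the call would be `split_lines("2019-05-21_all_domains_NL.txt", "domains-nl", 500000)`.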
https://www.reddit.com/r/linux/comments/if1krd/how_to_delete_all_your_files/
qpdf --check --verbose whatever.pdf
pdfinfo whatever.pdf
Or (forces reading of all text):
pdftotext whatever.pdf
jhove -m PDF-hul -i whatever.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=nullpage whatever.pdf
Using PDFDebugger (activates GUI-type browser):
java -jar ~/pdfbox/pdfbox-app-2.0.21.jar PDFDebugger whatever.pdf
mutool info whatever.pdf
verapdf whatever.pdf
(Or use GUI).
pdfcpu validate whatever.pdf
Note to self: installed this by copying the Linux binary to ~/.local/bin/
(doesn't require GoLang).
Compare text (verbose output):
comparepdf ct -v=2 whatever.pdf wherever.pdf
Compare appearance (verbose output):
comparepdf ca -v=2 whatever.pdf wherever.pdf
First run jackd:
jackd -dalsa -dhw:USB -r48000 -p128 -n3 -Xseq
See also here
To 8-bit, 15 kHz:
sox versatility.wav -b 8 -r 15k versatility_8.wav remix -
BUT sox output is really noisy; better results with ffmpeg:
ffmpeg -i boc-arpeggio.wav -ar 15000 -acodec pcm_u8 boc-arpeggio-8ff.wav
https://stackoverflow.com/a/13127738/1209004
From instructions here:
sudo apt install samba
sudo apt install caja-share
sudo mkdir /var/lib/samba/usershares
sudo chgrp sambashare /var/lib/samba/usershares
sudo chmod 1770 /var/lib/samba/usershares
sudo smbpasswd -a your_username
Then reboot machine, and right-click folder in Caja and select sharing options. After this, folder is accessible from other machines on the local network.
https://www.tutorialspoint.com/python/python_cgi_programming.htm
150 formats added in latest release:
https://github.com/usnationalarchives/digital-preservation
ffmpeg -i mirror.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 -strict -2 mirror-264.mp4
(Source)
Apparently works when deployed live:
https://exoji2e.github.io/2019/02/18/video-tag-in-chrome.html
https://www.ionos.com/community/server-cloud-infrastructure/apache/enable-cgi-scripts-on-apache/
But this assumes 1 fixed dir for cgi scripts.
https://httpd.apache.org/docs/2.4/howto/cgi.html
This explains how to set custom script locations.
https://karl-voit.at/managing-digital-photographs/
Tools here:
The Outlook desktop client for the new Outlook Interface from MS Office 365.
https://github.com/julian-alarcon/prospect-mail
https://sourceforge.net/p/openil/svn/1554/tree/trunk/Test%20Images/
Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay.
https://github.com/ambv/bitrot
https://lis655.github.io/av-python-carpentry/
http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf
Just run:
python3 -m http.server
Then site can be accessed from:
Useful for testing with local files, not suitable for production. More info:
https://developer.mozilla.org/en-US/docs/Learn/Common_questions/set_up_a_local_testing_server
https://twobitpreservation.com/script-library
https://ytdl-org.github.io/youtube-dl/index.html
We read the privacy policies of Skype, Meet, and Webex: 10 ways videoconferencing systems can better protect privacy for customers
https://medium.com/cr-digital-lab/skype-meet-webex-videoconference-privacy-845bc8360fd3
In goals and scope this looks a lot like the NDE project on physical carriers:
https://automatic-ingest-digital-archives.github.io/Digital-Repair-Cafe/
See also, for example, this guide, "Handleiding Verouderde Dragers Herkennen":
https://www.projectcest.be/wiki/Publicatie:Handleiding_Verouderde_Dragers_Herkennen
https://www.howtogeek.com/669331/how-to-read-a-floppy-disk-on-a-modern-pc-or-mac/
Using Ghostscript:
https://askubuntu.com/a/256449/1052776
https://freedom.press/training/blog/videoconferencing-tools/
https://medium.com/@gdbelvin/covid-19-and-cybersecurity-e9ee5cba6de7
https://www.wikidata.org/wiki/User:YULdigitalpreservation/SPARQL2#Disk_image_file_formats
wellcomecollection/platform#4425
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
https://mashable.com/article/how-to-use-jitsi-meet-zoom-alternative/
https://winitor.com/pdf/Malware-Analysis-Fundamentals-Files-Tools.pdf
https://help.github.com/en/github/building-a-strong-community/about-wikis
And:
https://help.github.com/en/github/building-a-strong-community/adding-or-editing-wiki-pages
A JavaScript library to add search functionality to any Jekyll blog:
https://github.com/christian-fei/Simple-Jekyll-Search
https://jitsi.org/downloads/ubuntu-debian-installations-instructions/
https://drum.lib.umd.edu/handle/1903/25605
https://docs.google.com/spreadsheets/d/1nAPh6M5c2VlvuFtdMIDEfxwdLvQ-47-i0ZicUUGkzjM/edit#gid=0
Disable until reboot:
sudo modprobe -r uvcvideo
Enable again:
sudo modprobe uvcvideo
For a 1 MB file:
dd if=/dev/zero of=file.dat count=1024 bs=1024
Same, 1 GB file:
dd if=/dev/zero of=file.dat count=1024 bs=1048576
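On systems without `dd`, a zero-filled test file of a given size can also be written from Python. The sketch below is an assumption-laden equivalent: the name `write_zeros` and the default block size are choices made for this example.

```python
import os
import tempfile


def write_zeros(path, size_bytes, block_size=1024 * 1024):
    """Create a file of size_bytes zero bytes, written block by block."""
    block = bytes(block_size)  # a block of zero bytes
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            count = min(block_size, remaining)
            handle.write(block[:count])
            remaining -= count


if __name__ == "__main__":
    # 1 MB file, same size as the first dd example (1024 blocks of 1024 bytes)
    target = os.path.join(tempfile.mkdtemp(), "file.dat")
    write_zeros(target, 1024 * 1024)
    print(os.path.getsize(target))
```

Writing in blocks keeps memory use constant, so the 1 GB case (`write_zeros(path, 1024 * 1048576)`) does not need 1 GB of RAM.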
https://www.wasmachines.nl/forum/457-miele-w2203-lampje-overdosering/
But:
https://www.klusidee.nl/Forum/miele-w-3821-wasmachine-meldt-contr-dosering-t46008.html
So: run a wash at 95 degrees, otherwise use a special cleaning agent.
https://daisy.org/activities/software/wordtoepub/
Announcement:
https://daisy.org/news-events/articles/new-epub-creation-tool/
Downloads are subject to the following limits: individual file size limit: 10GB; total zip file size limit: 20GB; total number of files limit: 10,000.
Reworked this into a blog:
https://www.raymond.cc/blog/map-folder-or-directory-to-drive-letter-for-quick-and-easy-access/
https://www.filingdb.com/pdf-text-extraction
Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks.
Bot Sentinel is a free platform developed to detect and track trollbots and untrustworthy Twitter accounts.
https://philarcher.org/diary/2020/importanceOfPersistence/
https://www.maketecheasier.com/sync-onedrive-linux/
https://matthewlincoln.net/2014/03/15/coins-for-your-jekyll-blog.html
https://journal.code4lib.org/articles/14978
https://google-webfonts-helper.herokuapp.com/fonts
https://www.repairfaq.org/sam/cdfaq.htm
Check items under "Intermittent or erratic operation" and "Operation is poor or erratic when cold".
https://www.youtube.com/watch?v=jAehSoTmLGY
https://jekyllcodex.org/without-plugins/
https://github.com/dessant/web-archives
https://guides.lib.unc.edu/accessdigitalarchives
Command-line:
https://www.maketecheasier.com/ip-address-geolocation-lookups-linux/
Python:
https://pypi.org/project/geoip2/
Uses MaxMind databases.
BUT getting IP address from URL is difficult in python, so perhaps better to use bash:
https://linuxhandbook.com/find-website-ip-address-linux/
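For simple cases the Python standard library may be enough: take the host name out of the URL and resolve it. The sketch below assumes a plain IPv4 lookup is all that is needed; the function names are made up for this example, and the resulting address would still have to be passed to the geoip2 reader separately.

```python
import socket
from urllib.parse import urlparse


def host_from_url(url):
    """Return the host name part of a URL (without scheme, port or path)."""
    return urlparse(url).hostname


def ip_from_url(url):
    """Resolve the host name of a URL to an IPv4 address (needs DNS access)."""
    return socket.gethostbyname(host_from_url(url))


# Example call (requires network access):
#   ip_from_url("https://www.example.com/some/page.html")
```

Sites behind a CDN or with several A records can resolve to different addresses on each call, so a geolocation lookup based on this is only indicative.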
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\*\shell\mkd2doc]
[HKEY_CLASSES_ROOT\*\shell\mkd2doc\command]
@="\"F:\\Pandoc\\pandoc.exe\" -s -S --ascii -N --toc-depth=2 \"%1\" -o \"%1.docx\""
Then save as pandoc.reg.
This may be relevant to Iromlab or OmSipCreator:
https://docs.python.org/3/whatsnew/3.8.html#collections
Example:
https://github.com/kieranjol/IFIscripts/commit/c6eedd9ec0821b7108f7a93f81bf043a6cb53d20
(Via Twitter)
https://en.wikipedia.org/wiki/PinePhone
http://kcall.co.uk/ssd/index.html
https://www.hellovoid.online/product/task-failed-successfully-enamel-pin-pre-order
https://forums.linuxmint.com/viewtopic.php?t=265077
Solved by running following codeblock (as described here):
OLDCONF=$(dpkg -l|grep "^rc"|awk '{print $2}')
CURKERNEL=$(uname -r|sed 's/-*[a-z]//g'|sed 's/-386//g')
LINUXPKG="linux-(image|headers|ubuntu-modules|restricted-modules)"
METALINUXPKG="linux-(image|headers|restricted-modules)-(generic|i386|server|common|rt|xen)"
OLDKERNELS=$(dpkg -l|awk '{print $2}'|grep -E $LINUXPKG |grep -vE $METALINUXPKG|grep -v $CURKERNEL)
YELLOW="\033[1;33m"
RED="\033[0;31m"
ENDCOLOR="\033[0m"
sudo apt-get purge $OLDKERNELS
On Implementation of Open Standards in Software: To What Extent Can ISO Standards be Implemented in Open Source Software?
Some interesting observations on JPEG 2000:
http://www.diva-portal.org/smash/get/diva2:925474/FULLTEXT01.pdf
curl user:bitsgalore
https://blog.trailofbits.com/2019/11/01/two-new-tools-that-tame-the-treachery-of-files/
https://isc.sans.edu/forums/diary/EML+attachments+in+O365+a+recipe+for+phishing/25474/
https://docs.docker.com/install/linux/linux-postinstall/
http://www.gburner.com/online-help/what-is-multisession-disc.htm
"When you add more files in a subsequent session, a complete new file system is written for the new session, but it can include references to files recorded in the previous session; this is known as linked multisession."
History:
Official recommendation is to use a folder in the home directory (see https://askubuntu.com/questions/1092742/where-should-i-put-appimages-files), but since the homedir on my home PC is on a slow HD whereas the OS + all other software is on a fast SSD, I created a directory under root:
/Applications/
Then move AppImage files there.
https://erichennekam.blogspot.com/2014/07/lijst-webarchieven-in-de-wereld-want.html
https://docs.google.com/document/d/1N1fG4AgyBEJISc3tk5rWAc_3ZYdDbdVK4_Dbi_TusYQ/edit
https://onezero.medium.com/the-death-of-the-computer-file-doc-43cb028c0506
For testing only:
C:\Users\jkn010\AppData\Roaming\Python\Python36\site-packages\iromlab\tools\libcdio\win64\cd-info.exe -C H: --no-header --no-device-info --no-disc-mode --no-cddb --dvd > cd-info.log
"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Pre-Batch\Pre-Batch.exe" --drive="H" --logfile="prebatch.log" --passerrorsback="prebatcherrors.log"
"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Load\Load.exe" --drive="H" --rejectifnodisc --logfile="load.log" --passerrorsback="loaderrors.log"
"C:\Program Files (x86)\Smart Projects\IsoBuster\IsoBuster.exe" /d:H: /ei:test-h.iso /et:u /ep:oea /ep:npc /c /m /nosplash /s:1 /l:ib-h.log
- Compile and install the software according to official documentation
- In file /etc/udev/rules.d/025_fc5025.rules, replace the two occurrences of SYSFS with ATTRS
- Run:
sudo usermod -a -G floppy $USER
- Reboot the machine
Tested with Linux Mint 18.3 (Sylvia), equivalent to Ubuntu Xenial.
Sources: https://groups.google.com/forum/#!topic/bitcurator-users/K1BPIbdKoOY/discussion + email correspondence with Device Side Data (the creator of the FC5025).
OfficeToPDF is a command line utility that converts Microsoft Office 2003, 2007, 2010, 2013 and 2016 documents from their native format into PDF using Office's in-built PDF export features.
https://github.com/cognidox/OfficeToPDF
"ffmprovisr for QEMU":
https://eaasi.gitlab.io/qemu-qed/
(Used this for iPRES video)
(Used this for earlier video, I think).
Directories /etc/apache2, /var/www and file /etc/hosts copied to folder backup-webserver on backup disk BAKWA. Copied using:
- sudo rsync -avhl /var/www/ ./var/www
- sudo rsync -avhl /etc/apache2/ ./etc/apache2
- sudo rsync -avhl /etc/hosts ./etc/
To be restored after reinstall.
https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
https://libguides.mit.edu/digmediatransfer
https://cloud.google.com/products/ai/ml-comic-1/
https://github.com/saramibreak/DiscImageCreator
(via Twitter)
https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html
This document describes and examines strategies for designing lightweight microservice environments for the processing of digital, file-based, audiovisual data within an archive.
http://journal.iasa-web.org/pubs/article/view/70
- Close Bless, and open preferences file (/home/johan/.config/bless/preferences.xml) in a text editor.
- Set temp dir by editing pref element with ByteBuffer.TempDir name attribute
- Add closing </preferences> tag and save the file. File should look like below:
<preferences>
  <pref name="ByteBuffer.TempDir">/tmp/Bless</pref>
  <pref name="Default.NumberBase">Hexadecimal</pref>
  <pref name="Undo.Actions">100</pref>
  <pref name="View.Toolbar.Show">True</pref>
  <pref name="Undo.Limited">False</pref>
  <pref name="View.Statusbar.Show">True</pref>
  <pref name="Session.RememberWindowGeometry">True</pref>
  <pref name="Default.Layout.UseCurrent">False</pref>
  <pref name="Session.RememberCursorPosition">True</pref>
  <pref name="Session.AskBeforeLoading">False</pref>
  <pref name="View.Statusbar.Selection">True</pref>
  <pref name="Tools.Statistics.Show">False</pref>
  <pref name="View.Statusbar.Offset">True</pref>
  <pref name="Tools.ConversionTable.LEDecoding">False</pref>
  <pref name="Default.EditMode">Insert</pref>
  <pref name="Tools.ConversionTable.Show">True</pref>
  <pref name="Highlight.PatternMatch">True</pref>
  <pref name="Undo.KeepAfterSave">Memory</pref>
  <pref name="Session.LoadPrevious">True</pref>
  <pref name="View.Statusbar.Overwrite">True</pref>
  <pref name="Default.Layout.File">
</preferences>
- Make the file read-only:
chmod 0444 /home/johan/.config/bless/preferences.xml
Done!
Source here
Update: this didn't quite work, but a workaround is to enter the location of the temp dir (/tmp/Bless) directly in Bless' user interface as a text string (so don't use the file navigation widgets!).
http://162.242.228.174/share/jp2.tgz
https://blog.codinghorror.com/going-commando-put-down-the-mouse/
https://weblogs.asp.net/jongalloway/Mouseless-Computing
https://lifehacker.com/hack-attack-mouse-less-firefox-139495
Reverse Geocode takes a latitude / longitude coordinate and returns the country and city.
https://pypi.org/project/reverse-geocode/
Source: https://twitter.com/Eijsbouts/status/1157591377624150016
https://twitter.com/rutger_/status/1156629656533110787 (archived)
Delpher link: https://resolver.kb.nl/resolve?urn=ABCDDD:010870971:mpeg21:a0117
Use as context for the xxLINK presentation!
https://www.howtogeek.com/164570/how-to-install-android-in-virtualbox/
Then in VirtualBox change display option "Graphics Controller" to VBoxVGA, and enable 3D acceleration, as per here.
https://www.home-assistant.io/
Added the following lines to /etc/security/limits.conf, as per here:
johan - rtprio 99
johan - nice -10
See:
https://askubuntu.com/questions/462085/deja-dup-repeatedly-asks-encryption-password
Tried:
- Re-install of duplicity
- Changed ownership of a few dirs in home that were owned by root.
Start backup from terminal:
export DEJA_DUP_DEBUG=1
deja-dup --backup
Result: backup appears to be created, but after verification stage deja-dup asks for password again. Tail end of debug output:
DUPLICITY: . self.gpg_failed()
DUPLICITY: . File "/usr/lib/python2.7/dist-packages/duplicity/gpg.py", line 272, in gpg_failed
DUPLICITY: . raise GPGError(msg)
DUPLICITY: . GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: .
DUPLICITY: .
DUPLICITY: ERROR 31 GPGError
DUPLICITY: . GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: .
https://linux.die.net/man/1/nwipe
Archaeology of the Amsterdam digital city; why digital data are dynamic and should be treated accordingly
https://www.tandfonline.com/doi/full/10.1080/24701475.2017.1309852
https://dash.harvard.edu/handle/1/40741399
After attaching a large external HD + including it in the backup scheme, deja-dup eats up all space of main HD. Cause: deja-dup writes some metadata and manifest files to home dir at:
~/.cache/deja-dup/
These files become very large (here: > 18 GB), which results in running out of disk space. This apparently causes problems for lots of deja-dup users, e.g. here, here. This post suggests solving this by replacing ~/.cache/deja-dup/ with a symlink to a directory on another disk with sufficient space:
mkdir /media/johan/BAKWA/.deja-dup-cache
mv ~/.cache/deja-dup/* /media/johan/BAKWA/.deja-dup-cache/
rmdir ~/.cache/deja-dup
ln -sf /media/johan/BAKWA/.deja-dup-cache ~/.cache/deja-dup
UPDATE: doesn't work, files are still written to home dir!! Interim solution: exclude external drive from deja-dup backup scheme, and back it up manually with rsync (no incremental backup though!).
List partitions:
df -h
Result:
Filesystem Size Used Avail Use% Mounted on
udev 3,9G 0 3,9G 0% /dev
tmpfs 789M 9,5M 780M 2% /run
/dev/sda1 227G 202G 14G 94% /
tmpfs 3,9G 34M 3,9G 1% /dev/shm
tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs 3,9G 0 3,9G 0% /sys/fs/cgroup
cgmfs 100K 0 100K 0% /run/cgmanager/fs
tmpfs 789M 32K 789M 1% /run/user/1000
/dev/sdb1 1,9T 144M 1,9T 1% /media/johan/Elements4
So in this case we need to format /dev/sdb1. Unmount the disk:
sudo umount /dev/sdb1
Format as ext4:
sudo mkfs.ext4 /dev/sdb1
Change generic label to WEBARCH:
sudo e2label /dev/sdb1 WEBARCH
Done!
#!/bin/bash
# Script must be run as root!
sourceDir=/media/johan/Elements4/webarcheologie
destDir=/media/johan/WEBARCH/
rsync -avhl --dry-run $sourceDir $destDir
Copy homedir:
#!/bin/bash
# Script must be run as root!
sourceDir=~
destDir=/media/johan/BAKWA/homedir-25022020/
rsync -avhl $sourceDir $destDir
https://www.linuxjournal.com/content/filesystem-hierarchy-standard
https://www.cl.cam.ac.uk/~lp15/Pages/Scream.html
https://forums.launchbox-app.com/topic/29631-quick-mamemess-philips-cd-i-tutorial-mame-0-172/
https://publications.arl.org/16ivjbv/ (PDF link)
First install the following packages:
sudo apt install texlive-latex-extra
sudo apt-get install texlive-bibtex-extra biber
sudo apt-get install texlive-fonts-recommended
Then download the OpenSans package here. Install using following steps:
- Copy doc/, fonts/, source/, and tex/ directories to the /etc/texmf directory.
- Refresh the file name database to make TeX aware of the new files:
mktexlsr
- Make Dvips, dvipdf and pdfTeX aware of the new fonts:
sudo updmap -sys --enable Map=opensans.map
https://blog.matthewburgess.net/2019/05/digital-physical-carrier-illustrations.html
https://support.hp.com/us-en/product/hp-prodesk-400-g3-microtower-pc/7638325/manuals
https://gist.github.com/zerolab/1633661
https://gist.github.com/davidtheclark/5521432
Even easier, use SmartyPants:
https://pypi.org/project/smartypants/
https://labs.loc.gov/experiments/webarchive-datasets/
https://parametric.press/issue-01/unraveling-the-jpeg/
ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more).
Short of AI, your best bet is to run OCR (tesseract) on these files.
Use cd-discid:
cd-discid /dev/sr1
Result:
b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 103443 117153 120751 132154 144223 157688 2287
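The fields in this output line are: disc ID, number of tracks, one frame offset per track, and the total disc length in seconds. As a cross-check, the Python sketch below parses the line and recomputes the disc ID with the classic CDDB/freedb checksum (75 frames per second). The function names are made up for this example.

```python
def parse_cd_discid(line):
    """Split a cd-discid output line into its fields."""
    fields = line.split()
    disc_id = fields[0]
    n_tracks = int(fields[1])
    offsets = [int(value) for value in fields[2:2 + n_tracks]]
    length_seconds = int(fields[2 + n_tracks])
    return disc_id, n_tracks, offsets, length_seconds


def digit_sum(number):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(digit) for digit in str(number))


def compute_disc_id(offsets, length_seconds):
    """Recompute the CDDB disc ID from track offsets (frames) and disc length."""
    # Checksum over the start time (whole seconds) of every track
    checksum = sum(digit_sum(offset // 75) for offset in offsets) % 255
    # Playing time counts from the start of the first track
    playing_time = length_seconds - offsets[0] // 75
    return "%08x" % ((checksum << 24) | (playing_time << 8) | len(offsets))


line = ("b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 "
        "103443 117153 120751 132154 144223 157688 2287")
disc_id, n_tracks, offsets, length_seconds = parse_cd_discid(line)
print(disc_id, compute_disc_id(offsets, length_seconds))  # b608ed0f b608ed0f
```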
Lookup in freedb using:
Result:
200 rock b608ed0f Der Plan / Unkapitulierbar
Full record:
http://www.freedb.org/freedb/rock/b608ed0f
# xmcd
#
# Track frame offsets:
# 150
# 8656
# 19406
# 37656
# 48025
# 58358
# 71683
# 77998
# 90546
# 103443
# 117153
# 120751
# 132154
# 144223
# 157688
#
# Disc length: 2287 seconds
#
# Revision: 0
# Processed by: cddbd v1.5.2PL0 Copyright (c) Steve Scherf et al.
# Submitted via: ExactAudioCopy v0.99pb5
#
DISCID=b608ed0f
DTITLE=Der Plan / Unkapitulierbar
DYEAR=2017
DGENRE=Electronic
TTITLE0=Wie der Wind weht
TTITLE1=Lass die Katze stehn!
TTITLE2=Man leidet herrlich
TTITLE3=Grundrecht
TTITLE4=Es heißt: die Sonne
TTITLE5=Gesicht ohne Buch
TTITLE6=Stille hören
TTITLE7=Flohmarkt der Gefühle
TTITLE8=Der Herbst
TTITLE9=Körperlos im Cyberspace
TTITLE10=Zu Besuch bei N. Senada
TTITLE11=Wie schwarz ist ein Rabe?
TTITLE12=Come Fly With Me
TTITLE13=Was kostet der Austritt?
TTITLE14=Die Hände des Astronauten
EXTD=
EXTT0=
EXTT1=
EXTT2=
EXTT3=
EXTT4=
EXTT5=
EXTT6=
EXTT7=
EXTT8=
EXTT9=
EXTT10=
EXTT11=
EXTT12=
EXTT13=
EXTT14=
PLAYORDER=
Python: cddb-py; Python 3 port here.
See also CDDB.
From here:
git fetch upstream
git checkout master
git rebase upstream/master
git push -f origin master
Suppose we want to extract the Jpeg2000:NumberOfComponents field for each JP2 image:
exiftool -csv -Jpeg2000:NumberOfComponents /media/johan/Elements4/test/*.jp2 > exif.csv
Result:
SourceFile,NumberOfComponents
/media/johan/Elements4/test/HS-19640508-001.jp2,3
/media/johan/Elements4/test/HS-19640508-002.jp2,3
::
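A CSV file like this is easy to summarise afterwards. The sketch below tallies how many images have each component count; it assumes the column names shown in the result above, and the function name `count_components` is made up for this example.

```python
import collections
import csv
import io


def count_components(csv_file):
    """Tally the NumberOfComponents values in exiftool CSV output."""
    reader = csv.DictReader(csv_file)
    return collections.Counter(row["NumberOfComponents"] for row in reader)


# Demonstration on the first two rows of the result shown above
sample = io.StringIO(
    "SourceFile,NumberOfComponents\n"
    "/media/johan/Elements4/test/HS-19640508-001.jp2,3\n"
    "/media/johan/Elements4/test/HS-19640508-002.jp2,3\n"
)
for value, count in count_components(sample).items():
    print(value, count)
```

For the real file, pass an open file handle instead: `count_components(open("exif.csv", newline=""))`.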
mogrify -resize 1014 *.jpg
(Note: this changes the images in-place, so make a copy of the original images before doing this).
https://alexvanderbist.com/posts/2018/fixing-imagick-error-unauthorized
https://github.com/EG-tech/emulation-resources
The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data.
https://github.com/minimaxir/big-list-of-naughty-strings
https://espirian.co.uk/twitter-search-advanced-guide/
Below instructions are for a fresh install. Based on:
https://dominicpratt.de/fritz-nas-unter-debianubuntu-einbinden/
-
Open fstab in text editor as sudo:
sudo xed /etc/fstab
-
Add following line to bottom (last line of file must be empty):
//192.168.178.1/FRITZ.NAS /media/fritzbox cifs credentials=/etc/samba/auth,vers=1.0,uid=1000,gid=1000 0
-
Create the mount directory:
sudo mkdir -p /media/fritzbox
-
Create file /etc/samba/auth:
sudo touch /etc/samba/auth
-
Edit as sudo:
sudo xed /etc/samba/auth
-
Add username and password entries (must be FritzNAS uname + pwd, not the FritzBox ones!):
username=johan
password=dfh3476fh8((77&&
-
It might be necessary to install the cifs-utils and samba packages (it seems cifs-utils is already part of the default Linux Mint install):
sudo apt install cifs-utils
sudo apt install samba
-
Finally mount:
sudo mount -a
Done!
https://forums.linuxmint.com/viewtopic.php?t=217509
A utility for file format and metadata analysis, data extraction, and image format decoding
https://github.com/jsummers/deark
https://people.xiph.org/~xiphmont/demo/neil-young.html
https://github.com/markh794/mhvtl
Install script for Ubuntu 16.04:
https://gist.github.com/hrchu/3eb1c0aa9994df0328037fff04cd889d
Then run using:
sudo /etc/init.d/mhvtl start
https://stackoverflow.com/a/25223352/1209004
E.g.:
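import os
import tkinter as tk

def main():
    """Main function"""
    # get_main_dir() and tapeimgrGUI are defined elsewhere in the application
    appDir = get_main_dir()
    root = tk.Tk()
    # Set the window icon for this window and all future toplevels
    root.iconphoto(True, tk.PhotoImage(file=os.path.join(appDir, 'icon.png')))
    myGUI = tapeimgrGUI(root)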
# Get tape status, output to array (split at newline)
IFS=$'\n' tapeStatus=$(mt -f $TAPEnr status)
# Parse file number and block number from status output
for item in ${tapeStatus[*]}
do
    if [[ $item == *"file number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        fileNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $fileNumber
    fi
    if [[ $item == *"block number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        blockNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $blockNumber
    fi
done
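The same parsing step can be sketched in Python with a regular expression. The status text used here is an assumed example of what `mt ... status` prints ("file number = ..." and "block number = ..." lines); the exact layout differs between mt implementations, so treat the sample as hypothetical. Unlike the shell loop, the match is case-insensitive.

```python
import re


def parse_tape_position(status_text):
    """Extract file number and block number from mt status output."""
    position = {}
    pattern = re.compile(r"(file|block) number\s*=\s*(-?\d+)", re.IGNORECASE)
    for match in pattern.finditer(status_text):
        # Key is "file" or "block", value is the number after the equal sign
        position[match.group(1).lower()] = int(match.group(2))
    return position


# Hypothetical status output, for illustration only
sample_status = "drive type = Generic SCSI-2 tape\nfile number = 2\nblock number = 0\n"
print(parse_tape_position(sample_status))
```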
This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.
https://github.com/socialcopsdev/camelot/
https://remarkableapp.github.io/index.html
See also moby/moby#21925.
E.g.:
sudo du -hx --max-depth=1 /var/lib
Result contains this entry:
25G /var/lib/docker
There are probably more elegant/subtle ways to handle this, see e.g. https://lebkowski.name/docker-volumes/
Uninstall docker:
sudo apt-get remove docker docker-engine docker.io
Delete files:
sudo rm -rf /var/lib/docker
The Library’s ‘Emerging Formats’ project is focused on UK publications created for the mobile web, as interactive narratives or in database format.
https://britishlibrary.recruitment.northgatearinso.com/birl/pages/vacancy.jsf?latest=01001612
Caylin Smith and Ian Cooke report on the Emerging Formats project, which is investigating the collection management needs of published works that are created with digital formats that have significant software and hardware dependencies. They discuss the collection management challenges of these format types within the framework of UK NPLD.
http://journals.sagepub.com/doi/full/10.1177/0955749018785836
This works if Trash contains items that were put there as superuser:
sudo rm -rf ~/.local/share/Trash/*
https://www.digitalocean.com/community/tutorials/how-to-install-wordpress-with-lamp-on-ubuntu-16-04
Use this to import kbresearch blog; then export to static site using:
https://wordpress.org/plugins/static-html-output-plugin/
https://stacks.wellcomecollection.org/digital-transformation-at-wellcome-collection-639fb177aad6
filename:ext extension:ext where ext is the extension you're interested in. You need both the filename and extension keywords to filter it down to only potential files of interest.
https://twitter.com/NKrabben/status/1022575556209074220
Example:
https://github.com/search?q=filename%3Awq1+extension%3Awq1
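Search URLs of this form can be generated for any extension. A minimal sketch using only the standard library; the function name `github_extension_search_url` is made up for this example.

```python
from urllib.parse import urlencode


def github_extension_search_url(ext):
    """Build a GitHub search URL that combines the filename and extension keywords."""
    query = "filename:%s extension:%s" % (ext, ext)
    # urlencode turns ':' into %3A and the space into '+'
    return "https://github.com/search?" + urlencode({"q": query})


print(github_extension_search_url("wq1"))
# https://github.com/search?q=filename%3Awq1+extension%3Awq1
```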
This repository aims to collect the smallest possible syntactically valid files in different programming/scripting/markup languages.
https://github.com/mathiasbynens/small
VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.
https://www.techrepublic.com/article/disk-wiping-and-data-forensics-separating-myth-from-science/
the home of the most unique Microsoft Excel animated spreadsheets
https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p
https://wiki.archivematica.org/PREMIS/METS_for_scalability
https://code.visualstudio.com/Docs/languages/markdown
http://thisdavej.com/build-an-amazing-markdown-editor-using-visual-studio-code-and-pandoc/
https://www.wikihow.com/Measure-Static-Electricity
Install in MINGW:
pacman -S mingw-w64-x86_64-gedit
Add external plugin:
https://stackoverflow.com/questions/39360149/adding-external-plug-ins-to-gedit-in-windows
Get plugins here:
https://wiki.gnome.org/Apps/Gedit/ThirdPartyPlugins-v3.0
If ELIFECYCLE / puppeteer error happens, try this (source):
sudo npm install @daisy/ace -g --unsafe-perm=true --allow-root
BUT ace now fails on this (installing Chrome doesn't help).
Our goal is to aggregate knowledge about best practices in writing and to make that knowledge immediately accessible to all authors in the form of a linter for prose.
https://github.com/amperser/proselint/
https://w3c.github.io/publ-bg/docs/EPUB4_business_case.html
http://journal.code4lib.org/articles/13438
Possibly more here.
Web service based on Ace:
https://github.com/amiaopensource/open-workflows
http://netarkivet.dk/wp-content/uploads/IntegrationOfNonHarvestedData.pdf
https://www.linuxquestions.org/questions/linux-newbie-8/read-tape-contents-944371/
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005619
http://nvie.com/posts/a-successful-git-branching-model/
https://github.com/WikiDP/wikidp-portal
In config file ports.conf, change this line:
Listen 80
into this:
Listen 127.0.0.1:80
See:
Setting up multiple sites:
Community resource intended to provide helpful one-liners and script code specifically drawn from real-life examples in archives and libraries
https://dd388.github.io/crals/
wget --recursive --no-clobber --span-hosts --page-requisites \
--convert-links --no-parent -w 5 --random-wait \
http://blog.kbresearch.nl >>wget.log 2>&1
This doesn't quite work the way it should:
- If we leave out --span-hosts, external stylesheets etc. are ignored, even if --page-requisites is used (don't want that)!
- If we include --span-hosts, externally referenced pages/sites are scraped as well (don't want that either!)
See also https://gist.github.com/dannguyen/03a10e850656577cfb57
Better approach:
-
Scrape one single page:
wget --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait http://blog.kbresearch.nl/2015/07/07/why-pdfa-validation-matters-even-if-you-dont-have-pdfa/ >>$logFile 2>&1
This gives us the domains used for individual page resources, which we can subsequently feed into --domains. After some fiddling (we don't want to harvest +60 gravatar subdomains) this looks reasonable:
#!/bin/bash
url=http://blog.kbresearch.nl
domains=blog.kbresearch.nl,wp.com,researchkb.files.wordpress.com,googleapis.com,gstatic.com
logFile=wget.log
wget --mirror --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait --domains=$domains $url >>$logFile 2>&1
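Collecting the host names from the single-page scrape can be partly automated. The sketch below takes a list of resource URLs (for instance pulled out of the wget log) and turns the unique host names into a `--domains` value. The URLs in the example are hypothetical, and the function names are made up for this illustration.

```python
from urllib.parse import urlparse


def hosts_from_urls(urls):
    """Return the sorted, unique host names found in a list of URLs."""
    hosts = {urlparse(url).hostname for url in urls}
    hosts.discard(None)  # entries without a host name (e.g. relative paths)
    return sorted(hosts)


def domains_option(hosts):
    """Format host names as a wget --domains option."""
    return "--domains=" + ",".join(hosts)


# Hypothetical resource URLs, for illustration only
resource_urls = [
    "http://blog.kbresearch.nl/2015/07/07/some-post/",
    "https://fonts.googleapis.com/css?family=Open+Sans",
    "https://fonts.gstatic.com/s/opensans/font.woff",
    "https://fonts.googleapis.com/css?family=Lato",
]
print(domains_option(hosts_from_urls(resource_urls)))
```

The list in the script above uses shorter parent domains such as wp.com; trimming host names down to that form is still a manual step.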
https://arxiv.org/abs/1712.03140
swMATH is a freely accessible, innovative information service for mathematical software. swMATH not only provides access to an extensive database of information on mathematical software, but also includes a systematic linking of software packages with relevant mathematical publications.
https://docs.microsoft.com/en-us/previous-versions/windows/
See this thread on digipres.club for some context:
https://digipres.club/@joe/99650486509645352
Search URL:
One of our goals is to publish researcher's data, code, and executable Linux container all as files in a version controlled Dat repository. For this to be useful, a person should be able to execute these Linux environments (aka containers) anywhere
https://blog.datproject.org/2018/01/26/challenges-of-decentralized-hpc-containerization/
Instructions here, Ubuntu 16.04.
If updating results in warnings about package authentication, follow steps below:
owncloud/client#5287 (comment)
exiftool -xmp:all= "-all:all<xmp-tiff:all" MMKB19_000004012_00002_master.tiff
Use the --max-line-length option, e.g.:
pep8 --max-line-length=120 ~/omSipCreator/omSipCreator > pep8.txt
https://www.degruyter.com/view/j/rest.2017.38.issue-3/res-2016-0032/res-2016-0032.xml?format=INT
COMPACT DISC SERVICE LIFE: AN INVESTIGATION OF THE ESTIMATED SERVICE LIFE OF PRERECORDED COMPACT DISCS (CD-ROM)
https://www.loc.gov/preservation/resources/rt/CDservicelife_rev.pdf
https://www.loc.gov/preservation/scientists/projects/cd_longevity.html
https://www.loc.gov/preservation/scientists/projects/cd-r_dvd-r_rw_longevity.html
https://www.ossblog.org/markdown-editors/
git checkout -- .
(see also stackoverflow)
validate METS file against best practices:
Schematron rules:
sf -csv t/images | cut -d ',' -f 6 | sort | uniq -c | sort -r
Result:
8 x-fmt/390
7 fmt/645
5 fmt/41
5 fmt/101
4 fmt/43
3 x-fmt/62
3 x-fmt/263
3 x-fmt/111
3 fmt/44
2 fmt/661
2 fmt/5
2 fmt/17
28 UNKNOWN
1 x-fmt/92
::
etc
(Source: Nick Krabbenhöft)
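The same tally can be made in Python, which avoids problems with cut when a file name contains a comma. This is only a sketch: the function name count_ids is made up, and it assumes (as the cut command above implies) that the identifier sits in the sixth column and that the CSV starts with a header row.

```python
import csv
import io
from collections import Counter


def count_ids(csv_text, column=5, has_header=True):
    """Count the values in one column (0-based index) of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        if len(row) > column:
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows; real input would be the output of "sf -csv"
    sample = (
        "filename,filesize,modified,errors,namespace,id\n"
        "a.tif,1,,,pronom,fmt/353\n"
        "b.tif,2,,,pronom,fmt/353\n"
        "c.bin,3,,,pronom,UNKNOWN\n"
    )
    for identifier, count in count_ids(sample).most_common():
        print(count, identifier)
```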
https://stackoverflow.com/a/7244456
https://gist.github.com/bitsgalore/7c5da72277557b608c94
https://sourceforge.net/p/exiftool/code/ci/master/tree/t/images/
Not working, problem seems to correspond to issue here:
https://forums.linuxmint.com/viewtopic.php?f=47&t=260925
Create/update package database:
pacman -Fy
Result:
:: Synchronizing package databases...
mingw32 2.4 MiB 2.97M/s 00:01 [#####################] 100%
mingw32.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
mingw64 2.4 MiB 1695K/s 00:01 [#####################] 100%
mingw64.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
msys 855.8 KiB 4.24M/s 00:00 [#####################] 100%
msys.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
Find package name from (sub) string:
pacman -Fsx iso-info
Result:
mingw32/mingw-w64-i686-libcdio 2.0.0-1
mingw32/bin/iso-info.exe
mingw32/share/man/man1/iso-info.1.gz
mingw64/mingw-w64-x86_64-libcdio 2.0.0-1
mingw64/bin/iso-info.exe
mingw64/share/man/man1/iso-info.1.gz
Install package:
pacman -S mingw-w64-x86_64-libcdio
Uninstall package:
pacman -R mingw-w64-x86_64-libcdio
Source: https://github.com/msys2/msys2/wiki/Using-packages
Query:
extent any "cdrom* cd-rom*" and annotation any "Mac*" not annotation any "Win* PC*"
Result:
Query:
extent any "blu*"
Result (only 5 hits, 23/1/2018):
Command:
7z l -slt iso9660.iso
Result:
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Listing archive: iso9660.iso
--
Path = iso9660.iso
Type = Iso
Created = 2017-06-30 18:31:33
Modified = 2017-06-30 18:31:33
----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69424
Modified = 2017-06-30 13:23:38
Path = readme.txt
Folder = -
Size = 37
Packed Size = 37
Modified = 2017-06-30 13:25:20
UDF Bridge:
7z l -slt iso9660_udf.iso
Result:
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Listing archive: iso9660_udf.iso
--
Path = iso9660_udf.iso
Type = Udf
Comment = UDF Bridge demo
Cluster Size = 2048
Created = 2017-06-30 18:31:33
----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69632
Modified = 2017-06-30 13:23:38
Accessed = 2017-06-30 18:31:33
Path = readme.txt
Folder = -
Size = 37
Packed Size = 2048
Modified = 2017-06-30 13:25:20
Accessed = 2017-06-30 18:31:33
https://twitter.com/anjacks0n/status/941020183812100096
Esp.:
Without Tika, relying on DROID, there would have been 25,887,108 unidentified resources - mostly plain text, JS, CSS etc. Without DROID, only 464 would go unidentified, but we'd have no format-version-level information. Combining tools is crucial for web archives.
Using iso-info:
iso-info -l -i dvd-erik.iso
Result:
d [LSN 22] 4096 Jan 01 1970 01:00:00 .
d [LSN 22] 2048 Jan 01 1970 01:00:00 ..
- [LSN 26] 158549392 Jul 30 2008 09:33:59 086_10B21_078v_079r.TIF
- [LSN 77443] 158633884 Jul 30 2008 09:34:08 087_10B21_079v_080r.TIF
- [LSN 154901] 157658880 Jul 30 2008 09:34:19 088_10B21_080v_081r.TIF
- [LSN 231883] 157877788 Jul 30 2008 09:34:29 089_10B21_081v_082r.TIF
::
::
- [LSN 2092850] 158203324 Jul 30 2008 09:38:31 113_10B21_105v_106r.TIF
- [LSN 2170098] 156139844 Jul 30 2008 09:38:41 114_10B21_106v_107r.TIF
Here LSN * 2048 = offset of start of file.
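As a worked example, the first TIFF in the listing starts at LSN 26, so at byte offset 26 * 2048 = 53248. The sketch below uses that relation to copy one file out of an image. The function names are hypothetical, and it assumes the file is stored as one contiguous extent of 2048-byte sectors, which should be checked for the image at hand.

```python
SECTOR_SIZE = 2048  # logical sector size used in the listing above


def lsn_to_offset(lsn, sector_size=SECTOR_SIZE):
    """Return the byte offset at which logical sector number lsn starts."""
    return lsn * sector_size


def extract_file(image_path, lsn, size, out_path, chunk_size=1024 * 1024):
    """Copy size bytes starting at sector lsn from the image to out_path."""
    with open(image_path, "rb") as image, open(out_path, "wb") as out:
        image.seek(lsn_to_offset(lsn))
        remaining = size
        while remaining > 0:
            chunk = image.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("image ended before the file was complete")
            out.write(chunk)
            remaining -= len(chunk)
```

Usage would be along the lines of extract_file("dvd-erik.iso", 26, 158549392, "086_10B21_078v_079r.TIF"), with LSN and size taken from the iso-info listing.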
From the manual:
--try-again Mark all non-trimmed and non-scraped blocks inside the rescue domain as non-tried before beginning the rescue. Try this if the drive stops responding and ddrescue immediately starts scraping failed blocks when restarted. If '--retrim' is also specified, mark all failed blocks inside the rescue domain as non-tried.
Useful if ddrescue remains stuck endlessly in "scraping failed blocks".
msiexec /a putty-64bit-0.70-installer.msi
Disable PDF/A validation, only extract features:
verapdf --off --extract whatever.pdf > whatever.xml
Recursively process directory tree:
verapdf --recurse --off --extract myDir > whatever.xml
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20preservation%22
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20scholarship%22
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:enrichment
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:IPR
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22public%20libraries%22
https://docs.google.com/spreadsheets/d/1g2vbAFBHWhsPRkNljbQBsKasMI-GCFTsQLol0cFT6js/edit#gid=0
https://web.archive.org/web/20020201195007/http://www.geocities.com:80/SiliconValley/4031/
https://superuser.com/questions/670324/obtaining-a-list-of-all-hyperlinks
(Note: lots of DOIs in references don't resolve at all, or resolve to wrong location!)
http://onlinelibrary.wiley.com/doi/10.1111/1746-692X.12106/full
http://www.nature.com/news/a-clean-green-science-machine-1.17125?WT.mc_id=TWT_NatureNews
http://tyndall.ac.uk/sites/default/files/twp161.pdf
https://www.chemistryworld.com/opinion/cutting-the-science-travel-footprint/9567.article
Archivematica examples in:
https://www.loc.gov/standards/premis/examples.html
Paper by Andy Jackson (2012):
http://arxiv.org/pdf/1210.1714.pdf
https://twitter.com/andrewjbtw/status/920791293122396160
For one file:
convert whatever_compressed.tif +compress whatever_uncompressed.tif
Multiple files:
#!/bin/bash
# Input and output directories
dirIn=~/tiffsDDD
dirOut=~/tiffsDDUncompressed
# Loop over null-delimited file names so that spaces in paths are handled safely
while IFS= read -d $'\0' -r file ; do
    # File basename
    bName=$(basename -s .TIF "$file")
    # Output name
    outName=$bName.TIF
    # Full output path
    fOut="$dirOut/$outName"
    # Convert to uncompressed TIFF
    convert "$file" +compress "$fOut"
done < <(find "$dirIn" -type f -name "*.TIF" -print0)
- "Failed to start the X server" message in login screen
  Solution: https://linuxnorth.wordpress.com/2017/07/04/installing-and-uninstalling-lightdm-in-linux-mint-18-2/
- Top/title bar of windows missing, cannot move windows.
  Solution: go to Preferences/Desktop Settings/Windows and select a Window Manager from the dropdown menu (for some reason no WM is selected by default).
- Window resize margin in default Metacity window manager is only 1 px wide
  Solution: https://askubuntu.com/questions/4109/how-do-i-increase-the-resize-margin-on-windows
This library provides a fast, standalone way to read and write WARC Format commonly used in web archives.
https://github.com/webrecorder/warcio
Includes ARC/WARC validation:
https://sbforge.org/display/JWAT/Running+JWAT-Tools
https://technet.microsoft.com/en-us/library/ee309278(office.12).aspx
https://github.com/apache/tika/tree/master/tika-core/src/main/resources/org/apache/tika/mime
Kaitai Struct is a declarative language used to describe various binary data structures, laid out in files or in memory (...).
The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include a generated code for a parser that can read described data structure from a file / stream and give access to it in a nice, easy-to-comprehend API.
Use the -d option with invalid-name:
python3 -m pylint -d invalid-name boxvalidator.py > pylintjpylyzer.txt
https://github.com/Dzonatas/solution/tree/master/Documentation
The following command will keep login credentials in cache for 1 hour:
git config --global credential.helper "cache --timeout=3600"
For some reason I always forget this (below for OpenJPEG):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- http://hirise-pds.lpl.arizona.edu/download/PDS/RDR/ESP/ORB_011200_011299/ESP_011265_1560/ESP_011265_1560_RED.JP2 (2GB)
- 6.7 GB http://apollo.sese.asu.edu/data/pancam/AS16/jp2/AS16-P-4102.jp2
https://gist.github.com/danielestevez/2044589
Prince tool:
WeasyPrint (open-source alternative):
The goal of this Action is to improve scientific understanding of the implications of digitization, hence helping individuals, disciplines, societies and sectors across Europe to cope optimally with the effects.
http://blog.online-convert.com/huge-list-of-example-files-creative-commons/
robocopy sourceDir destDir /COPYALL /E /R:0 /DCOPY:T
E.g.:
robocopy H:\iromlabTestKBDepotNew M:\DigitalPreservation\optischeDragers\iromlabTestKBDepot /COPYALL /E /R:0 /DCOPY:T >robocopy.stdout 2>robocopy.stderr
Some useful links:
Good description of the problem:
https://lists.debian.org/debian-user/2005/01/msg02339.html
the sector numbers in the file system refer to sectors of the original CD rather than sectors of session2.iso. I don't know of a utility for rewriting them so that the file can be loop-mounted or written to an ordinary CD, but you can at least get a directory listing by using isoinfo with an offset:
isoinfo -i session2.iso -N 204345 -l
https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00048.html
Esp.:
Remember, the path table and directory structure of the iso reflect the fact that the ISO filesystem starts on sector 222145 (49:23:70) of the CD. If it is burned to another CD at a different position, it won't work. Likewise, any program that reads the iso will need to be able to compensate for the offset. Try, for example: isoinfo -N 222145 -d -i '8mm-songs_to_love_and_die_by.iso'
Also (from same thread):
https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00053.html
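The sector number and the time code in the quote above (sector 222145, 49:23:70) are two notations for the same position: a minutes:seconds:frames address with 75 frames per second, shifted by the 150-frame (2-second) pregap that precedes sector 0. A small sketch of the conversion, with function names chosen here for illustration:

```python
FRAMES_PER_SECOND = 75   # CD frames (sectors) per second
PREGAP_FRAMES = 150      # 2-second pregap before logical sector 0


def msf_to_lba(minutes, seconds, frames):
    """Convert a minutes:seconds:frames address to a logical sector number."""
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames - PREGAP_FRAMES


def lba_to_msf(lba):
    """Convert a logical sector number back to (minutes, seconds, frames)."""
    total = lba + PREGAP_FRAMES
    minutes, rest = divmod(total, 60 * FRAMES_PER_SECOND)
    seconds, frames = divmod(rest, FRAMES_PER_SECOND)
    return minutes, seconds, frames


if __name__ == "__main__":
    print(msf_to_lba(49, 23, 70))   # 222145, the value passed to isoinfo -N
    print(lba_to_msf(222145))       # (49, 23, 70)
```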
Default encoding for read/write depends on locale settings, which can result in unexpected behaviour. See e.g.:
Solution: always set the encoding explicitly when opening a file for read/write in text mode. Example:
# Byte sequence corresponds to multiplication sign in UTF-8
myBytes = b'\xc3\x97'
# Decode to string
myString = myBytes.decode('utf-8')
# Write myString to file
with open("myString.txt", "w", encoding="utf-8") as ms_file:
    ms_file.write(myString)
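The same applies when reading. The sketch below (helper name write_and_read is made up) writes a string with an explicit encoding, then returns both the raw bytes on disk and the text read back with that same encoding, which makes it easy to see how the chosen encoding changes what ends up in the file.

```python
import os
import tempfile


def write_and_read(text, encoding):
    """Write text with an explicit encoding; return (raw bytes, text read back)."""
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    with open(path, "w", encoding=encoding) as out_file:
        out_file.write(text)
    with open(path, "rb") as raw_file:
        raw = raw_file.read()
    with open(path, "r", encoding=encoding) as in_file:
        restored = in_file.read()
    return raw, restored


if __name__ == "__main__":
    multiplication_sign = b"\xc3\x97".decode("utf-8")
    # Same string, different bytes on disk depending on the encoding
    print(write_and_read(multiplication_sign, "utf-8")[0])
    print(write_and_read(multiplication_sign, "latin-1")[0])
```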
In this case, create a link to f:\Pandoc\pandoc.exe in directory c:\bin:
mklink pandoc.exe F:\Pandoc\pandoc.exe
Powershell method:
Get-ItemProperty HKLM:\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher, InstallDate | Format-Table -AutoSize > installedPrograms.txt
https://www.loc.gov/standards/premis/guidelines2017-premismets.pdf
java -jar tika-app-1.14.jar -t whatever.epub > whatever.txt
BUT doesn't return chapters in reading order!!
https://github.com/deanmalmgren/textract
Installs with errors under Windows; seems to work OK on Linux.
https://github.com/nscaife/file-windows
Saves output file as 24 bits / channel:
ffmpeg -i frogs-01.wav -codec pcm_s24le frogs-01-24-bit.wav
For a list of all codec values:
ffmpeg -codecs
http://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time
https://wiki.gentoo.org/wiki/FFmpeg_-_Extract_Blu-Ray_Audio
From:
https://support.microsoft.com/nl-nl/help/100027/info-direct-drive-access-under-win32
To open a physical hard drive for direct disk access (raw I/O) in a Win32-based application, use a device name of the form
\\.\PhysicalDriveN
where N is 0, 1, 2, and so forth, representing each of the physical drives in the system.
To open a logical drive, direct access is of the form
\\.\X:
where X: is a hard-drive partition letter, floppy disk drive, or CD-ROM drive.
E.g. compute checksum on CD in d: drive:
md5sum \\.\D:
Access to logical drives:
http://stackoverflow.com/q/6522644/1209004
Write access:
http://stackoverflow.com/q/7135398/1209004
Reading raw disks with Python:
http://blog.lifeeth.in/2011/03/reading-raw-disks-with-python.html
https://github.com/barneygale/isoparser
BUT this will make accessing the site CAPTCHA hell for Tor users: https://support.cloudflare.com/hc/en-us/articles/203306930-Does-CloudFlare-block-Tor-
Alternatives:
- CERTBot / Letsencrypt: requires server access
- Github pages has built-in https support, but only for github.io domains.
https://www.codementor.io/arpitbhayani/host-your-python-package-using-github-on-pypi-du107t7ku
Once everything is set up, for each new release the basic steps are:
- Update version number in main code
- Update link to download_url (in my case this is automated)
- Commit changes & push
- Add tag:
git tag -a x.y.z -m "whatever"
git push --tags
python setup.py register -r pypi
python setup.py sdist upload -r pypi
The md5sum of a "burnt" CD can be different than the md5sum of the associated iso file and not indicate an error
http://twiki.org/cgi-bin/view/Wikilearn/CdromMd5sumsAfterBurning
See also:
http://superuser.com/questions/220082/how-to-validate-a-dvd-against-an-iso
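One commonly given explanation, which is an assumption here and not something stated in these notes, is that reading the disc device returns extra trailing sectors after the end of the ISO data. Under that assumption, a fair comparison hashes only as many bytes from the disc as the ISO file is long. A minimal sketch (function name md5_of_prefix is hypothetical):

```python
import hashlib


def md5_of_prefix(path, length, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the first length bytes of path."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as source:
        while remaining > 0:
            chunk = source.read(min(chunk_size, remaining))
            if not chunk:
                break  # source is shorter than requested
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

On Linux this could be called as md5_of_prefix("/dev/sr0", os.path.getsize("image.iso")) and compared with the MD5 of image.iso itself; the device name and file name are placeholders.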
https://warekennis.nl/wp-content/uploads/2013/03/BOOKS-AND-LITERATURE-STATUS-REVIEW-2017-.pdf
ffprobe track01.cdda.wav -show_format -show_streams > properties.txt
Result (file properties.txt):
[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=2
channel_layout=unknown
bits_per_sample=16
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=8233176
duration=186.693333
bit_rate=1411200
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=track01.cdda.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=186.693333
size=32932748
bit_rate=1411201
probe_score=99
[/FORMAT]
XML output:
ffprobe track01.cdda.wav -show_format -show_streams -print_format xml > properties.xml
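If the plain key=value output in properties.txt has to be processed without an XML parser, it can be split into sections with a few lines of Python. This is only a sketch with a made-up function name; it assumes the [SECTION] ... [/SECTION] layout shown in the result above.

```python
def parse_ffprobe_text(text):
    """Parse [SECTION] key=value [/SECTION] blocks into (name, dict) tuples."""
    sections = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[/"):
            current = None                 # closing tag ends the section
        elif line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], {})     # opening tag starts a new section
            sections.append(current)
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[1][key] = value
    return sections


if __name__ == "__main__":
    with open("properties.txt", encoding="utf-8") as handle:
        for name, values in parse_ffprobe_text(handle.read()):
            print(name, values.get("duration"))
```

The values can be cross-checked against each other: for the stream above, duration_ts / sample_rate = 8233176 / 44100 ≈ 186.693333 seconds, which matches the reported duration.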
Script:
https://blog.heckel.xyz/wp-content/uploads/2012/12/fritzbox-dlna-refresh
https://github.com/amiaopensource/open-workflows
https://confluence.nypl.org/display/DIG/Specifications+for+Audio+and+Moving+Image+Digitization
Mediags is a console program that scans directories for media files and verifies the integrity of those files. Detailed content reports may optionally be produced.
(Binaries windows only)
https://medium.com/swlh/browsers-not-apps-are-the-future-of-mobile-c552752ff75#.ilc1zlj1a
Video conversations with up to 8 people for free. No login required — no installs
https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/A_Guide_To_WDQS
Extract references and metadata from PDF documents, and download all referenced PDFs:
https://www.metachris.com/pdfx/
http://stackoverflow.com/questions/13343096/explanation-of-need-for-multi-threading-gui-programming
https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats
http://stackoverflow.com/questions/1623039/python-debugging-tips
http://www.filfre.net/2016/09/a-slow-motion-revolution/
An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter
http://journal.code4lib.org/articles/11358
http://checkers.eiii.eu/en/pdfcheck/
https://docs.python.org/3.6/library/queue.html
https://docs.python.org/3.6/library/sched.html
And perhaps:
https://docs.python.org/3.6/library/threading.html#module-threading
Possibly usable in CD imaging workflow (esp. interaction with operator input).
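A rough idea of how queue and threading could fit together in such a workflow: a worker thread takes disc identifiers from a queue while the main thread stays free to deal with operator input. Everything below is hypothetical (names, the placeholder "imaging" step); it only shows the pattern, not an actual imaging workflow.

```python
import queue
import threading


def worker(jobs, results):
    """Process disc identifiers from the queue until a None sentinel arrives."""
    while True:
        disc_id = jobs.get()
        if disc_id is None:          # sentinel: no more work
            jobs.task_done()
            break
        results.append("imaged " + disc_id)   # placeholder for the real imaging step
        jobs.task_done()


def run_batch(disc_ids):
    """Queue all disc identifiers, wait for the worker, and return its results."""
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for disc_id in disc_ids:         # in practice these would come from operator input
        jobs.put(disc_id)
    jobs.put(None)
    jobs.join()                      # block until every queued item is processed
    thread.join()
    return results


if __name__ == "__main__":
    print(run_batch(["disc-001", "disc-002"]))
```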
This Windows Batchscript setups a MinGW/GCC compiler environment for building ffmpeg and other media tools under Windows. After building the environment it retrieves and compiles all tools. All tools get static compiled, no external .dlls needed (with some optional exceptions)
https://github.com/jb-alvarado/media-autobuild_suite
By default this doesn't build the ffmpeg optional libraries (incl. libcdio). In order to build them, if the batch file prompts you to Choose ffmpeg and mpv optional libraries?, select option 4 (All available external libs). Alternatively (if you accidentally ran the build with the default option), open file media-autobuild_suite.ini and set the value of ffmpegChoice to 4:
ffmpegChoice=4
http://lrn.no-ip.info/packages/i686-w64-mingw/libcdio/0.93-1/
http://www.student.tugraz.at/thomas.plank/
http://discid.sourceforge.net/
Tried flactag fork, which gives following output for CD-ROM:
Query failed: no actual audio tracks on disc: CDROM or DVD?
So this might be useful for distinguishing between audio CDs and CD-ROMs (tarball contains Windows binary).
http://disktype.sourceforge.net/
Output audio CD:
Block device, size 690.4 MiB (723972096 bytes)
CD-ROM, 14 tracks, CDDB disk ID D912690E
Track 1: Audio track, 37.35 MiB (39163152 bytes), 3 min 42 sec
Track 2: Audio track, 87.89 MiB (92163120 bytes), 8 min 42 sec
::
Track 13: Audio track, 37.22 MiB (39029088 bytes), 3 min 41 sec
Track 14: Audio track, 78.14 MiB (81931920 bytes), 7 min 44 sec
CD-ROM:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 0205F301
Track 1: Data track, 223.2 MiB (233994240 bytes)
ISO9660 file system
Volume name "0305132335"
Preparer "CEQUADRAT 32BIT ISO-9660 FORMATTER COPYRIGHT (C) 1995-1998 BY CEQUDRAT GMBH"
Data size 222.9 MiB (233682944 bytes, 114103 blocks of 2 KiB)
Joliet extension, volume name "0305132335"
Enhanced audio CD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 22 tracks, CDDB disk ID 4B113416
Track 1: Audio track, 9.627 MiB (10094784 bytes), 0 min 57 sec
Track 2: Audio track, 30.01 MiB (31462704 bytes), 2 min 58 sec
::
Track 20: Audio track, 41.33 MiB (43340304 bytes), 4 min 05 sec
Track 21: Audio track, 47.73 MiB (50048208 bytes), 4 min 43 sec
Track 22: Data track, 90.84 MiB (95252480 bytes)
DVD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 023BFD01
Track 1: Data track, 2.197 GiB (2358986752 bytes)
Apple partition map, 2 entries
Partition 1: 31.50 KiB (32256 bytes, 63 sectors from 1)
Type "Apple_partition_map"
Partition 2: 2.737 GiB (2938324992 bytes, 5738916 sectors from 1108)
Type "Apple_HFS"
HFS Plus file system
Volume size 2.737 GiB (2938324992 bytes, 1434729 blocks of 2 KiB)
Volume name "BelPop Marc Moulin"
UDF file system
Sector size 2048 bytes
Volume name "BelPop Marc Moulin"
UDF version 1.50
ISO9660 file system
Volume name "BELPOPMARCMOULIN"
Data size 2.737 GiB (2938894336 bytes, 1435007 blocks of 2 KiB)
Joliet extension, volume name "BelPop Marc Moul"
(Note that the DVD is identified as a CD-ROM; this doesn't really matter, as extraction from a DVD is identical to extraction from a data CD-ROM.)
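The track lines in the disktype output are regular enough to classify a carrier automatically. The sketch below (function name and category labels are made up) looks only at the "Audio track" / "Data track" lines; it cannot tell an enhanced audio CD from a mixed-mode disc, since session information is not in this output.

```python
import re

# Matches lines such as "Track 1: Audio track, ..." or "Track 22: Data track, ..."
TRACK_PATTERN = re.compile(r"^\s*Track\s+(\d+):\s+(Audio|Data) track", re.MULTILINE)


def classify_carrier(disktype_output):
    """Return 'audio', 'data', 'mixed' or 'unknown' based on the track types."""
    kinds = [kind for _, kind in TRACK_PATTERN.findall(disktype_output)]
    has_audio = "Audio" in kinds
    has_data = "Data" in kinds
    if has_audio and has_data:
        return "mixed"
    if has_audio:
        return "audio"
    if has_data:
        return "data"
    return "unknown"
```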
Compiles without problems under Windows (using Cygwin), but doesn't seem to be able to access cd-devices. E.g.:
disktype /dev/sr0
Result:
--- /dev/sr0
Block device, size 332.6 MiB (348790784 bytes)
disktype: Data read failed at position 0: Invalid request code
Or:
disktype D:\
Result:
--- D:\
disktype: D:\: Is a directory
Or:
disktype D
Result:
--- D
disktype: Can't stat D: No such file or directory
Perhaps try cdrdao scanbus?
http://www.robvanderwoude.com/wmic.php
Example - get information about optical drives:
wmic cdrom where mediatype!='unknown' get > test.txt
The GNU Compact Disc Input and Control library (libcdio) contains a library for CD-ROM and CD image access. Applications wishing to be oblivious of the OS- and device-dependent properties of a CD-ROM or of the specific details of various CD-image formats may benefit from using this library.
http://www.gnu.org/software/libcdio/
Python interface:
https://pypi.python.org/pypi/pycdio/
Brown, "Developing Virtual CD-ROM Collections" (2012):
http://www.ijdc.net/index.php/ijdc/article/view/216/285
Page 13:
- Create BIN/TOC file with cdrdao using:
cdrdao read-cd --read-raw --device 1,0,0 --datafile allmy.bin allmy.toc
- Author developed a SheepShaver extension that allows these images to be read by the emulator
Caveats:
- The given cdrdao command only extracts one session (I guess the Voyager CD-ROMs only contain one session with both the data and audio tracks, although the paper isn't entirely clear about this).
- In case of a CD with multiple sessions one would have to repeat the command for each of those (result: one separate image for each session)
- Hybrid CD-ROMs are not supported by any of the most widely-used emulators (also stressed by author)
Jackson (BL):
On multisession carriers:
While CD-ROM, DVD and HFS+ format disks are reasonably well covered by this approach, there are some important limitations. For example, the optical media formats all support the notion of ‘sessions’ – consecutive additions of tracks to a disk. This means that a given carrier may contain a ‘history’ of different versions of the data. By choosing to extract a single disk image, we only expose the final version of the data track, and any earlier versions, sessions or tracks are ignored. For our purposes, these sessions are not significant, but this may not be true elsewhere.
BUT sessions (at least on commercially manufactured carriers) typically don't contain different versions of the same data, but data that are completely different! Example: many 'enhanced' audio CDs that contain one session with all audio tracks, and another session with a data track. So sessions are significant!
BL workflow for Red Book (audio) and Yellow Book (mixed mode) carriers:
- Image to MDS/MDF format
- Then post-process MDS/MDF file with IsoBuster
But it's not entirely clear whether MDS/MDF can handle multisession carriers.
I found this in the Knowledge Base of the developer of the format:
Image making wizard will always allow the user to create mds/mdf ccd/img/sub.
But ISO format, only for those disc's that contain 1 data track(mode1 or mode2form1).
For cue/bin only for one session disc. if the original disc is a multi-session one, then the cue/bin would not be available and If the user chooses read sub-channel, the cue/bin and iso would be unavailable as well . because iso and cue/bin could not save sub channel data.
So apparently MDS/MDF does support multisession after all!
Good overview of disc image formats here:
http://www.theisozone.com/blogs/homebrew/burning-image-file-type-explained/
Includes links to ROM and startup images:
http://www.redundantrobot.com/#/sheepshaver
Report by Cornell University:
https://ecommons.cornell.edu/handle/1813/41368
Some useful info on Mac / PC images and hybrids:
http://www.macdisk.com/faqcden.php
Contains lots of info on optical carrier and disc image formats (e.g. BIN/CUE):
http://web.archive.org/web/20070221154246/http://www.goldenhawk.com/download/cdrwin.pdf
http://stackoverflow.com/questions/10123929/python-requests-fetch-a-file-from-a-local-url
https://blog.codinghorror.com/computer-display-calibration-101/
https://blog.codinghorror.com/bias-lighting/
Find all files with .pdf extension:
find . -type f -name '*.pdf'
Count all files with .pdf extension:
find . -type f -name '*.pdf'| wc -l
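For use inside a Python script, roughly the same search can be done with pathlib. A sketch with a made-up function name; note that, unlike find on Linux, the matching may be case-insensitive on some platforms.

```python
from pathlib import Path


def find_by_extension(root, extension=".pdf"):
    """Return all regular files below root whose name ends with extension."""
    return sorted(p for p in Path(root).rglob("*" + extension) if p.is_file())


if __name__ == "__main__":
    pdfs = find_by_extension(".")
    for pdf in pdfs:
        print(pdf)
    print(len(pdfs), "files found")
```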
Esp. 'useful links' section:
https://github.com/garbear/pyrominfo
Representation of 1 pixel in many different formats:
http://cloudinary.com/blog/one_pixel_is_worth_three_thousand_words
Online tutorials on APIs, Data Management, Data Manipulation, Distant Reading, Linked Open Data, Mapping and GIS, Network Analysis, Omeka Exhibit Building, Web Scraping and Programming with Python:
http://programminghistorian.org/lessons/
Supports lots of (old) Office-related formats + includes many conversion tools:
https://launchpad.net/ubuntu/+source/writerperfect/0.9.5-1
https://github.com/osnr/horrifying-pdf-experiments
https://en.wikibooks.org/wiki/A_Beginner%27s_Python_Tutorial/Classes
https://forums.linuxmint.com/viewtopic.php?t=177915
(Source: Nick Krabbenhöft on Twitter)
http://www.loc.gov/standards/mets/profiles/00000007.html
http://dx.doi.org/10.2218/ijdc.v4i2.107
http://www.digpres.com/publications/woodsbrownarch09.pdf
Example METS file (note that apparently they combine multiple ISOs in one AIP):
http://webapp1.dlib.indiana.edu/virtual_disk_library/index.cgi/4252478/mets
http://www.bl.uk/profiles/sound/METS_profile.pdf
https://www.blackmoreops.com/2015/06/18/linux-file-system-hierarchy-v2-0/
- Delving SIP-Creator
- Fedora SIP Creator
- UGent Sip Creator
- SIP-Builder
- RODA-In
- Dvcapture
- DURAARK SIP generator
xmllint --noout -schema schema.xsd whatever.xml
find -type f -exec md5sum "{}" + > checksums.md5
Source: http://askubuntu.com/a/318534. Works also under Cygwin.
Issue: output also includes the MD5 sum of the output file itself (which becomes invalid once anything is written to the file).
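One way around the issue of the output file ending up in its own listing is to skip it explicitly. The Python sketch below does this; the function names are hypothetical, and the output format (digest, two spaces, ./relative/path) is meant to mimic what the find/md5sum command produces.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as source:
        for chunk in iter(lambda: source.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root, out_name="checksums.md5"):
    """Write MD5 sums of all files below root, skipping the output file itself."""
    out_path = os.path.abspath(os.path.join(root, out_name))
    lines = []
    for folder, _, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            if os.path.abspath(path) == out_path:
                continue  # never checksum the checksum file
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            lines.append(md5_of_file(path) + "  ./" + relative)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("".join(line + "\n" for line in lines))
    return lines
```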
- Convert master JP2 to TIFF using Kakadu (this preserves any embedded ICC profiles):
kdu_expand -i master.jp2 -o master.tiff
- Convert TIFF to lossy JP2 with Aware via jpwrappa:
jpwrappa -m -p C:\jpwrappa\profiles\optionsKBAccessLossy_2014.xml master.tiff access.jp2
(The -m switch can be omitted, in which case there is no need for Exiftool.)
- Acronova Nimbie USB Plus range
- Nimbie NB21-DVD
- Nimbie USB range (NB11 - not available (19/5))
- Guidelines for Digital Newspaper Preservation
- Chronicles in Preservation: Preserving Digital News and Newspapers
- Digital Preservation of Newspapers: Findings of the Chronicles in Preservation Project
- E-paper Production Workflow – Adapting Production Workflow Processes for Digital Newsprint
- PRESERVING NEWS IN THE DIGITAL ENVIRONMENT: MAPPING THE NEWSPAPER INDUSTRY IN TRANSITION
Use the --reference-docx switch:
pandoc -S --reference-docx=template.docx test.md -o test.docx
Rollback to previous state:
git reset --hard <tag/branch/commit id>
Commit changes:
git push ... -f
Example:
git reset --hard 2dbe067c1674dcf9a23104c4b64b772e1550ba29
git push origin master -f
http://162.242.228.174/mimes/mime_comparisons.html
An open repository of web crawl data that can be accessed and analyzed by anyone
A Python port of the Apache Tika library that makes Tika available using the Tika REST Server.
https://github.com/chrismattmann/tika-python
https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
http://programminghistorian.org/lessons/intro-to-bash
https://github.com/titusz/epubcheck
http://www.tegelspreukmaker.nl/
Looks a bit similar to Prezi, but OS (presentation as SVG):
Press F9, F10, F11 or F12 twice. "Auto-rotate screen" option in Android Settings must be enabled.
Following codeblock is not rendered correctly in Wordpress:
<pre><code><div>test</div></code></pre>
Workaround is to replace forward slash in closing tag by entity reference:
<pre><code><div>test</div></code></pre>
https://github.com/ANSSI-FR/caradoc
Note: the current Debian package of Opam is not recent enough, so I used the instructions under "Binary distribution" at https://opam.ocaml.org/doc/Install.html. This installs the binary in /usr/local/bin.
The Makefile initially didn't work because ocamlfind could not be found. Fixed by typing:
eval $(opam config env)
After this it compiles without any errors.
Includes MiniDisc:
Python library that reads/writes EPUB, including EPUB 3:
https://github.com/aerkalov/ebooklib
Example, create EPUB from HTML:
https://gist.github.com/bitsgalore/4c830a301f33f584c041
http://www.cb.nl/nieuws/alle-relevante-data-over-e-books-in-nederland/
http://www.cb.nl/nieuws/e-bookbarometeblijft-groeien/
http://fileformats.archiveteam.org/wiki/Encyclopedia_of_Graphics_File_Formats
http://homepages.cwi.nl/~steven/Talks/2015/11-06-xml-amsterdam/
This works (but what's referred to as a "schema" isn't really a schema at all):
https://blog.udemy.com/excel-to-xml/
Similar to above, but uses XSD Schema directly, might be better:
https://bitwizards.com/blog/november-2010/how-to-export-an-excel-2010-worksheet-to-xml
Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot
Web archive player:
https://github.com/ikreymer/webarchiveplayer
E.g. replace every occurrence of /tmp/"$fileIn" with /tmp/"$(cat /dev/urandom | tr -cd 'a-f0-9' | head -c 16)":
find /home/johan/cajascripts -type f -print0 | xargs -0 sed -i 's/\/tmp\/"$fileIn"/\/tmp\/"$(cat \/dev\/urandom | tr -cd 'a-f0-9' | head -c 16)"/g'
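For scripts written in Python, an equivalent of the random 16-character hexadecimal name can be produced with the standard secrets module. A small sketch (function name random_tmp_path is made up):

```python
import secrets


def random_tmp_path(prefix="/tmp/", hex_chars=16):
    """Return prefix followed by hex_chars random lowercase hex characters."""
    # token_hex(n) returns 2 * n hexadecimal characters
    return prefix + secrets.token_hex(hex_chars // 2)


if __name__ == "__main__":
    print(random_tmp_path())
```

Where the file has to be created as well, tempfile.mkstemp from the standard library picks a unique name and creates the file in one step.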
- Don't save offsite links
- Use 'blogs' ignore pattern
Command (I think?):
!archive http://www.flipvandyke.nl/ --no-offsite-links --ignore-sets=blogs
https://help.ubuntu.com/community/DataRecovery
If N = number of layers, then first extract layers i and below to a separate JP2 with Aware j2kdriver tool:
j2kdriver -i foo.jp2 -ql (N-i+1) -t JP2 -o foo_i.jp2
Then use jpylyzer to compute the compression ratio of resulting image.
Create derived image for each quality layer:
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 11 -t JP2 -o layer1.jp2
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 10 -t JP2 -o layer2.jp2
::
::
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 1 -t JP2 -o layer11.jp2
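The series of commands above follows directly from the -ql (N-i+1) rule, so it can be generated instead of typed out. The sketch below only builds the argument lists (it uses no j2kdriver options other than the ones shown above) and leaves running them to the caller; the function name is made up.

```python
def layer_commands(source, layer_count):
    """Build one j2kdriver argument list per quality layer, using -ql N-i+1."""
    commands = []
    for i in range(1, layer_count + 1):
        quality_layers = layer_count - i + 1
        commands.append(
            ["j2kdriver", "-i", source, "-ql", str(quality_layers),
             "-t", "JP2", "-o", "layer{}.jp2".format(i)]
        )
    return commands


if __name__ == "__main__":
    for command in layer_commands("MMAD01_000001001_00011_master.jp2", 11):
        print(" ".join(command))
```

Each list can be passed to subprocess.run on a machine where j2kdriver is installed.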
touch -d "1 January 1768" myfile.txt
This happened to my HP ProBook 640 G1. Workaround: in BIOS, disable "wake on LAN". Source: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1470723/comments/13
http://wiki.hydrogenaud.io/index.php?title=Comparison_of_CD_rippers
http://publications.beeldengeluid.nl/pub/84
http://blog.differential.com/best-way-to-merge-a-github-pull-request/
Third option (Catch Feature Up with Master by Rebasing, then fast-forward Merge).
http://liv.science.uva.nl/index.html
Parts of this may be (re)usable for internal courses etc.
Ubuntu with Nautilus file manager - Nautilus Actions:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Cinnamon with Nemo file manager:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Mate with Caja file manager:
http://www.ethanjoachimeldridge.info/tech-blog/caja-exifstrip-context-action
From http://stackoverflow.com/a/11202773:
Suppose I want to create a floppy image containing file oakcdrom.sys:
dd bs=512 count=2880 if=/dev/zero of=oakcd.img
mkfs.msdos oakcd.img
mcopy -i oakcd.img oakcdrom.sys ::/
Inspect contents:
mdir -i oakcd.img
General command:
ddrescue -d -n -b 512 /dev/fd0 myfloppy.img myfloppy.log
To get name of device:
lsblk
Result:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465,8G 0 disk
├─sda1 8:1 0 457,9G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 7,9G 0 part [SWAP]
sdb 8:16 0 29,8G 0 disk
sdc 8:32 1 1,4M 1 disk
So in this case it is /dev/sdc. Create the image with:
sudo ddrescue -d -n -b 512 /dev/sdc myfloppy.img myfloppy.log
Optionally use the dosfsck tool to check the integrity of the file system (assuming it is a DOS file system). Use the following command:
echo "n" |dosfsck -t -r myfloppy.img
The -t option checks for bad clusters, but this only works in combination with -a (automatically repair) or -r (interactively repair). So to do the check without automatic repair or input from the user, we use -r and pipe "n" into the command to prevent any changes being made. Result:
fsck.fat 3.0.26 (2014-03-07)
Cluster 2845 is unreadable.
Cluster 2846 is unreadable.
Cluster 2847 is unreadable.
Cluster 2848 is unreadable.
Perform changes ? (y/n) myfloppy.img: 33 files, 2304/2847 clusters
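When many floppy images are checked this way, the bad clusters can be pulled out of the dosfsck output automatically. A minimal sketch, assuming the "Cluster N is unreadable." wording shown in the result above (the function name is made up):

```python
import re

UNREADABLE = re.compile(r"^Cluster (\d+) is unreadable\.", re.MULTILINE)


def unreadable_clusters(dosfsck_output):
    """Return the cluster numbers that dosfsck reported as unreadable."""
    return [int(number) for number in UNREADABLE.findall(dosfsck_output)]
```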
Check integrity of git repo:
http://stackoverflow.com/questions/5585388/which-git-commands-perform-integrity-checks
(Bottom line: use git fsck.)
How to shrink the git folder:
http://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder
Exit GUI:
Ctrl-Alt-F1
Re-enter:
Ctrl-Alt-F8
From https://bugs.launchpad.net/ubuntu/+source/retext/+bug/1451125:
sudo apt-get install python3-docutils python3-markdown
From the manual:
- Turn on or restart the computer, and then press esc while the “Press the ESC key for Startup Menu” message is displayed at the bottom of the screen
- Press f10 to enter Computer Setup.
sudo badblocks -sv /dev/sda1
See also: http://askubuntu.com/questions/59064/how-to-run-a-checkdisk
/usr/share/virtualbox
Run this on host machine:
sudo ntpdate ntp.xs4all.nl
Then re-start VM; host and guest are now in sync and no more clock skew errors.
pandoc -S whatever.md -o whatever.html
http://broadcast.oreilly.com/2008/11/validating-code-lists-with-sch.html
Useful Unicode and UTF-8 background info:
http://codesnippets.wpakb.kb.nl/index.php?title=Character_sets
Sigil:
https://github.com/user-none/Sigil
Simple, user-friendly.
ddrescue:
http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html
Command line (Cygwin):
ddrescue -b 2048 -v /dev/scd0 test.iso test.log
disktype tool:
http://disktype.sourceforge.net/
E.g. reveals file system type (ISO/UDF) and other technical info.
General instructions here:
http://www.msfn.org/board/topic/170785-virtualbox-windows-98se-step-by-step/
But results in error:
HID failed to attach mouse driver (VERR_PDM_NO_ATTACHED_DRIVER)
Tried this:
https://forums.virtualbox.org/viewtopic.php?f=2&t=58657#p272752
VBoxInternal/USB/HidMouse/1/Config/CoordShift 0
Still doesn't work; neither does:
VBoxInternal/USB/HidMouse/1/Config/CoordShift 1
But see:
https://www.virtualbox.org/manual/ch12.html#idp60139152
Windows 2000 installation failures:
https://www.virtualbox.org/manual/ch12.html#idp60119680
Works!
Then go install guest additions:
https://docs.oracle.com/cd/E36500_01/E36502/html/qs-guest-additions.html
"AsciiMath is an easy-to-write markup language for mathematics":
git add -A
git commit -m "Changed everything"
git push origin master
git push git@github.com:openplanets/jpylyzer-test-files.git master
Versioning: x.y.z
x: API breakage y: new feature z: bugfix
git tag -a 1.1.0 -m "tagging version 1.1.0 with refactored code"
git push --tags
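The x.y.z rule above can be written down as a tiny helper. This is only a sketch: the function name and the change labels are made up, and resetting the lower components to zero on a bump is an assumption (common practice, but not stated in these notes):

```python
def bump_version(version, change):
    """Return the next x.y.z version for a change of the given kind.

    change: "api" (API breakage), "feature" (new feature) or "bugfix".
    Assumption: lower components are reset to 0 when a higher one is bumped.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "api":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "bugfix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump_version("1.0.3", "feature"))  # 1.1.0
```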
1. Convert all master JP2s to TIFF with ImageMagick, using the command:
mogrify -format tiff *.jp2
2. Conversion loses resolution info (see below), so add new values using:
exiftool *.tiff -xresolution=300 -yresolution=300 -resolutionunit=inches
3. Convert TIFFs to master JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\master\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBMasterLossless_2014.xml -m
4. Same for access JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\access\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBAccessLossy_2014.xml -m
But ... looking at image header box:
<imageHeaderBox> <height>2818</height> <width>1913</width> <nC>1</nC> <bPCSign>unsigned</bPCSign> <bPCDepth>8</bPCDepth> <c>jpeg2000</c> <unkC>yes</unkC> <iPR>no</iPR> </imageHeaderBox>
So "unknown colourspace" is set to "yes", which should be no (and it is "No" in the source JP2). So what is causing this? Bug in Aware software? Does this only happen with Grayscale images?
To reproduce the problem:
- Convert any JP2 to TIFF with ImageMagick (will strip away any resolution info)
- Convert TIFF to JP2 with Aware.
Run jpylyzer on resulting JP2:
<isValidJP2>False</isValidJP2> <tests> <jp2HeaderBox> <resolutionBox> <captureResolutionBox> <hRcNIsValid>False</hRcNIsValid> </captureResolutionBox> </resolutionBox> </jp2HeaderBox> </tests>
Looking at properties of resolution box:
<resolutionBox> <captureResolutionBox> <vRcN>29491</vRcN> <vRcD>7491</vRcD> <hRcN>0</hRcN> <hRcD>1</hRcD> <vRcE>1</vRcE> <hRcE>4</hRcE> <vRescInPixelsPerMeter>39.37</vRescInPixelsPerMeter> <hRescInPixelsPerMeter>0.0</hRescInPixelsPerMeter> <vRescInPixelsPerInch>1.0</vRescInPixelsPerInch> <hRescInPixelsPerInch>0.0</hRescInPixelsPerInch> </captureResolutionBox> </resolutionBox>
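The derived values in this box follow from the JP2 resolution formula: resolution in pixels per metre = numerator / denominator * 10^exponent. A short sketch that redoes the arithmetic for the numbers above (function names are made up; this is not jpylyzer's own code). The zero horizontal numerator yields a resolution of 0, which is presumably why hRcNIsValid fails:

```python
METRES_PER_INCH = 0.0254


def resolution_ppm(numerator, denominator, exponent):
    """Resolution in pixels per metre from a JP2 numerator/denominator/exponent triple."""
    return numerator / denominator * 10 ** exponent


def ppm_to_ppi(pixels_per_metre):
    """Convert pixels per metre to pixels per inch."""
    return pixels_per_metre * METRES_PER_INCH


if __name__ == "__main__":
    # Values taken from the captureResolutionBox above.
    vertical = resolution_ppm(29491, 7491, 1)   # vRcN, vRcD, vRcE
    horizontal = resolution_ppm(0, 1, 4)        # hRcN, hRcD, hRcE
    print(round(vertical, 2), round(ppm_to_ppi(vertical), 2))
    print(horizontal, ppm_to_ppi(horizontal))
```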
Here for UTF-8:
http://stackoverflow.com/a/9822937
git clone https://github.com/openpreserve/jpylyzer.git --branch gh-pages --single-branch ./jpylyzerHomepage
File:
E:\\laPeyneCDROM\\xlsfiles\\series98.xls
Refs to MACROS.XLS'!ENash, which is missing.
Solution: before opening, disable automatic workbook calculation from options:
Loading spreadsheet now results in most recent values that are stored in workbook.
thermo filetype:tdb
Only gives results with extension tdb.
- An Introduction to Optical Media Preservation: http://journal.code4lib.org/articles/9581
- What are the best CD/DVD-ROM drives for disc imaging? http://qanda.digipres.org/10/what-are-the-best-cd-dvd-rom-drives-for-disc-imaging?show=10#q10
- CD/DVD Drive Accuracy List 2014: http://forum.dbpoweramp.com/showthread.php?34019-CD-DVD-Drive-Accuracy-List-2014
- Preserving Write-Once DVDs: Producing Disk Images, Extracting Content, and Addressing Flaws and Errors (LoC): http://preservationmatters.blogspot.nl/2015/01/preserving-write-once-dvds.html
- Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media (Andy Jackson): http://anjackson.net/keeping-codes/practice/developing-a-robust-migration-workflow-for-preserving-and-curating-handheld-media.html
https://spotdocs.scholarsportal.info/display/EJournals/Publisher+Data+Formats
Both errors and warnings are reported in the same _message_ element in the XML. E.g. compare:
<status>Not well-formed</status>
<messages>
<message>ERROR: /OEBPS/cover.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
<message>ERROR: /OEBPS/copyright.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
</messages>
with this:
<status>Well-formed</status>
<messages>
<message>WARN: /OEBPS/toc.ncx: meta@dtb:uid content 'null' should conform to unique-identifier in content.opf: '821'</message>
</messages>
So output needs some parsing. Tested w. epubcheck 3.0.1.
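One way to do that parsing is to split the messages on their ERROR: / WARN: prefix. A minimal sketch, assuming the prefixes always look like the ones above; the wrapping <result> element and the function name are made up here, because the fragment shown has no single root element:

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<status>Well-formed</status>
<messages>
<message>WARN: /OEBPS/toc.ncx: meta@dtb:uid content 'null' should conform to unique-identifier in content.opf: '821'</message>
</messages>
"""


def split_messages(xml_fragment):
    """Group message texts by their leading ERROR: / WARN: prefix."""
    # Hypothetical wrapper element, so the fragment parses as one document.
    root = ET.fromstring("<result>" + xml_fragment + "</result>")
    grouped = {"ERROR": [], "WARN": [], "OTHER": []}
    for message in root.iter("message"):
        text = (message.text or "").strip()
        prefix, separator, rest = text.partition(": ")
        if separator and prefix in ("ERROR", "WARN"):
            grouped[prefix].append(rest)
        else:
            grouped["OTHER"].append(text)
    return grouped


if __name__ == "__main__":
    result = split_messages(SAMPLE)
    print(len(result["ERROR"]), len(result["WARN"]))  # 0 1
```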
- E drive: Hitachi (large drive)
- H drive: Buffalo (small drive)
H is used as the backup disk for E.
17/18 November: poster cancelled, but a 90 s talk + 1 slide instead.
BnF:
http://www.bnf.fr/documents/ref_num_fichier_image.pdf
Readers absorb less on Kindles than on paper, study finds:
Reading and learning from screens versus print: a study in changing habits: Part 1 – reading long information rich texts:
http://www.emeraldinsight.com/doi/full/10.1108/NLW-01-2013-0012
http://www.scientificamerican.com/article/reading-paper-screens/
https://help.github.com/articles/syncing-a-fork
Requires:
https://help.github.com/articles/configuring-a-remote-for-a-fork
- PEP8
- pyflakes
- pdb: http://stackoverflow.com/a/1623085/1209004
GraphicsMagick command line:
gm convert -compress jpeg -quality 50 *.TIF test.pdf
Result: PDF with all images as JPEG, quality 50. According to Acrobat / Apache Preflight the PDF has some format conformance issues. One possible remedy is to re-process the PDF using Ghostscript. E.g. the command below produces a PDF that conforms to PDF/A-1b:
gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=test_a.pdf test.pdf
Source: http://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x
Link: http://journal.code4lib.org/articles/9158
Tutorial:
http://fotoforensics.com/tutorial-estq.php
But ... this is also possible with ImageMagick / GraphicsMagick (according to Approximate Quantization Table method that is mentioned in the tutorial):
http://superuser.com/questions/62730/how-to-find-the-jpg-quality