Johan van der Knijff bitsgalore
<?xml version="1.0"?>
<!--
Schematron jpylyzer schema: verify if JP2 conforms to
KB's profile for access copies (A.K.A. KB_ACCESS_LOSSY_01/07/2014)
Simplified version for Geheugen van Nederland migration; omits specific requirements for
resolution, colour space, compression ratio, XML box and codestream comment.
Johan van der Knijff, KB / National Library of the Netherlands, 18 March 2014.
-->
<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron">
@bitsgalore
bitsgalore / compratio.sh
Created April 8, 2015 16:18
Compute compression ratio for all JP2s in directory tree
#!/bin/bash
# Compute compression ratio of each JP2 in directory tree, report results to CSV file
# Requires:
# - jpylyzer
# - xmllint (part of libxml library)
#
# If you're using Windows you can run this shell script within a Cygwin terminal: http://www.cygwin.com/
#
# Installation directory
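
The preview above stops before the script body. As a rough illustration of the approach the header describes (jpylyzer for the analysis, xmllint to pull one value out of its XML output), a minimal sketch could look like the following. The function names `compression_ratio` and `report_ratios` and the CSV column names are hypothetical and are not taken from compratio.sh. The sketch assumes jpylyzer writes its XML report to standard output and that the report contains a `compressionRatio` element.

```shell
#!/bin/bash
# Sketch only: function names and CSV layout are assumptions, not the
# contents of compratio.sh (whose body is not shown in the preview).

# Print the compressionRatio value that jpylyzer reports for one JP2.
# local-name() lets the XPath match with or without an XML namespace.
compression_ratio() {
    jpylyzer "$1" < /dev/null \
        | xmllint --xpath "string(//*[local-name()='compressionRatio'])" -
}

# Walk a directory tree and write one CSV row per JP2: file name, ratio.
# File names containing newlines are not handled by this simple loop.
report_ratios() {
    local root="$1" out="$2"
    printf 'fileName,compressionRatio\n' > "$out"
    find "$root" -type f -iname '*.jp2' | while IFS= read -r jp2; do
        printf '"%s",%s\n' "$jp2" "$(compression_ratio "$jp2")"
    done >> "$out"
}
```

Calling `report_ratios /path/to/images ratios.csv` would then produce a two-column CSV. Matching on `local-name()` is a deliberate choice so the same XPath works whether or not the jpylyzer output declares a namespace.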
bitsgalore / extractlayers.sh
Created April 9, 2015 11:25
For each JP2 image in a directory, generate a derived image that discards a user-defined number of quality layers. Requires the Aware j2kdriver tool.
#!/bin/bash
# Generate derived JP2s that discard user-specified number of quality layers from source JP2s
# Requires:
# - j2kdriver (Aware)
#
# If you're using Windows you can run this shell script within a Cygwin terminal: http://www.cygwin.com/
#
# Installation directory
instDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bitsgalore / extensionsKBDM.md
Last active August 29, 2015 14:19
50 most prevalent formats in KB e-Depot by file extension, based on March 2014 count. Use scrollbar at bottom to display remarks column to the right.
| Extension | Number of files in e-Depot | ID(s) | Tika | Remarks |
| --- | --- | --- | --- | --- |
| gif | 34499095 | - | image/gif | GIF image |
| xml | 12913388 | - | application/xml | XML (mostly metadata) |
| jpg | 8197415 | N/A* | | JPEG image |
| sml | 7744829 | - | image/gif | GIF image with unusual extension |
| pdf | 7577414 | - | application/pdf | PDF |
| raw | 2045662 | - | text/plain | Text file |
| tif | 715509 | - | image/tiff | TIFF image |
| oa3 | 296101 | - | text/plain | Looks like SGML (oases, Kluwer). See also: Publisher Data Formats. Metadata. |
bitsgalore / zipfilesKBDM.md
Last active August 29, 2015 14:19
File formats inside ZIP files (based on 22 ZIP files only!)
bitsgalore / softwareReadingRooms.md
Created April 23, 2015 16:43
Rendering of top 50 formats in KB reading rooms
| Category | Rendering software in reading rooms | Formats accessible in reading rooms? |
| --- | --- | --- |
| Image formats | MS Paint, Windows Photo Viewer | Yes |
| PDF | Adobe Acrobat | Yes |
| Web formats | Internet Explorer, Google Chrome | Yes |
| Office formats | Microsoft Office | Yes (support for old Office formats presently not clear) |
| Audio | Windows Media Player, VLC Media Player | No (hardware in reading rooms doesn't support audio) |
| Video | Windows Media Player, VLC Media Player | Partially (hardware in reading rooms doesn't support audio) |
| Metadata | Internet Explorer, Notepad, WordPad | Yes |
| Executables, installers, system files | Not applicable | No |
bitsgalore / formatCategories.md
Last active August 29, 2015 14:20
File format categories
| Category | Example formats | Needs dedicated software for rendering? |
| --- | --- | --- |
| Image formats | JPEG, TIFF, BMP | Yes |
| PDF | PDF | Yes |
| Web formats | HTML, SWF | Yes |
| Office formats | Word, Excel, PowerPoint, RTF | Yes |
| Audio | WAV, AIFF, MP3 | Yes |
| Video | MP4, AVI | Yes |
| Metadata | XML, SGML | No (perhaps XML editor/viewer) |
| Executables, installers, system files | EXE, CAB, INF | No (unless in shielded virtual environment) |
bitsgalore / softwareReadingRooms.md
Last active August 29, 2015 14:20
Software in KB reading rooms for each format category
| Format category | Rendering software |
| --- | --- |
| Image formats | MS Paint, Windows Photo Viewer |
| PDF | Adobe Acrobat |
| Web formats | Internet Explorer, Google Chrome |
| Office formats | Microsoft Office |
| Audio | Windows Media Player, VLC Media Player |
| Video | Windows Media Player, VLC Media Player |
| Metadata | Internet Explorer, Notepad, WordPad |
| Executables, installers, system files | Not applicable |
bitsgalore / eDepotFExtentions_v3.md
Created April 29, 2015 16:16
File extensions in KB e-Depot, status March 2014
| Number of files | File extension |
| --- | --- |
| 34499095 | .gif |
| 12913388 | .xml |
| 8197415 | .jpg |
| 7744829 | .sml |
| 7577414 | .pdf |
| 2045662 | .raw |
| 715509 | .tif |
| 296101 | .oa3 |
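
The listing does not say how these counts were produced for the e-Depot. For a plain directory tree, a tally in the same two-column layout can be approximated with standard Unix tools. The sketch below is an assumption-laden illustration, not the method used for the table: the function name `count_extensions` is hypothetical, extensions are lower-cased before counting, files without a dot in their name are skipped, and file names containing newlines or extensions containing whitespace are not handled.

```shell
#!/bin/bash
# Sketch only: count files per (lower-cased) extension under a directory,
# printing "count,.ext" rows with the most frequent extension first.
count_extensions() {
    local root="${1:-.}"
    find "$root" -type f -name '*.*' \
        | sed -e 's/.*\.//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{ printf "%s,.%s\n", $1, $2 }'
}
```

Running `count_extensions /path/to/tree` prints one row per extension. The `-name '*.*'` test is applied to the file name only, so dots in directory names do not create spurious extensions.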