philcryer / oicr
Created October 6, 2010 02:30
Downloads a DjVu file, splits it into pages (djvmcvt), extracts the OCR'd text (djvused), and also creates an XML file of the OCR'd text (djvutoxml).
#!/bin/bash
# Require a book identifier; print usage to stderr and exit non-zero if it is missing.
[[ -n "${1}" ]] || { echo "Usage: oicr.sh IA_BOOK_TITLE" >&2; exit 1; }
# sample record ids: catalogueoflepid02briti electronicnaviga00unit halfhoursinfarno00newy nachrichtsblattd3234190012deut
BOOK=${1}
#BASEURL=http://cluster.biodiversitylibrary.org
BASEURL=http://www.archive.org/download