@markmatney
Last active May 31, 2018 23:51
Implementation of the data processing pipeline for my IIIF Washington 2018 presentation.

An Example Data Processing Pipeline for Generating IIIF AnnotationLists and Ranges Using Structured OCR Data and Pattern Matching

UNDER CONSTRUCTION

For background information, see http://iiif.io/event/2018/washington/program/paper-55/ and https://tinyurl.com/2018-iiif-washington-55.

Generating structured OCR data from images with Tesseract
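As a minimal sketch of this step (not necessarily the pipeline used in the presentation), Tesseract can write hOCR, an HTML-based format that records a bounding box for every recognized word. The helper names `tesseract_hocr_command` and `parse_hocr_words`, the file names, and the sample markup below are illustrative assumptions; the sketch relies only on the standard `tesseract <image> <outputbase> hocr` invocation and on the `ocrx_word` class and `bbox` title property that hOCR defines.

```python
import re
from html.parser import HTMLParser


def tesseract_hocr_command(image_path, output_base):
    """Build the CLI call that writes <output_base>.hocr for one image.

    Pass the result to subprocess.run(...) on a machine with Tesseract installed.
    """
    return ["tesseract", image_path, output_base, "hocr"]


class HocrWordParser(HTMLParser):
    """Collect the text and bounding box of every ocrx_word element."""

    BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = self.BBOX.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append({"text": text, "bbox": self._bbox})
            self._bbox = None


def parse_hocr_words(hocr_text):
    """Return [{'text': str, 'bbox': (x0, y0, x1, y1)}, ...] in reading order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written hOCR fragment for illustration only (not real Tesseract output).
SAMPLE_HOCR = """
<div class='ocr_page' title='image "page1.png"; bbox 0 0 1000 1500'>
 <span class='ocr_line' title='bbox 100 200 520 240'>
  <span class='ocrx_word' title='bbox 100 200 260 240; x_wconf 96'>CHAPTER</span>
  <span class='ocrx_word' title='bbox 280 200 320 240; x_wconf 93'>I</span>
 </span>
</div>
"""

words = parse_hocr_words(SAMPLE_HOCR)
```

Each entry in `words` keeps the pixel coordinates of one word, which is the information the later steps need to point annotations at regions of a page image.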

Extracting images and structured OCR data from a multi-page PDF

Generating sparse IIIF AnnotationLists from structured OCR data with pattern matching
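One way this step could look, as a sketch under stated assumptions: given words with bounding boxes, keep only those whose text matches a regular expression and emit one annotation per match, so the list is "sparse" rather than covering every word on the page. The function name `sparse_annotation_list`, the `example.org` URIs, the word dictionaries, and the choice of `sc:painting` as motivation are all illustrative assumptions. The sketch also assumes that image pixel coordinates coincide with canvas coordinates. The JSON keys follow the IIIF Presentation API 2.x AnnotationList shape.

```python
import re

PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/2/context.json"


def sparse_annotation_list(words, pattern, canvas_uri, list_uri):
    """Build an AnnotationList holding only the words that match `pattern`.

    `words` is a list of {'text': str, 'bbox': (x0, y0, x1, y1)} dictionaries.
    """
    regex = re.compile(pattern)
    resources = []
    for word in words:
        if not regex.search(word["text"]):
            continue  # unmatched words are left out, keeping the list sparse
        x0, y0, x1, y1 = word["bbox"]
        resources.append({
            "@type": "oa:Annotation",
            "motivation": "sc:painting",  # assumption; other motivations may fit better
            "resource": {
                "@type": "cnt:ContentAsText",
                "format": "text/plain",
                "chars": word["text"],
            },
            # Target the region of the canvas covered by the word's box.
            "on": f"{canvas_uri}#xywh={x0},{y0},{x1 - x0},{y1 - y0}",
        })
    return {
        "@context": PRESENTATION_CONTEXT,
        "@id": list_uri,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


# Hypothetical OCR words and URIs, for illustration only.
sample_words = [
    {"text": "Paid", "bbox": (100, 300, 180, 340)},
    {"text": "$12.50", "bbox": (200, 300, 330, 340)},
    {"text": "in", "bbox": (350, 300, 380, 340)},
    {"text": "1898", "bbox": (400, 300, 490, 340)},
]

annotation_list = sparse_annotation_list(
    sample_words,
    r"^\$\d+\.\d{2}$",  # dollar amounts such as $12.50
    "https://example.org/iiif/book1/canvas/p1",
    "https://example.org/iiif/book1/list/p1-amounts",
)
```

Serializing `annotation_list` with `json.dumps` would give a document that a canvas could reference from its `otherContent` property.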

Generating IIIF Ranges from structured OCR data with pattern matching
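A sketch of how heading detection could drive Range generation, under stated assumptions: each page is represented by its canvas URI and its OCR text lines, a line that matches the heading pattern opens a new Range, and every following canvas is assigned to the most recent Range until the next heading appears. The function name `ranges_from_headings`, the `example.org` URIs, the sample pages, and the chapter-heading pattern are illustrative assumptions. Pages before the first heading are left out of all Ranges, and a page on which a new heading appears is credited only to the new Range, both of which are simplifications. The keys follow the IIIF Presentation API 2.x Range shape.

```python
import re


def ranges_from_headings(pages, pattern, range_base_uri):
    """Return a list of sc:Range dictionaries derived from heading matches.

    `pages` is an ordered list of (canvas_uri, [line_text, ...]) tuples.
    """
    regex = re.compile(pattern)
    ranges = []
    for canvas_uri, lines in pages:
        for line in lines:
            match = regex.search(line)
            if match:
                # A heading opens a new Range that starts on this canvas.
                ranges.append({
                    "@id": f"{range_base_uri}/r{len(ranges) + 1}",
                    "@type": "sc:Range",
                    "label": match.group(0),
                    "canvases": [canvas_uri],
                })
        # A canvas without a heading continues the most recent Range.
        if ranges and ranges[-1]["canvases"][-1] != canvas_uri:
            ranges[-1]["canvases"].append(canvas_uri)
    return ranges


# Hypothetical pages and URIs, for illustration only.
BASE = "https://example.org/iiif/book1"
sample_pages = [
    (f"{BASE}/canvas/p1", ["A Title Page"]),
    (f"{BASE}/canvas/p2", ["CHAPTER I", "It was a dark night."]),
    (f"{BASE}/canvas/p3", ["The storm continued."]),
    (f"{BASE}/canvas/p4", ["CHAPTER II", "Morning came."]),
]

ranges = ranges_from_headings(sample_pages, r"^CHAPTER [IVXLC]+$", f"{BASE}/range")
```

The resulting dictionaries could be placed in a manifest's `structures` array so that a viewer shows them as a table of contents.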
