
Created by @jbaiter on October 2, 2017, 12:53

archiscribe-corpus

This is the corpus repository for https://archiscribe.jbaiter.de.

The goal is to collect as much diverse OCR ground truth as possible for 19th-century German prints.

Currently the corpus contains 123 lines from 3 works, each published in a different year (1868, 1881 and 1894). Detailed statistics are available below.

Statistics: Decades

| Decade | Lines |
|--------|-------|
| 1860s  | 48    |
| 1880s  | 50    |
| 1890s  | 25    |

Statistics: Years

| Year | Lines |
|------|-------|
| 1868 | 48    |
| 1881 | 50    |
| 1894 | 25    |
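The decade counts follow from the per-year counts by rounding each year down to its decade and summing, and the per-year counts add up to the 123-line total stated above. The following is a minimal sketch of that arithmetic, not the repository's actual statistics script; the function name `lines_per_decade` is hypothetical and the input dictionary is copied by hand from the year table.

```python
from collections import Counter

# Per-year line counts, copied by hand from the "Statistics: Years" table.
lines_per_year = {1868: 48, 1881: 50, 1894: 25}


def lines_per_decade(per_year):
    """Hypothetical helper: sum line counts per decade by flooring each year to a multiple of 10."""
    decades = Counter()
    for year, count in per_year.items():
        decades[year // 10 * 10] += count
    # Return a plain dict ordered by decade for readable output.
    return dict(sorted(decades.items()))


decades = lines_per_decade(lines_per_year)
total = sum(lines_per_year.values())
print(decades)  # {1860: 48, 1880: 50, 1890: 25}
print(total)    # 123
```

With a single work per year, each decade simply inherits one year's count; the grouping only starts to matter once several works from the same decade are added.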

Statistics: Works

| Title | Year | Archive.org ID | IIIF |
|-------|------|----------------|------|
| Natur und Gemüth Ein Feld und Waldblüthenstrauß aus Tagen die nicht mehr sind, Gewunden von Friedrich Aulenbach | 1868 | bub_gb_HF46AAAAcAAJ | Manifest / Mirador |
| Geschichte der Deutschen bis zur höchsten Machtentfaltung des Römisch ... | 1881 | geschichtederde00bessgoog | Manifest / Mirador |
| Die forstlichen Verhaltnisse Preussens | 1894 | dieforstlichenv02hagegoog | Manifest / Mirador |