@canadaduane
Created January 2, 2015 04:38
Question for archive.org team

I'm curious whether anyone has attempted to calculate a "scan quality" measure for the OCRed books in the archive. I frequently work with data from the period 1700-1830, and I notice wide variation in quality. Ideally, I'd like a "scanquality" property that I could use as a search parameter.

I've implemented something like this for books that I've downloaded from archive.org, but this metadata seems like it would be useful to all archive.org patrons. Has implementing an algorithm like this been discussed? Is there interest in it, and if so, whom should I speak with about developing it?
