canadaduane/question.md

## question.md

      
I'm curious whether anyone has attempted to calculate a "scan quality" measure for the OCRed books in the archive. I frequently work with texts from the period 1700-1830, and I notice wide variation in quality. Ideally, I'd like a "scanquality" property that I could set thresholds for in a search.
I've implemented something like this for books that I've downloaded from archive.org, but it seems this metadata would be useful to all patrons of archive.org. Has implementing an algorithm like this been discussed? How much interest is there, and if there is interest, with whom should I speak about developing it?
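
For illustration only, one simple way such a measure might be computed is the share of OCR tokens that appear in a word list. This is a sketch under stated assumptions, not the implementation mentioned above: the function name `scan_quality`, the token pattern, and the tiny `KNOWN_WORDS` set are all hypothetical placeholders.

```python
import re

# Hypothetical stand-in for a real, period-appropriate dictionary.
KNOWN_WORDS = {"the", "of", "and", "to", "in", "a", "is", "that", "it", "was"}


def scan_quality(text, known_words=KNOWN_WORDS):
    """Return the fraction of alphabetic tokens found in known_words (0.0 to 1.0)."""
    # Lowercase the text and keep only runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        # No recognizable tokens at all: treat as the lowest quality.
        return 0.0
    hits = sum(1 for token in tokens if token in known_words)
    return hits / len(tokens)
```

A score near 1.0 would suggest clean OCR and a score near 0.0 would suggest heavy garbling. For texts from 1700-1830 the word list would need to account for period spelling, since the long s is often OCRed as "f".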