githubutilities/README.md

## README.md

      
Here is a list of OCR software:


Ocrad


Ocrad also has a JavaScript version, but like the original, its results are not promising.


GOCR


It gives better results than Ocrad and claims to support more formats.
You can use ImageMagick to convert images to a supported format.
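
The conversion step above can be sketched roughly as follows, assuming ImageMagick's `convert` and `gocr` are both on the PATH. The helper names `to_pnm_name` and `ocr_with_gocr` and the file name `scan.jpg` are made up for illustration, and PNM is used here only because GOCR reads it natively.

```shell
#!/bin/sh
# Hypothetical helper: derive a .pnm file name from an image path.
# Assumes the file name has an extension (e.g. scan.jpg -> scan.pnm).
to_pnm_name() {
    printf '%s.pnm\n' "${1%.*}"
}

# Hypothetical helper: convert an image to PNM, then run GOCR on it.
# The recognized text is printed to standard output.
ocr_with_gocr() {
    src=$1
    pnm=$(to_pnm_name "$src")
    convert "$src" "$pnm" && gocr -i "$pnm"
}

# Usage (not run here): ocr_with_gocr scan.jpg
```

Nothing runs until `ocr_with_gocr` is called, so the file can be sourced safely from another script.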


tesseract


This is the most accurate and useful OCR software I found.
For language support, download the language packs from here and place the extracted data in the share/tessdata folder.
For example, download the Simplified Chinese pack, place its extracted data in the /usr/local/Cellar/tesseract/3.02.02_3/share/tessdata folder, then run tesseract with the -l chi_sim option (tesseract image.pnm out -l chi_sim).
I tried it on the Google homepage logo, and tesseract seems to work better with the .pnm format.
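
The language-pack steps above could be wrapped in a small script like the one below. This is only a sketch: the tessdata path is the Homebrew one from the example and will differ per install, the `<lang>.traineddata` file name is an assumption about what the extracted pack contains, and `has_lang` and `ocr_lang` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: Homebrew tessdata path from the text; override for your install.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/Cellar/tesseract/3.02.02_3/share/tessdata}

# Hypothetical helper: succeed if data for a language code is installed.
# Assumes the pack extracts to a file named <lang>.traineddata.
has_lang() {
    [ -f "$TESSDATA_DIR/$1.traineddata" ]
}

# Hypothetical helper: OCR an image with the given language.
# tesseract writes its result to <base>.txt.
ocr_lang() {
    image=$1
    base=$2
    lang=$3
    if ! has_lang "$lang"; then
        echo "missing $TESSDATA_DIR/$lang.traineddata" >&2
        return 1
    fi
    tesseract "$image" "$base" -l "$lang"
}

# Usage (not run here): ocr_lang image.pnm out chi_sim
```

Checking for the data file first gives a clearer error than letting tesseract fail on a missing language.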


Improve quality


Convert the image to a bitonal image

convert input.jpg -threshold 50% output.jpg
# add `-negate` option to invert the image
convert input.jpg -negate output.jpg
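
The two commands above can be combined with the OCR step, as in the sketch below. It assumes `convert` and `tesseract` are on the PATH; `binarize_opts` and `preprocess_and_ocr` are hypothetical helper names, and the `-colorspace Gray` step and the PNM intermediate file are additions that the original commands do not use.

```shell
#!/bin/sh
# Hypothetical helper: build the convert options for binarizing an image.
# $1 is the threshold percentage (default 50); pass "invert" as $2
# to add -negate, e.g. for light text on a dark background.
binarize_opts() {
    opts="-colorspace Gray -threshold ${1:-50}%"
    if [ "${2:-}" = "invert" ]; then
        opts="$opts -negate"
    fi
    printf '%s\n' "$opts"
}

# Hypothetical helper: binarize an image, then OCR the result.
# Writes <base>.pnm and, via tesseract, <base>.txt.
preprocess_and_ocr() {
    src=$1
    base=$2
    # The command substitution is unquoted on purpose so it splits into options.
    convert "$src" $(binarize_opts "${3:-50}" "${4:-}") "$base.pnm" &&
        tesseract "$base.pnm" "$base"
}

# Usage (not run here): preprocess_and_ocr input.jpg output 50 invert
```

The intermediate file is written as PNM rather than JPEG because JPEG compression is lossy and can put gray artifacts back around the thresholded edges.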

image preprocessing to improve OCR
image decompose

Read More


Android OCR on Stack Overflow
Review of some open source OCR software