Created April 25, 2018 03:55
Gist: neofreko/daf8bb0076915bbe663fe184de99451f
Extract text from an image with rtesseract (a Ruby wrapper for tesseract OCR) and check whether it contains a given sentence
# brew install tesseract
# gem install rtesseract
# gem install mini_magick
#
# Example:
# puts image_has_sentence?('/Users/me/Pictures/hi.png', 'hello world')
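require 'tempfile'
require 'rtesseract'
require 'mini_magick'

# Returns true if the OCR text of the image at image_path contains sentence.
# To customise matching, pass a block: it receives the sentence and must
# return a callable, which is then called with the OCR text.
def image_has_sentence?(image_path, sentence)
  # The '.jpg' suffix (with the dot) gives ImageMagick an output format.
  image_output = Tempfile.new(%w[rtesseract .jpg])
  # Scale up, invert the colours and convert to grayscale before OCR.
  # They said it works a charm. It does!
  MiniMagick::Tool::Convert.new do |convert|
    convert << image_path
    convert.merge! ['-resize', '200%', '-negate', '-colorspace', 'Gray']
    convert << image_output.path
  end
  # The secret here is the `psm` (page segmentation mode) value; see `tesseract --help`.
  # Different kinds of images benefit from different psm values.
  result = RTesseract.new(image_output.path, processor: 'mini_magick', psm: 1, debug: true)
  text = result.to_s # run OCR once and reuse the text
  return (yield sentence).call(text) if block_given?
  text.include?(sentence)
ensure
  # Delete the temporary image (image_output is nil if Tempfile.new failed).
  image_output&.close!
end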
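OCR output often splits a sentence across lines or changes its letter case, so a plain `include?` can miss a match that is visibly in the image. The block form of `image_has_sentence?` lets the caller supply the comparison: the block receives the sentence and returns a callable that is then called with the OCR text. Below is a minimal sketch, assuming whitespace- and case-insensitive matching is what you want; `normalize_ocr_text` and `fuzzy_matcher` are hypothetical helpers, not part of rtesseract, and the sample OCR string is made up so the sketch runs without tesseract.

```ruby
# Hypothetical helper: collapse every run of whitespace (including the
# newlines tesseract puts between lines) into one space, then downcase.
def normalize_ocr_text(text)
  text.gsub(/\s+/, ' ').strip.downcase
end

# Hypothetical matcher factory: given the expected sentence, return a
# lambda that checks whether the normalized OCR text contains it.
def fuzzy_matcher(sentence)
  expected = normalize_ocr_text(sentence)
  ->(ocr_text) { normalize_ocr_text(ocr_text).include?(expected) }
end

# Simulated OCR output with the sentence wrapped across two lines.
ocr_output = "Hello\nWORLD, again\n"
puts fuzzy_matcher('hello world').call(ocr_output) # prints "true"
```

With the gist's method, the matcher would be plugged in as `image_has_sentence?('/Users/me/Pictures/hi.png', 'hello world') { |s| fuzzy_matcher(s) }`.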
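Because the best `psm` value depends on the image, one option is to try several page segmentation modes and stop at the first one whose text contains the sentence. This is a sketch under stated assumptions: `first_matching_psm` is a hypothetical helper, the OCR step is injected as a lambda that takes a psm value (in practice it might wrap `RTesseract.new(path, psm: psm).to_s`, but check the option names against your rtesseract version), and the default mode list is only a starting point to tune. The stub below stands in for tesseract so the logic can run on its own.

```ruby
# Hypothetical helper: try each page segmentation mode in order and return
# the first one whose OCR text contains the sentence, or nil if none does.
# `ocr` is any callable that takes a psm value and returns the OCR text.
# Assumed defaults: 1 = auto with OSD, 3 = fully automatic,
# 6 = single uniform block of text, 11 = sparse text.
def first_matching_psm(sentence, ocr, modes: [1, 3, 6, 11])
  modes.find { |psm| ocr.call(psm).to_s.include?(sentence) }
end

# Stubbed OCR: pretend only psm 6 reads the text correctly.
fake_ocr = ->(psm) { psm == 6 ? "hello world\n" : "he1lo wor1d\n" }
p first_matching_psm('hello world', fake_ocr) # prints 6
```

Since each mode triggers a full OCR run, ordering the list with the most likely modes first keeps the common case cheap.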