@neofreko
Created April 25, 2018 03:55
extract sentence using rtesseract
# brew install tesseract
# gem install rtesseract
# gem install mini_magick
#
# example:
# puts image_has_sentence?('/Users/me/Pictures/hi.png', 'hello world')
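require 'tempfile'
require 'rtesseract'
require 'mini_magick'

# Returns true if the text recognized in the image at image_path contains sentence.
# If a block is given, it is called with sentence and must return a callable
# (for example a lambda); that callable receives the OCR text and its result is returned.
def image_has_sentence?(image_path, sentence)
  # The suffix needs the leading dot so ImageMagick writes a real .jpg file.
  image_output = Tempfile.new(%w[rtesseract .jpg])
  # Scale up and make grayscale (the colors are also inverted).
  # This preprocessing works well in practice.
  MiniMagick::Tool::Convert.new do |convert|
    convert << image_path
    convert.merge! ['-resize', '200%', '-negate', '-set', 'colorspace', 'Gray']
    convert << image_output.path
  end
  # The key setting here is the `psm` (page segmentation mode) value. See `tesseract --help`.
  # Different kinds of images benefit from different psm values.
  result = RTesseract.new(image_output.path, processor: 'mini_magick', psm: 1, debug: true)
  text = result.to_s
  return (yield sentence).call(text) if block_given?

  text.include?(sentence)
ensure
  # Remove the temporary image once OCR is done.
  image_output&.close!
end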
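
The block form above expects the block to return a callable that is then handed the OCR text. A minimal sketch of such a callable follows. It assumes that OCR output may break a sentence across lines or change its case. The helper names `normalize_ocr_text` and `fuzzy_matcher` and the sample text are hypothetical and not part of the gist or of rtesseract.

```ruby
# Collapse runs of whitespace (including newlines) and ignore case, so text
# recognized across several lines can still match a one-line sentence.
def normalize_ocr_text(text)
  text.to_s.gsub(/\s+/, ' ').strip.downcase
end

# Build a callable in the shape the block form expects: it takes the
# recognized text and returns true or false.
def fuzzy_matcher(sentence)
  wanted = normalize_ocr_text(sentence)
  ->(ocr_text) { normalize_ocr_text(ocr_text).include?(wanted) }
end

# Stand-in for OCR output (an assumption; real Tesseract output will differ).
sample_ocr_text = "Hello\n  WORLD, this is\na test\n"

puts fuzzy_matcher('hello world').call(sample_ocr_text) # prints true
puts sample_ocr_text.include?('hello world')            # prints false
```

With the gist's function this would be passed as `image_has_sentence?(path, 'hello world') { |s| fuzzy_matcher(s) }`, which trades the exact substring check for a whitespace- and case-insensitive one.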