@RoxasShadow
Last active February 14, 2018 00:46
Use ImageMagick and Capture2Text to perform OCR
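require 'prawn'
require 'thread/pool'

# Paths to the ImageMagick and Capture2Text command-line executables.
im  = "C:\\Program Files\\ImageMagick-7.0.7-Q16\\magick.exe"
c2t = "H:\\Users\\Giovanni\\Downloads\\Capture2Text_v4.5.1_64bit\\Capture2Text_CLI.exe"

# Processed images and OCR text files are written to res/.
Dir.mkdir('res') unless Dir.exist?('res')

# Process every PNG in the current directory on 8 worker threads.
pool = Thread.pool(8)
Dir['*.png'].each do |f|
  pool.process do
    # Make the white background transparent before running OCR.
    `"#{im}" convert "#{f}" -transparent white "res/#{f}"`
    # Run OCR for vertical Japanese text and write the result to a .txt file.
    `"#{c2t}" --vertical -l Japanese -i "res/#{f}" -b --trim-capture --deskew -o "res/#{f}.txt"`
  end
end
pool.shutdown # blocks until the queued tasks have finished

# Collect the recognized text into a PDF, one page per source image.
Prawn::Document.generate('output.pdf') do
  font 'mona.ttf' # the font must contain Japanese glyphs
  Dir['res/*.txt'].sort.each_with_index do |f, i|
    start_new_page unless i.zero? # avoids a trailing blank page
    text File.read(f, encoding: 'UTF-8')
  end
end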
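
The script above interpolates file names into backtick commands, so a name containing a double quote would break the quoting. A minimal sketch of one alternative, assuming the same flags as above: build each command as an argument array and hand it to `system`, which does not go through a shell when given several arguments. The helper names `magick_argv` and `capture2text_argv` are hypothetical and not part of the original script.

```ruby
# Hypothetical helpers (not in the original script): build argument
# arrays so that file names are passed to the tools as-is.

# Arguments for the ImageMagick step: white becomes transparent.
def magick_argv(im, file, out_dir = 'res')
  [im, 'convert', file, '-transparent', 'white', File.join(out_dir, file)]
end

# Arguments for the Capture2Text step, using the same flags as the script.
def capture2text_argv(c2t, file, out_dir = 'res')
  image = File.join(out_dir, file)
  [c2t, '--vertical', '-l', 'Japanese', '-i', image,
   '-b', '--trim-capture', '--deskew', '-o', "#{image}.txt"]
end
```

Inside the pool block this could be used as `system(*magick_argv(im, f)) && system(*capture2text_argv(c2t, f))`, which also skips OCR when the ImageMagick step reports failure.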