Torrey Payne torreypayne

  • Thirty Madison
  • New York, NY
View GitHub Profile
@torreypayne
torreypayne / ruby_ocr.rb
Created March 4, 2024 16:55 — forked from dshorthouse/ruby_ocr.rb
OCR an image-based PDF in Ruby
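require 'parallel'
require 'rtesseract'
require 'mini_magick'
require 'tempfile' # Tempfile is used below; required explicitly

source = "/MyDirectory/my.pdf"
doc = {}
pdf = MiniMagick::Image.open(source)

# Process the PDF's pages concurrently, using 8 threads
Parallel.map(pdf.pages.each_with_index, in_threads: 8) do |page, idx|
  # Temporary .tif file for this page
  tmpfile = Tempfile.new(['', '.tif'])
  MiniMagick::Tool::Convert.new do |convert|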