@aashish
Last active June 6, 2018 08:50
Process char in pdf
require 'hexapdf'

class ShowTextProcessor < HexaPDF::Content::Processor

  def initialize(page)
    super()
    @canvas = page.canvas(type: :overlay)
  end

  def show_text(str)
    boxes = decode_text_with_positioning(str)
    return if boxes.string.empty?

    @canvas.line_width = 1
    @canvas.stroke_color(224, 0, 0)
    # Polyline for transformed characters
    #boxes.each {|box| @canvas.polyline(*box.points).close_subpath.stroke}
    # Using rectangles is faster but not 100% correct
    boxes.each do |box|
      x, y = *box.lower_left
      tx, ty = *box.upper_right
      @canvas.rectangle(x, y, tx - x, ty - y).stroke
    end

    @canvas.line_width = 0.5
    @canvas.stroke_color(0, 224, 0)
    @canvas.polyline(*boxes.lower_left, *boxes.lower_right,
                     *boxes.upper_right, *boxes.upper_left).close_subpath.stroke
  end
  alias :show_text_with_positioning :show_text

end

doc = HexaPDF::Document.open(ARGV.shift)
doc.pages.each_with_index do |page, index|
  puts "Processing page #{index + 1}"
  processor = ShowTextProcessor.new(page)
  processor.graphics_state.font = doc.add(Type: :Font, Subtype: :Type1, Encoding: :WinAnsiEncoding, BaseFont: :"Times-Roman")
  processor.graphics_state.tm = HexaPDF::Content::TransformationMatrix.new
  page.process_contents(processor)
  puts processor.show_text('name')
end
doc.write("#{File.expand_path(File.dirname(__FILE__))}/show_char_boxes.pdf", optimize: true)
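The comment in `show_text` says that rectangles are faster but not 100% correct. A minimal sketch of why, in plain Ruby with no HexaPDF calls: a rectangle built only from the lower-left and upper-right corners matches the box when it is unrotated, but not once the box is rotated. The helper names (`rotate`, `box_corners`, `polygon_area`, `rectangle_area`) and the 10x4 box are hypothetical, chosen only for illustration.

```ruby
# Rotate a point [x, y] counter-clockwise around the origin by `degrees`.
def rotate(point, degrees)
  rad = degrees * Math::PI / 180
  x, y = point
  [x * Math.cos(rad) - y * Math.sin(rad), x * Math.sin(rad) + y * Math.cos(rad)]
end

# Corners of a hypothetical 10x4 box, rotated by `degrees`, in the order
# lower-left, lower-right, upper-right, upper-left.
def box_corners(degrees)
  [[0, 0], [10, 0], [10, 4], [0, 4]].map { |pt| rotate(pt, degrees) }
end

# Area of the polygon through all four corners (shoelace formula).
def polygon_area(corners)
  sum = corners.each_with_index.sum do |(x1, y1), i|
    x2, y2 = corners[(i + 1) % corners.size]
    x1 * y2 - x2 * y1
  end
  sum.abs / 2.0
end

# Area of the axis-aligned rectangle spanned by the lower-left and
# upper-right corners only, as in the rectangle(x, y, tx - x, ty - y) call.
def rectangle_area(corners)
  (x, y), _, (tx, ty), _ = corners
  ((tx - x) * (ty - y)).abs
end

[0, 30].each do |degrees|
  corners = box_corners(degrees)
  puts "#{degrees} degrees: polygon #{polygon_area(corners).round(2)}, " \
       "rectangle #{rectangle_area(corners).round(2)}"
end
```

For the unrotated box both areas agree. For the rotated one the polygon keeps the true area while the two-corner rectangle does not, which is the trade-off the commented-out polyline line avoids.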

aashish commented Jun 6, 2018

Hi,

I am getting the following error when running the script:

$ ruby Downloads/show_char_boxes.rb Downloads/OoPdfFormExample.pdf 
Processing page 1
/home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/transformation_matrix.rb:144:in `premultiply': wrong number of arguments (given 0, expected 6) (ArgumentError)
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:439:in `block (2 levels) in decode_horizontal_text'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:435:in `each'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:435:in `block in decode_horizontal_text'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:431:in `each'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:431:in `decode_horizontal_text'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/content/processor.rb:407:in `decode_text_with_positioning'
	from Downloads/show_char_boxes.rb:11:in `show_text'
	from Downloads/show_char_boxes.rb:42:in `block in <main>'
	from /home/adt/.rvm/gems/ruby-2.4.1/gems/hexapdf-0.6.0/lib/hexapdf/type/page_tree_node.rb:206:in `block in each_page'

Thanks,
Aashish
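
The trace shows `show_text` being called directly from the main block, and `premultiply` receiving zero arguments instead of six. One way Ruby produces exactly this message is when `nil` is splatted into a method call, because `*nil` expands to no arguments. Whether that is what happens inside HexaPDF here (for example, the text matrix no longer being set once `process_contents` has finished) is an assumption that the trace alone does not confirm. The sketch below uses a hypothetical stand-in method, `premultiply_stub`, not the library's code.

```ruby
# Hypothetical stand-in for a method that, like the `premultiply` named in
# the trace, requires exactly six positional arguments.
def premultiply_stub(a, b, c, d, e, f)
  [a, b, c, d, e, f]
end

# Returns the ArgumentError message raised when `value` is splatted into the
# call, or nil when the call succeeds.
def splat_error_message(value)
  premultiply_stub(*value)
  nil
rescue ArgumentError => e
  e.message
end

# Splatting nil passes no arguments at all.
puts splat_error_message(nil).inspect

# Splatting a six-element array satisfies the signature.
puts splat_error_message([1, 0, 0, 1, 0, 0]).inspect
```

If the assumption holds, checking `processor.graphics_state.tm` for `nil` just before the direct `show_text('name')` call would confirm it.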
